flanksource / canary-checker

Kubernetes Native Health Check Platform
https://canarychecker.io
Apache License 2.0

Canary status does not have display/error messages #2126

Open dmorrowjc opened 2 months ago

dmorrowjc commented 2 months ago

I can see the display message for my checks in the UI, but not through the Kubernetes CLI. Is that expected? I see errorMessage and message fields in the status of my Canary, but neither is ever populated, regardless of whether I've set up a custom display message.

For example, I can use k get canary kube-system-checks -o json and see:

{
    "apiVersion": "canaries.flanksource.com/v1",
    "kind": "Canary",
    "metadata": {
        "generation": 3,
        "name": "kube-system-checks",
        "namespace": "canary-checker"
    },
    "spec": {
        "interval": 30,
        "kubernetes": [
            {
                "display": {
                    "expr": "dyn(results).map(x, k8s.getHealth(x))\n"
                },
                "kind": "Pod",
                "name": "kube-system",
                "namespace": "kube-system",
                "namespaceSelector": {
                    "name": "kube-system"
                },
                "resource": {
                    "labelSelector": "k8s-app=kube-dns"
                }
            }
        ],
        "replicas": 1
    },
    "status": {
        "checkStatus": {
            "0191c1db-a058-2574-801a-4cd6502a274d": {
                "uptime1h": "14/14 (100.0%)"
            }
        },
        "checks": {
            "kube-system": "0191c1db-a058-2574-801a-4cd6502a274d"
        },
        "errorMessage": "",
        "lastCheck": "2024-09-06T20:55:20Z",
        "latency1h": "23ms",
        "message": "",
        "observedGeneration": 3,
        "replicas": 1,
        "status": "Passed",
        "uptime1h": "14/14 (100.0%)"
    }
}

In that example I have a display message defined, but I never see anything in status.message or status.errorMessage. It would be helpful to see some information there so that when a check succeeds I can tell which resources it found and when it fails I can see which resources weren't healthy.
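To make the gap concrete, here is a minimal Python sketch that takes the JSON printed by the command above and reports which of the status message fields are missing or empty. The helper name empty_message_fields is hypothetical; the field names message and errorMessage are the ones visible in the status above.

```python
import json

# Field names taken from the Canary status shown above.
MESSAGE_FIELDS = ("message", "errorMessage")


def empty_message_fields(canary_json: str) -> list[str]:
    """Return the status message fields that are missing or empty."""
    status = json.loads(canary_json).get("status", {})
    return [name for name in MESSAGE_FIELDS if not status.get(name)]


# Trimmed copy of the status from the output above.
sample = json.dumps({
    "status": {
        "errorMessage": "",
        "message": "",
        "status": "Passed",
    }
})

print(empty_message_fields(sample))  # ['message', 'errorMessage']
```

For the Canary above, both fields are reported as empty even though display.expr is set and the check status is Passed.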

This seems to be true of many check types, not just kubernetes checks.

flankbot commented 2 months ago

@dmorrowjc Can you run describe on the canary? There should be passed/failed events that fire with more details.

dmorrowjc commented 2 months ago

$ k describe canary -n canary-checker kube-system-checks
Name:         kube-system-checks
Namespace:    canary-checker
Labels:       <none>
Annotations:  <none>
API Version:  canaries.flanksource.com/v1
Kind:         Canary
Metadata:
  Creation Timestamp:  2024-08-26T02:06:09Z
  Finalizers:
    canary.canaries.flanksource.com
  Generation:        3
  Resource Version:  259547041
  UID:               323a1a21-5d8b-4145-b896-6054c0ba518c
Spec:
  Interval:  30
  Kubernetes:
    Display:
      Expr:  dyn(results).map(x, k8s.getHealth(x))

    Kind:       Pod
    Name:       kube-system
    Namespace:  kube-system
    Namespace Selector:
      Name:  kube-system
    Resource:
      Label Selector:  k8s-app=kube-dns
  Replicas:            1
Status:
  Check Status:
    0191e1f9-4961-0bfa-6214-51cd46c5c693:
      uptime1h:  11/11 (100.0%)
  Checks:
    Kube - System:      0191e1f9-4961-0bfa-6214-51cd46c5c693
  Error Message:        
  Last Check:           2024-09-11T17:35:23Z
  latency1h:            29ms
  Message:              
  Observed Generation:  3
  Replicas:             1
  Status:               Passed
  uptime1h:             11/11 (100.0%)
Events:                 <none>

The describe command isn't giving me any messages either. Is this a bug? I would expect whatever we've defined in display.expr to show up somewhere, regardless of whether the check is passing or failing. When we're troubleshooting failing checks or developing new checks, it is helpful to be able to output information for debugging.
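As a cross-check on the empty Events section, the sketch below filters an event list (the JSON shape returned by kubectl get events -o json) down to events whose involvedObject is this Canary. The function name events_for is hypothetical, and the sample event is invented purely for illustration; it is not taken from a real cluster.

```python
import json


def events_for(events_json: str, kind: str, name: str) -> list[dict]:
    """Return events whose involvedObject matches the given kind and name."""
    items = json.loads(events_json).get("items", [])
    return [
        e for e in items
        if e.get("involvedObject", {}).get("kind") == kind
        and e.get("involvedObject", {}).get("name") == name
    ]


# Hypothetical event list; the pod name, reason and message are made up.
sample_events = json.dumps({
    "items": [
        {
            "involvedObject": {"kind": "Pod", "name": "coredns-abc"},
            "reason": "Started",
            "message": "Started container coredns",
        },
    ]
})

matches = events_for(sample_events, "Canary", "kube-system-checks")
print(len(matches))  # 0
```

An empty result here would match the Events: <none> line in the describe output, which suggests the passed/failed events are not being recorded against the Canary object at all.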