carlosedp / cluster-monitoring

Cluster monitoring stack for clusters based on Prometheus Operator
MIT License

Issue with Alerts and one question. #56

Closed: dennym closed this 4 years ago

dennym commented 4 years ago

Hey,

first of all, thanks for this nice work. I'm running it on my RPi4 cluster with HypriotOS and K3s without issues; it worked on the first try. Unfortunately, the alerts show KubeControllerManagerDown and KubeSchedulerDown. Is this expected behavior?

Additionally, I have a question about your blog post and one particular screenshot: https://miro.medium.com/max/1400/1*zp4bS5omhxoLxbC4xGh5vQ.png It shows all the processes and their percentage of CPU usage. For me, it shows only one graph with Value | 21% | 14%. Is this a limitation of HypriotOS, ARM, or K3s, or did I forget something?

carlosedp commented 4 years ago

You need to set k3s to enabled in vars.jsonnet so the scheduler and controllerManager are monitored (see the sketch below). Good catch on the dashboard; I've updated it in 9b17541.
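
For reference, the relevant part of vars.jsonnet looks roughly like this (a sketch; the master_ip key and its sample value are assumptions based on the repo's defaults, so check your copy):

{
  // ... other settings ...
  k3s: {
    enabled: true,                // scrape the K3s scheduler and controllerManager
    master_ip: ['192.168.1.10'],  // hypothetical master node IP; replace with yours
  },
}

After editing, regenerate and redeploy the manifests (e.g. make deploy).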

dennym commented 4 years ago

@carlosedp I redeployed the latest version to my cluster today and double-checked that the k3s settings are set. Unfortunately, after 45 minutes the alerts appeared again and are staying active. Any guidance on where to start debugging this issue?

carlosedp commented 4 years ago

Check if reapplying the manifests below fixes it:

kubectl apply -f manifests/prometheus-kubeSchedulerPrometheusDiscoveryEndpoints.yaml
kubectl apply -f manifests/prometheus-kubeControllerManagerPrometheusDiscoveryEndpoints.yaml
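
If the alerts come back after that, a quick sanity check is to verify that the discovery Endpoints objects actually exist (a sketch; the exact object names and namespace may differ, hence the grep):

# List the Prometheus discovery Endpoints created by the manifests above;
# assumes kubectl is configured to reach the cluster.
kubectl get endpoints --all-namespaces | grep -i prometheus-discovery

If nothing shows up, Prometheus has no scrape target for the scheduler and controller manager, which is exactly what fires the *Down alerts.
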
dennym commented 4 years ago

Hmm, weird. I went through the log of kubectl apply -f manifests/ and make deploy, and both manifests appear in the list, but they don't seem to be applied correctly. Anyway, manually applying them seems to resolve the issue. Thanks :)

Is this something worth investigating further, or should we just leave it?

carlosedp commented 4 years ago

I saw something similar happen to me. I'll take a closer look.

dennym commented 4 years ago

Just reporting in: it appeared again. My observation so far is that when the cluster has been down (currently I shut it down overnight), the alerts seem to reappear after it boots up again.

carlosedp commented 4 years ago

This is because K3s doesn't create Endpoints for these two services by default, so every time the cluster is restarted they need to be created again (by applying those manifests).
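
A possible workaround until this is handled automatically is to reapply the two manifests on every boot, e.g. from a cron @reboot entry on whatever machine you run kubectl from (a sketch; the checkout path and the 60-second delay are assumptions, and kubectl must be on the cron user's PATH with a valid kubeconfig):

# Hypothetical checkout path; the sleep gives the API server time to come up after boot.
@reboot sleep 60 && kubectl apply -f /home/pirate/cluster-monitoring/manifests/prometheus-kubeSchedulerPrometheusDiscoveryEndpoints.yaml -f /home/pirate/cluster-monitoring/manifests/prometheus-kubeControllerManagerPrometheusDiscoveryEndpoints.yaml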