Open dharapvj opened 12 months ago
I think the most crucial difference between our charts and kube-prometheus-stack is that the prometheus-operator uses ServiceMonitor objects and does NOT make use of service discovery annotations. This makes the migration harder. So unless we somehow explicitly deprecate the existing approach, it is going to be really hard to sell prometheus-operator (and thereby the kube-prometheus-stack helm chart).
I feel that the most sensible alternative would be to migrate over to the regular charts from the prometheus-community helm repo - the ones that do not use prometheus-operator - and reduce our custom chart maintenance.
But what we would miss with that approach is the newer and better-looking dashboards.
By default, kube-prometheus-stack has a lot of issues scraping kube-proxy, kube-controller-manager, kube-scheduler and etcd.
This thread on the kubeone repo has some solutions, and there is another related comment.
We should try these solutions and also check whether KKP currently scrapes these targets or not.
Update: A slightly modified version of this HAProxy solution (just updating the ports) works well to scrape the kube-scheduler, kube-controller-manager, kube-proxy AND etcd pods.
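As a complement to the HAProxy workaround, kube-prometheus-stack also exposes values to tweak how these control-plane components are scraped. A minimal sketch - the port numbers and HTTPS settings are assumptions that must be checked against the actual cluster setup:

```yaml
# values-kube-prometheus-stack.yaml (sketch, not verified against every k8s version)
kubeControllerManager:
  service:
    # Recent k8s versions serve controller-manager metrics on 10257 via HTTPS.
    port: 10257
    targetPort: 10257
  serviceMonitor:
    https: true
    insecureSkipVerify: true
kubeScheduler:
  service:
    port: 10259
    targetPort: 10259
  serviceMonitor:
    https: true
    insecureSkipVerify: true
kubeEtcd:
  service:
    # etcd metrics port; still requires etcd to listen on a non-localhost
    # address (or a proxy such as the HAProxy workaround above).
    port: 2381
    targetPort: 2381
```

Even with these values set, the components usually bind their metrics endpoints to 127.0.0.1, which is why the HAProxy-based proxying is still needed in practice.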
Note: kps = kube-prometheus-stack; db = dashboard;
Feature | KKP Implementation | kps implementation | Can it be added in kps? |
---|---|---|---|
Loki Integration | Integrated OOTB | Not integrated OOTB | Y |
Go Dashboards | There are two dashboard for Go Applications | None | Y |
KKP Dashboards | There are six dashboard for KKP specific views | None | Y |
k8s Cluster Dashboards | 1 | 2. They are almost similar and contain more info | NA. kps is better |
k8s Namespace Dashboards | 1 | 2 pod db is almost similar and contains more info. Workload db gives same info aggregated at STS / Deploy level. | NA. kps is better |
k8s Pod Dashboards | 2 almost overlapping db | 1. The pod db is almost similar to the KKP db and contains more info. | NA. kps is better |
k8s kubelet Dashboards | 2 | 1 | Y. There is relevant information in both KKP and kps dashboards. We should consider merging them. |
k8s Networking Dashboards | 0 | 5. Networking-related information at differing granularity - cluster, ns, pod, etc. | NA. kps is better |
k8s Node Dashboards | 2 Slightly overlapping Node and Node Resource Usage db | 3. The USE Method node db is identical to KKP's. The rest are different visualizations | Y. We should consider merging the KKP and kps db |
k8s Resource usage Dashboards | 1 Shows no data? | 0 | Y |
k8s STS Dashboards | 1 | 0 | Maybe. I guess, the NS workload db for kps already covers this |
k8s Volume Dashboards | 1 | 1 | Y find KKP dashboard better. It allows multi select and global aggregation (though not sure how useful that is) |
k8s etcd Dashboards | 3 | 1 | Maybe. The KKP main dashboard does not work, presumably due to etcd restrictions, which I have worked around via the HAProxy solution in kps. There is one extra dashboard which shows the count of etcd objects. Not sure how useful that is. |
Minio Dashboards | 1 | 0 | We should pick the latest Minio dashboard from Grafana. It's way more feature-packed. |
Prometheus Dashboards | 2 | 1 | Y. I think we should merge the KKP dashboard into the kps dashboard. It has good, relevant things to look at. Also, the blackbox-exporter dashboard from KKP should be brought into kps |
MLA Dashboards | 1 | 2 | Not sure what this dashboard shows in KKP. Currently, it shows nothing for me. kps has Grafana and Alertmanager dashboards which should be retained. But the Grafana dashboard does not show any data currently (misconfigured?) |
nginx ingress Dashboards | 1 | 0 | We should pick the latest nginx ingress dashboard from Grafana. or similar |
Note: kps = kube-prometheus-stack; SM=ServiceMonitor
The goal is to have every single target already scraped by KKP also scraped by kps.
Target | KKP | kps | If not in kps, mitigation plan | Status |
---|---|---|---|---|
k8s apiserver | Y | Y | NA | |
cadvisor | Y | Y | NA | In KKP we have job=cadvisor in prometheus/config/scraping folder. In KPS, we have job=kubelet instead. It is auto-configured. |
kubelet | Y | Y | NA | |
kubelet probes | N | Y | NA | |
kube-state-metrics | Y | Y | NA | |
kube-controller-manager | N | Y | NA | |
kube-etcd | N | Y | NA | |
kube-proxy | N | Y | NA | |
kube-scheduler | N | Y | NA | |
kube-prometheus-stack | N | Y | NA | |
node-exporter | Y | Y | NA | |
prometheus | Y | Y | NA | |
pods - CoreDNS | N | Y | NA | |
pods - osm | Y | N | Write new SM | |
pods - node-local-dns | Y | N | Write new SM / Enable the SM in helm chart | |
pods - minio | Y | N | Write new SM / Enable the SM in helm chart | |
pods - dex | Y | N | Write new SM / Enable the SM in helm chart | |
pods - promtail | Y | N | Write new SM / Enable the SM in helm chart | |
pods - kkp seed-ctrl-mgr | Y | N | Write new SM | New PM |
pods - nginx-ingress | Y | N | Write new SM / Enable the SM in helm chart | New SM |
pods - nodeport-proxy-envoy | Y | N | Write new SM | New SM |
pods - cluster-autoscaler | Y | N | Write new SM / Enable the SM in helm chart | New PM |
pods - velero | Y | N | Write new SM / Enable the SM in helm chart | |
pods - Loki | Y | N | Write new SM | |
pods - kubermatic-api | Y | N | Write new SM | |
pods - blackbox-exporter | Y | N | Potentially, I have not turned this on. | |
pods - machine-controller | Y | N | Write new SM | |
pods - kubermatic-webhook | Y | N | Write new SM | |
pods - kubermatic-operator | Y | N | Write new SM | |
pods - kubermatic-master-ctrl-mgr | Y | N | Write new SM | |
pods - cert-manager | Y | N | Write new SM / Enable the SM in helm chart | New SM |
pods - cluster-XXX namespace | Y | N | Write new SM | New SM (federates the cluster-xxx prometheus). In the future, we should move to prometheus agent with remote_write and remove the need for this ServiceMonitor |
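The remote_write direction mentioned in the last row could be sketched in kps values roughly like this. The receiver endpoint URL and the `cluster` label are hypothetical, and switching the user-cluster prometheus to agent mode additionally depends on the prometheus-operator version in use:

```yaml
# Sketch: user-cluster prometheus forwarding samples to the seed prometheus
# instead of being federated via a ServiceMonitor.
prometheus:
  prometheusSpec:
    remoteWrite:
      # Hypothetical seed-side receiver endpoint.
      - url: "https://seed-prometheus.example.com/api/v1/write"
        writeRelabelConfigs:
          # Attach the user-cluster id to every forwarded series
          # (label name is an assumption).
          - targetLabel: cluster
            replacement: cluster-xxx
```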
Note: kps = kube-prometheus-stack;
The goal is to have every single alert rule present in KKP also available in the kps-based setup. I think we will need to use jsonnet to add new rules.
General observation: all the rules in kubermatic have a label called service, which identifies the source of the alert rule. I think it is a good addition. We should have the same in kps rules as well.
Rule category | KKP | kps |
---|---|---|
Blackbox-exporter | Y | N. Need to implement this additionally. |
cert-manager | Y | N |
helm-exporter | Y | N |
kube-apiserver | Y | Y. kube-apiserver-availability, burn-rate, histogram, slo. We should reconcile. |
kubelet | Y | Y. We should reconcile. Also kubernetes-storage rules. |
kube-state-metric | Y | Y. We should reconcile. Also, kubernetes-apps, kubernetes-resources rules. |
node-exporter | Y | N |
prometheus | Y | N |
velero | Y | N |
VPA | Y | N |
kubermatic | Y | N |
kubermatic-seed | Y | N |
kube-controller-manager | Y | N |
kube-scheduler | Y | N |
Alertmanager | N | Y. A few alerts are covered in KKP under the prometheus set |
config-reloaders | N | Y |
etcd | N? | Y |
general | N? | Y |
TO BE CONTINUED
Custom dashboards can be added in any namespace via a ConfigMap with the label grafana_dashboard: "1". Reference
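A minimal sketch of such a ConfigMap - the name and the dashboard JSON are placeholders:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-custom-dashboard   # hypothetical name
  namespace: monitoring       # any namespace works if the Grafana sidecar watches all of them
  labels:
    grafana_dashboard: "1"    # this label is what the Grafana sidecar looks for
data:
  my-dashboard.json: |
    {
      "title": "My Custom Dashboard",
      "panels": []
    }
```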
Similarly, custom alert rules can be added by creating a PrometheusRule CR. It is also possible to add them in values.yaml itself.
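A sketch of such a PrometheusRule. The alert name, expression and the service label value are illustrative, and the release label must match whatever ruleSelector the kps Prometheus is configured with:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kubermatic-custom-rules    # hypothetical name
  namespace: monitoring
  labels:
    release: kube-prometheus-stack # must match the Prometheus ruleSelector
spec:
  groups:
    - name: kubermatic.rules
      rules:
        - alert: KubermaticAPIDown
          expr: up{job="kubermatic-api"} == 0
          for: 5m
          labels:
            severity: critical
            # 'service' label, mirroring the convention used by existing KKP rules
            service: kubermatic
          annotations:
            summary: "kubermatic-api target has been down for 5 minutes."
```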
Custom scrape targets can be achieved by writing ServiceMonitors.
e.g. a ServiceMonitor to scrape the kubermatic-api server:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    release: kube-prometheus-stack
  name: kubermatic-apiserver
  namespace: monitoring
spec:
  selector:
    matchLabels:
      # This label MUST match the label ON the Service itself,
      # not the matchLabels in the Service's selector.
      app.kubernetes.io/name: kubermatic-api
  namespaceSelector:
    matchNames:
      - kubermatic
  endpoints:
    # Unfortunately, a named port does not work here, mostly due to
    # the special config of the kubermatic-api server.
    - targetPort: 8085
```
Remember to have labels on the services. E.g. the kubermatic-api service does not have a specific label on it (apart from managed-by kubermatic-operator), so the ServiceMonitor will not match the service.
Another thing: we probably need to use PodMonitors / relabeling configs to bring in some extra labels.
In KKP, the promQL query `http_request_duration_seconds_count{app_kubernetes_io_name="kubermatic-api"}` returns one set of series, while in kps the equivalent query is `http_request_duration_seconds_count{job="kubermatic-api"}` (series listings omitted here). As you can see, the labels are different, so we need some relabeling configs in the ServiceMonitor endpoints block to match KKP, if needed.
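A sketch of such a relabeling on the ServiceMonitor endpoint. The source label name is an assumption; the exact mapping depends on which KKP labels we decide to carry forward:

```yaml
endpoints:
  - targetPort: 8085
    relabelings:
      # Recreate the KKP-style label from the Kubernetes service label
      # discovered by prometheus-operator (source label name assumed).
      - sourceLabels: [__meta_kubernetes_service_label_app_kubernetes_io_name]
        targetLabel: app_kubernetes_io_name
```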
The kps Thanos story is similar to KKP's: it allows you to deploy the Thanos sidecar but does not deploy actual Thanos components. We should use the Banzaicloud Thanos chart. Some referral help, and also see this post on the configuration of Thanos (already covered in the blog above as well), plus another blog. We should also make use of the Thanos community component (still to be contributed).
```shell
# create helm output based on heavily configured alertmanager
helm template alertmanager -f ./dev/vj1-master/values.yaml /kubermatic/releases/v2.23.6/charts/monitoring/alertmanager > alertmanager-kkp-render.yaml
# create helm output based on kps with current values.yaml
helm template -n monitoring1 --create-namespace kube-prometheus-stack prometheus-community/kube-prometheus-stack -f values-kube-prometheus-stack.yaml -f values-kube-prometheus-stack-slack-config.yaml > kps-render.yaml
```
Compare both outputs to find the differences in Alertmanager, then reconcile them. Repeat for prometheus and grafana.
It is important that we remember to add proper resource and service labels to alerts in the target system. This is crucial for handling the alerts via Alerta, since Alerta needs the resource label to group the alerts.
To bring in changes similar to PR https://github.com/kubermatic/kubermatic/pull/7775, we will need to modify existing alerts from kube-prometheus-stack. But it is not straightforward to override existing kube-prometheus-stack alerts.
Instead, we can relabel the alerts to achieve the desired effect, so that kube-prometheus-stack alerts play nicely with Alerta (the proper resource gets displayed in Alerta's resource column instead of the kube-state-metrics pod's IP).
e.g.
```yaml
prometheus:
  prometheusSpec:
    # Massage (relabel) some of the alerts to provide meaningful information
    additionalAlertRelabelConfigs:
      - source_labels: [alertname, namespace, horizontalpodautoscaler]
        regex: "KubernetesHpaMetricAvailability;(.*);(.*)"
        target_label: instance
        replacement: "$1/$2"
      - source_labels: [alertname, namespace, pod, container]
        regex: "KubernetesContainerOomKiller;(.*);(.*);(.*)"
        target_label: instance
        replacement: "$1/$2/$3"
```
The Alerta heartbeat plugin can directly use the Watchdog alert from kube-prometheus-stack via a customization env var:
```yaml
HEARTBEAT_EVENTS: "['Heartbeat', 'Watchdog']"
```
Code is now available in https://github.com/dharapvj/tmp-kube-prometheus-stack (currently private)
target | Comment |
---|---|
cert-manager | a. Probably, the ServiceMonitor should be created via the cert-manager helm chart. b. We see variations between the labels on the service metrics (via kps) and the pod metrics (existing KKP). But the extra labels in existing KKP are not really used anywhere: there is no cert-manager dashboard, and the cert-manager alert rules do not filter on those labels. So we can just keep the smaller set of labels generated by kps |
Ideally, we should let the upstream helm charts take care of creating ServiceMonitor objects and thereby free us from maintaining those objects ourselves.
e.g. the cert-manager, minio etc. helm charts all allow us to control the creation of ServiceMonitor objects via some values.yaml configuration like prometheus.serviceMonitor.enabled. This works perfectly fine as long as the kps CRDs are installed in the cluster before these helm charts are installed. But in KKP, for example, we install cert-manager before installing the kps stack. So unless we somehow get the kps CRDs installed BEFORE running kubermatic-installer, the ServiceMonitor objects will fail to get created.
Since this precondition of CRD installation is not guaranteed, I think KKP will need to take responsibility for managing the ServiceMonitor yaml files and any changes that might be needed in the future.
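For illustration, enabling the upstream-managed ServiceMonitor in the cert-manager chart looks roughly like this. The exact key casing varies per chart, and the label value is an assumption matching the kps release name:

```yaml
# cert-manager values.yaml (sketch)
prometheus:
  enabled: true
  servicemonitor:
    enabled: true
    labels:
      # Label so the kps Prometheus picks this ServiceMonitor up
      # (value assumed to match the kps release name).
      release: kube-prometheus-stack
```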
Both these approaches above have some issues. So we should discuss on which approach is of lesser issues.
We should see how we can get the kps CRDs available before kubermatic-installer installs the nginx-ingress-controller or cert-manager helm charts, so that we can use the upstream ServiceMonitors.
Result of discussion with Wojciech: the relabeling logic from existing KKP prometheus targets should not be copied without qualifying its need. We should carry forward only the needed relabeling rules.
For seed monitoring