Open dharapvj opened 12 months ago
I think the most crucial difference between our charts and kube-prometheus-stack is that the prometheus-operator uses ServiceMonitor objects and does NOT make use of service discovery annotations. This makes the migration harder. So unless we somehow explicitly deprecate the existing approach, it is going to be really hard to sell prometheus-operator (and thereby the kube-prometheus-stack helm chart).
I feel that the most sensible alternative would be to migrate over to the regular charts from the prometheus-community helm repo - the ones that do not use prometheus-operator - and reduce our custom chart maintenance.
But what we would miss with that approach is the newer and better-looking dashboards.
By default, kube-prometheus-stack has a lot of issues scraping kube-proxy, kube-controller-manager, kube-scheduler and etcd.
This thread on the kubeone repo has some solutions, and there is another related comment.
We should try these solutions and also check whether KKP currently scrapes these targets or not.
Update: A slightly modified version of this HAProxy solution (just updating the ports) works well to scrape the kube-scheduler, kube-controller-manager, kube-proxy AND etcd pods.
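As a complement to the HAProxy workaround, kube-prometheus-stack also exposes values to tweak how these control-plane components are scraped. A minimal sketch - the port numbers and HTTPS settings are assumptions that must be checked against the actual cluster setup:

```yaml
# values-kube-prometheus-stack.yaml (sketch, not verified against every k8s version)
kubeControllerManager:
  service:
    # Recent k8s versions serve controller-manager metrics on 10257 via HTTPS.
    port: 10257
    targetPort: 10257
  serviceMonitor:
    https: true
    insecureSkipVerify: true
kubeScheduler:
  service:
    port: 10259
    targetPort: 10259
  serviceMonitor:
    https: true
    insecureSkipVerify: true
kubeEtcd:
  service:
    # etcd metrics port; still requires etcd to listen on a non-localhost
    # address (or a proxy such as the HAProxy workaround above).
    port: 2381
    targetPort: 2381
```

Even with these values set, the components usually bind their metrics endpoints to 127.0.0.1, which is why the HAProxy-based proxying is still needed in practice.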
Note: kps = kube-prometheus-stack; db = dashboard;
Feature | KKP Implementation | kps implementation | Can it be added in kps? |
---|---|---|---|
Loki Integration | Integrated OOTB | Not integrated OOTB | Y |
Go Dashboards | There are two dashboard for Go Applications | None | Y |
KKP Dashboards | There are six dashboard for KKP specific views | None | Y |
k8s Cluster Dashboards | 1 | 2. They are almost similar and contain more info | NA. kps is better |
k8s Namespace Dashboards | 1 | 2 pod db is almost similar and contains more info. Workload db gives same info aggregated at STS / Deploy level. | NA. kps is better |
k8s Pod Dashboards | 2 almost overlapping db | 1. The pod db is almost similar to the KKP db and contains more info. | NA. kps is better |
k8s kubelet Dashboards | 2 | 1 | Y. There is relevant information in both KKP and kps dashboards. We should consider merging them. |
k8s Networking Dashboards | 0 | 5. Networking-related information at differing granularity - cluster, ns, pod, etc. | NA. kps is better |
k8s Node Dashboards | 2 Slightly overlapping Node and Node Resource Usage db | 3. The USE Method node db is identical to KKP's. The rest are different visualizations | Y. We should consider merging the KKP and kps db |
k8s Resource usage Dashboards | 1 Shows no data? | 0 | Y |
k8s STS Dashboards | 1 | 0 | Maybe. I guess, the NS workload db for kps already covers this |
k8s Volume Dashboards | 1 | 1 | Y find KKP dashboard better. It allows multi select and global aggregation (though not sure how useful that is) |
k8s etcd Dashboards | 3 | 1 | Maybe. The KKP main dashboard does not work, presumably due to etcd restrictions, which I have worked around via the HAProxy solution in kps. There is one extra dashboard which shows the count of etcd objects. Not sure how useful that is. |
Minio Dashboards | 1 | 0 | We should pick the latest Minio dashboard from Grafana. It's way more feature-packed. |
Prometheus Dashboards | 2 | 1 | Y. I think we should merge the KKP dashboard into the kps dashboard. It has good, relevant things to look at. Also, the blackbox-exporter dashboard from KKP should be brought into kps |
MLA Dashboards | 1 | 2 | Not sure what this dashboard shows in KKP. Currently, it shows nothing for me. kps has Grafana and Alertmanager dashboards which should be retained. But the Grafana dashboard does not show any data currently (misconfigured?) |
nginx ingress Dashboards | 1 | 0 | We should pick the latest nginx ingress dashboard from Grafana. or similar |
Note: kps = kube-prometheus-stack; SM=ServiceMonitor
The goal is to have every single target already scraped by KKP also scraped by kps.
Target | KKP | kps | If not in kps, mitigation plan | Status |
---|---|---|---|---|
k8s apiserver | Y | Y | NA | |
cadvisor | Y | Y | NA | In KKP we have job=cadvisor in prometheus/config/scraping folder. In KPS, we have job=kubelet instead. It is auto-configured. |
kubelet | Y | Y | NA | |
kubelet probes | N | Y | NA | |
kube-state-metrics | Y | Y | NA | |
kube-controller-manager | N | Y | NA | |
kube-etcd | N | Y | NA | |
kube-proxy | N | Y | NA | |
kube-scheduler | N | Y | NA | |
kube-prometheus-stack | N | Y | NA | |
node-exporter | Y | Y | NA | |
prometheus | Y | Y | NA | |
pods - CoreDNS | N | Y | NA | |
pods - osm | Y | N | Write new SM | |
pods - node-local-dns | Y | N | Write new SM / Enable the SM in helm chart | |
pods - minio | Y | N | Write new SM / Enable the SM in helm chart | |
pods - dex | Y | N | Write new SM / Enable the SM in helm chart | |
pods - promtail | Y | N | Write new SM / Enable the SM in helm chart | |
pods - kkp seed-ctrl-mgr | Y | N | Write new SM | New PM |
pods - nginx-ingress | Y | N | Write new SM / Enable the SM in helm chart | New SM |
pods - nodeport-proxy-envoy | Y | N | Write new SM | New SM |
pods - cluster-autoscaler | Y | N | Write new SM / Enable the SM in helm chart | New PM |
pods - velero | Y | N | Write new SM / Enable the SM in helm chart | |
pods - Loki | Y | N | Write new SM | |
pods - kubermatic-api | Y | N | Write new SM | |
pods - blackbox-exporter | Y | N | Potentially, I have not turned this on. | |
pods - machine-controller | Y | N | Write new SM | |
pods - kubermatic-webhook | Y | N | Write new SM | |
pods - kubermatic-operator | Y | N | Write new SM | |
pods - kubermatic-master-ctrl-mgr | Y | N | Write new SM | |
pods - cert-manager | Y | N | Write new SM / Enable the SM in helm chart | New SM |
pods - cluster-XXX namespace | Y | N | Write new SM | New SM (federates the cluster-xxx prometheus). In the future, we should move to prometheus agent with remote_write and remove the need for this ServiceMonitor |
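The remote_write direction mentioned in the last row could be sketched in kps values roughly like this. The receiver endpoint URL and the `cluster` label are hypothetical, and switching the user-cluster prometheus to agent mode additionally depends on the prometheus-operator version in use:

```yaml
# Sketch: user-cluster prometheus forwarding samples to the seed prometheus
# instead of being federated via a ServiceMonitor.
prometheus:
  prometheusSpec:
    remoteWrite:
      # Hypothetical seed-side receiver endpoint.
      - url: "https://seed-prometheus.example.com/api/v1/write"
        writeRelabelConfigs:
          # Attach the user-cluster id to every forwarded series
          # (label name is an assumption).
          - targetLabel: cluster
            replacement: cluster-xxx
```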
Note: kps = kube-prometheus-stack;
The goal is to have every single alert rule present in KKP also available in the kps-based setup. I think we will need to use jsonnet to add new rules.
General observation: all the rules in kubermatic have a label called service, which identifies the source of the alert rule. I think it is a good addition. We should have the same in kps rules as well.
Rule category | KKP | kps |
---|---|---|
Blackbox-exporter | Y | N. Need to implement this additionally. |
cert-manager | Y | N |
helm-exporter | Y | N |
kube-apiserver | Y | Y. kube-apiserver-availability, burn-rate, histogram, slo. We should reconcile. |
kubelet | Y | Y. We should reconcile. Also kubernetes-storage rules. |
kube-state-metric | Y | Y. We should reconcile. Also, kubernetes-apps, kubernetes-resources rules. |
node-exporter | Y | N |
prometheus | Y | N |
velero | Y | N |
VPA | Y | N |
kubermatic | Y | N |
kubermatic-seed | Y | N |
kube-controller-manager | Y | N |
kube-scheduler | Y | N |
Alertmanager | N | Y. A few alerts are covered in KKP under the prometheus set |
config-reloaders | N | Y |
etcd | N? | Y |
general | N? | Y |
TO BE CONTINUED
Custom dashboards can be added in any namespace via a ConfigMap with the label grafana_dashboard: "1". Reference
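A minimal sketch of such a ConfigMap - the name and the dashboard JSON are placeholders:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-custom-dashboard   # hypothetical name
  namespace: monitoring       # any namespace works if the Grafana sidecar watches all of them
  labels:
    grafana_dashboard: "1"    # this label is what the Grafana sidecar looks for
data:
  my-dashboard.json: |
    {
      "title": "My Custom Dashboard",
      "panels": []
    }
```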
Similarly, custom alert rules can be added by creating a PrometheusRule CR. It is also possible to add them in values.yaml itself.
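A sketch of such a PrometheusRule. The alert name, expression and the service label value are illustrative, and the release label must match whatever ruleSelector the kps Prometheus is configured with:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kubermatic-custom-rules    # hypothetical name
  namespace: monitoring
  labels:
    release: kube-prometheus-stack # must match the Prometheus ruleSelector
spec:
  groups:
    - name: kubermatic.rules
      rules:
        - alert: KubermaticAPIDown
          expr: up{job="kubermatic-api"} == 0
          for: 5m
          labels:
            severity: critical
            # 'service' label, mirroring the convention used by existing KKP rules
            service: kubermatic
          annotations:
            summary: "kubermatic-api target has been down for 5 minutes."
```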
Custom scrape targets can be achieved by writing ServiceMonitors.
e.g. a ServiceMonitor to scrape the kubermatic-api server:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    release: kube-prometheus-stack
  name: kubermatic-apiserver
  namespace: monitoring
spec:
  selector:
    matchLabels:
      # This label MUST match the label ON the Service itself,
      # not the matchLabels in the Service's selector.
      app.kubernetes.io/name: kubermatic-api
  namespaceSelector:
    matchNames:
      - kubermatic
  endpoints:
    # Unfortunately, a named port does not work here, mostly due to
    # the special config of the kubermatic-api server.
    - targetPort: 8085
```
Remember to have labels on the services. E.g. the kubermatic-api service does not have a specific label on it (apart from managed-by kubermatic-operator), so the ServiceMonitor will not match the service.
Another thing: we probably need to use PodMonitors / relabeling configs to bring in some extra labels.
In KKP, the promQL query `http_request_duration_seconds_count{app_kubernetes_io_name="kubermatic-api"}` returns one set of series, while in kps the equivalent query is `http_request_duration_seconds_count{job="kubermatic-api"}` (series listings omitted here). As you can see, the labels are different, so we need some relabeling configs in the ServiceMonitor endpoints block to match KKP, if needed.
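A sketch of such a relabeling on the ServiceMonitor endpoint. The source label name is an assumption; the exact mapping depends on which KKP labels we decide to carry forward:

```yaml
endpoints:
  - targetPort: 8085
    relabelings:
      # Recreate the KKP-style label from the Kubernetes service label
      # discovered by prometheus-operator (source label name assumed).
      - sourceLabels: [__meta_kubernetes_service_label_app_kubernetes_io_name]
        targetLabel: app_kubernetes_io_name
```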
The kps Thanos story is similar to KKP's: it allows you to deploy the Thanos sidecar but does not deploy actual Thanos components. We should use the Banzaicloud Thanos chart. Some referral help, and also see this post on the configuration of Thanos (already covered in the blog above as well), plus another blog. We should also make use of the Thanos community component (still to be contributed).
```shell
# create helm output based on heavily configured alertmanager
helm template alertmanager -f ./dev/vj1-master/values.yaml /kubermatic/releases/v2.23.6/charts/monitoring/alertmanager > alertmanager-kkp-render.yaml
# create helm output based on kps with current values.yaml
helm template -n monitoring1 --create-namespace kube-prometheus-stack prometheus-community/kube-prometheus-stack -f values-kube-prometheus-stack.yaml -f values-kube-prometheus-stack-slack-config.yaml > kps-render.yaml
```
Compare both outputs to find the differences in Alertmanager, then reconcile them. Repeat for prometheus and grafana.
It is important that we remember to add proper resource and service labels to alerts in the target system. This is crucial for handling the alerts via Alerta, since Alerta needs the resource label to group the alerts.
To bring in changes similar to PR https://github.com/kubermatic/kubermatic/pull/7775, we will need to modify existing alerts from kube-prometheus-stack. But it is not straightforward to override existing kube-prometheus-stack alerts.
Instead, we can relabel the alerts to achieve the desired effect, so that kube-prometheus-stack alerts play nicely with Alerta (the proper resource gets displayed in Alerta's resource column instead of the kube-state-metrics pod's IP).
e.g.
```yaml
prometheus:
  prometheusSpec:
    # Massage (relabel) some of the alerts to provide meaningful information
    additionalAlertRelabelConfigs:
      - source_labels: [alertname, namespace, horizontalpodautoscaler]
        regex: "KubernetesHpaMetricAvailability;(.*);(.*)"
        target_label: instance
        replacement: "$1/$2"
      - source_labels: [alertname, namespace, pod, container]
        regex: "KubernetesContainerOomKiller;(.*);(.*);(.*)"
        target_label: instance
        replacement: "$1/$2/$3"
```
The Alerta heartbeat plugin can directly use the Watchdog alert from kube-prometheus-stack via a customization env var:
```yaml
HEARTBEAT_EVENTS: "['Heartbeat', 'Watchdog']"
```
Code is now available in https://github.com/dharapvj/tmp-kube-prometheus-stack (currently private)
target | Comment |
---|---|
cert-manager | a. Probably, the ServiceMonitor should be created via the cert-manager helm chart. b. We see variations between the labels on the service metrics (via kps) and the pod metrics (existing KKP). But the extra labels in existing KKP are not really used anywhere: there is no cert-manager dashboard, and the cert-manager alert rules do not filter on those labels. So we can just keep the smaller set of labels generated by kps |
Ideally, we should let the upstream helm charts take care of creating ServiceMonitor objects and thereby free us from maintaining those objects ourselves.
e.g. the cert-manager, minio etc. helm charts all allow us to control the creation of ServiceMonitor objects via some values.yaml configuration like prometheus.serviceMonitor.enabled. This works perfectly fine as long as the kps CRDs are installed in the cluster before these helm charts are installed. But in KKP, for example, we install cert-manager before installing the kps stack. So unless we somehow get the kps CRDs installed BEFORE running kubermatic-installer, the ServiceMonitor objects will fail to get created.
Since this precondition of CRD installation is not guaranteed, I think KKP will need to take responsibility for managing the ServiceMonitor yaml files and any changes that might be needed in the future.
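For illustration, enabling the upstream-managed ServiceMonitor in the cert-manager chart looks roughly like this. The exact key casing varies per chart, and the label value is an assumption matching the kps release name:

```yaml
# cert-manager values.yaml (sketch)
prometheus:
  enabled: true
  servicemonitor:
    enabled: true
    labels:
      # Label so the kps Prometheus picks this ServiceMonitor up
      # (value assumed to match the kps release name).
      release: kube-prometheus-stack
```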
Both these approaches above have some issues. So we should discuss on which approach is of lesser issues.
We should see how we can get the kps CRDs available before kubermatic-installer installs the nginx-ingress-controller or cert-manager helm charts, so that we can use the upstream ServiceMonitors.
Result of discussion with Wojciech: the relabeling logic from existing KKP prometheus targets should not be copied without qualifying its need. We should carry forward only the needed relabeling rules.
For seed monitoring