appuio / appuio-cloud-reporting

Reporting for APPUiO Cloud
BSD 3-Clause "New" or "Revised" License

`appuio_cloud_memory` query fails on unrelated node label changes #104

Closed (bastjan closed this issue 1 year ago)

bastjan commented 1 year ago

Description

When a node's labels change, there is a short window in which two kube_node_labels series exist for the same node, because it takes Prometheus 5 minutes to mark the old series as no longer active. During that overlap the queries fail with many-to-many matching errors. We tried to fix this for query-related labels in #99, but that fix does not cover changes to unrelated labels.

[Screenshot 2022-12-02 at 16:32:37]
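The failure mode can be reproduced without Prometheus. The following is a minimal Python sketch, not the project's code: `join_on` is a hypothetical, simplified model of a PromQL many-to-one match on `node` (the real appuio_cloud_memory query is not shown in this issue), and the memory sample and its value are made up. One kube_node_labels series per node joins fine, while two series for the same node that differ only in an unrelated label trigger the duplicate-match-group error.

```python
class ManyToManyError(Exception):
    """Raised when a match group is not unique on the 'one' side."""


def match_group(labels, on):
    # The match group is the tuple of values of the `on(...)` labels.
    return tuple((name, labels.get(name, "")) for name in on)


def join_on(left, right, on):
    """Hypothetical model of `left * on(<on>) group_left right`.

    `left` and `right` are lists of (labels, value) pairs. As in PromQL,
    every match group may appear at most once on the right-hand side.
    """
    index = {}
    for labels, value in right:
        key = match_group(labels, on)
        if key in index:
            raise ManyToManyError(
                f"found duplicate series for the match group {dict(key)} "
                "on the right hand-side of the operation"
            )
        index[key] = value
    # Keep the left-hand labels and multiply by the matching right-hand value.
    return [
        (labels, value * index[match_group(labels, on)])
        for labels, value in left
        if match_group(labels, on) in index
    ]


# Two kube_node_labels series for node flex-7b9a that differ only in an
# unrelated label. During the staleness window both are still returned.
series_a = {"__name__": "kube_node_labels", "node": "flex-7b9a"}
series_b = {**series_a, "label_csi_cloudscale_ch_zone": "lpg1"}

# Hypothetical per-namespace memory sample on that node.
memory = [({"namespace": "example", "node": "flex-7b9a"}, 512.0)]

print(join_on(memory, [(series_b, 1.0)], on=("node",)))
try:
    join_on(memory, [(series_a, 1.0), (series_b, 1.0)], on=("node",))
except ManyToManyError as err:
    print("error:", err)
```

The match group here is `{node="flex-7b9a"}`, the same group reported in the error below, so any label change on the node, relevant to the query or not, is enough to break the join until the old series goes stale.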

Additional Context

No response

Logs

❯ oc -n appuio-cloud-reporting  --as=cluster-admin debug job/backfill-appuio-cloud-memory-27831795 -- appuio-cloud-reporting report --query-name=appuio_cloud_memory --begin 2022-12-01T12:00:00Z
Starting pod/backfill-appuio-cloud-memory-27831795-debug, command was: sh -c appuio-cloud-reporting report --begin=$(date -d "now -3 hours" -u +"%Y-%m-%dT%H:00:00Z") --repeat-until=$(date -u -Iseconds) --query-name=appuio_cloud_memory
2022-12-02T15:31:05.227Z | INFO | appuio-cloud-reporting | appuio-cloud-reporting/logger.go:40 | Starting up appuio-cloud-reporting | {"version": "0.7.0", "date": "2022-12-02", "commit": "f312391a75667650860045401cfb59c7fa1a6585", "go_os": "linux", "go_arch": "amd64", "go_version": "go1.19.1", "uid": 65536, "gid": 0}
2022-12-02T15:31:05.238Z | INFO | appuio-cloud-reporting | appuio-cloud-reporting/report_command.go:122 | Running report...
2022-12-02T15:31:06.849Z | ERROR | appuio-cloud-reporting | v2@v2.19.2/app.go:618 | fatal error | {"error": "failed to run query 'appuio_cloud_memory' at '2022-12-01T12:00:00Z': failed to query prometheus: execution: found duplicate series for the match group {node=\"flex-7b9a\"} on the right hand-side of the operation: [{__name__=\"kube_node_labels\", cluster_id=\"c-appuio-cloudscale-lpg-2\", container=\"kube-rbac-proxy-main\", endpoint=\"https-main\", job=\"kube-state-metrics\", label_beta_kubernetes_io_arch=\"amd64\", label_beta_kubernetes_io_os=\"linux\", label_csi_cloudscale_ch_zone=\"lpg1\", label_kubernetes_io_arch=\"amd64\", label_kubernetes_io_hostname=\"flex-7b9a\", label_kubernetes_io_os=\"linux\", label_node_openshift_io_os_id=\"rhcos\", namespace=\"openshift-monitoring\", node=\"flex-7b9a\", prometheus=\"openshift-monitoring/k8s\", receive=\"true\", service=\"kube-state-metrics\", tenant_id=\"c-appuio-cloudscale-lpg-2\"}, {__name__=\"kube_node_labels\", cluster_id=\"c-appuio-cloudscale-lpg-2\", container=\"kube-rbac-proxy-main\", endpoint=\"https-main\", job=\"kube-state-metrics\", label_beta_kubernetes_io_arch=\"amd64\", label_beta_kubernetes_io_os=\"linux\", label_kubernetes_io_arch=\"amd64\", label_kubernetes_io_hostname=\"flex-7b9a\", label_kubernetes_io_os=\"linux\", label_node_openshift_io_os_id=\"rhcos\", namespace=\"openshift-monitoring\", node=\"flex-7b9a\", prometheus=\"openshift-monitoring/k8s\", receive=\"true\", service=\"kube-state-metrics\", tenant_id=\"c-appuio-cloudscale-lpg-2\"}];many-to-many matching not allowed: matching labels must be unique on one side"}
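The two series in the error are identical except for a single label. A quick way to see this is to diff their label sets; the following Python sketch does that with a hypothetical `diff_labels` helper (not part of appuio-cloud-reporting), using the labels copied from the error above.

```python
def diff_labels(a, b):
    """Return {label: (value_in_a, value_in_b)} for every label that differs.

    A label missing from one side is reported as None.
    """
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# Labels shared by both kube_node_labels series in the error message.
common = {
    "__name__": "kube_node_labels",
    "cluster_id": "c-appuio-cloudscale-lpg-2",
    "container": "kube-rbac-proxy-main",
    "endpoint": "https-main",
    "job": "kube-state-metrics",
    "label_beta_kubernetes_io_arch": "amd64",
    "label_beta_kubernetes_io_os": "linux",
    "label_kubernetes_io_arch": "amd64",
    "label_kubernetes_io_hostname": "flex-7b9a",
    "label_kubernetes_io_os": "linux",
    "label_node_openshift_io_os_id": "rhcos",
    "namespace": "openshift-monitoring",
    "node": "flex-7b9a",
    "prometheus": "openshift-monitoring/k8s",
    "receive": "true",
    "service": "kube-state-metrics",
    "tenant_id": "c-appuio-cloudscale-lpg-2",
}
first = {**common, "label_csi_cloudscale_ch_zone": "lpg1"}
second = dict(common)

print(diff_labels(first, second))
```

Only `label_csi_cloudscale_ch_zone` differs: it is present on one series and absent from the other. Since the match group is just `{node="flex-7b9a"}`, this unrelated label is enough to put two series into the same group.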

Expected Behavior

The appuio_cloud_memory query runs successfully.

Steps To Reproduce

No response

Versions

Unrelated to the version.

bastjan commented 1 year ago

I tried using `min by`, which works on its own but times out in the full query: https://github.com/appuio/appuio-cloud-reporting/pull/105.
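`min by` helps because aggregating drops every label not listed in `by`, so series that differ only in unrelated labels fall into the same group and collapse into one. Below is a minimal Python sketch of that effect. `min_by` is a hypothetical, simplified model of PromQL's `min by (...)`, and grouping by `node` alone is an assumption for illustration (the labels kept in #105 are not listed here). The sketch does not model the timeout in the full query.

```python
def min_by(series, by):
    """Hypothetical model of `min by (<by>) (<vector>)`.

    `series` is a list of (labels, value) pairs. Only the `by` labels are
    kept; series that end up with the same label set are merged by
    taking the minimum value.
    """
    groups = {}
    for labels, value in series:
        key = tuple((name, labels.get(name, "")) for name in by)
        groups[key] = min(value, groups.get(key, value))
    # Prometheus drops labels with empty values from the result.
    return [({k: v for k, v in key if v}, value) for key, value in groups.items()]


# Two kube_node_labels series for node flex-7b9a that differ only in an
# unrelated label (label_csi_cloudscale_ch_zone), as in the error log.
series_a = {"__name__": "kube_node_labels", "node": "flex-7b9a"}
series_b = {**series_a, "label_csi_cloudscale_ch_zone": "lpg1"}

deduplicated = min_by([(series_a, 1.0), (series_b, 1.0)], by=("node",))
print(deduplicated)  # one series per node, so matching on(node) is unique again
```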

bastjan commented 1 year ago

While we're at it, we can also fix the kube_persistentvolume_info duplicates that appear after platform upgrades: [Screenshot 2022-12-06 at 15:06:03]

bastjan commented 1 year ago

Memory query fixed in #105.