dotdc / grafana-dashboards-kubernetes

A set of modern Grafana dashboards for Kubernetes.
Apache License 2.0

[bug] Suggest lower cardinality variables for the pod dashboard #106

Closed brokenjacobs closed 6 months ago

brokenjacobs commented 6 months ago

Describe the bug

In a cluster with a lot of pod churn, the high-cardinality pod metrics cause queries to fail because of the large number of series returned. For instance, I doubled the max returned label sets in VictoriaMetrics to 60k, and queries still fail when I try to use the pod dashboard:

2024-04-22T18:17:33.527Z    warn    VictoriaMetrics/app/vmselect/main.go:231    error in "/api/v1/series?start=1713806220&end=1713809880&match%5B%5D=%7B__name__%3D%22kube_pod_info%22%7D": cannot fetch time series for "filters=[{__name__=\"kube_pod_info\"}], timeRange=[2024-04-22T17:17:00Z..2024-04-22T18:18:00Z]": cannot find metric names: error when searching for metricIDs in the current indexdb: the number of matching timeseries exceeds 60000; either narrow down the search or increase -search.max* command-line flag values at vmselect; see https://docs.victoriametrics.com/#resource-usage-limits
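
For readability, the URL-encoded request in the warning above can be decoded with the Python standard library. This is only an illustrative sketch. The request path is copied from the log line, and the variable names are made up for the example.

```python
from urllib.parse import urlsplit, parse_qs

# Request path copied verbatim from the vmselect warning above.
logged = (
    "/api/v1/series?start=1713806220&end=1713809880"
    "&match%5B%5D=%7B__name__%3D%22kube_pod_info%22%7D"
)

# parse_qs percent-decodes both parameter names and values.
params = parse_qs(urlsplit(logged).query)
selector = params["match[]"][0]
window_seconds = int(params["end"][0]) - int(params["start"][0])

print(selector)        # {__name__="kube_pod_info"}
print(window_seconds)  # 3660
```

The decoded selector carries no label filters, so the request asks for every `kube_pod_info` series that existed in the 61-minute window, which is what pushes a high-churn cluster past the limit.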

How to reproduce?

Have a cluster with a lot of pods being created...

Expected behavior

No response

Additional context

I have a suggested fix that seems to work fine for me. It changes the namespace and job variable queries so that they no longer query "all pods" for labels, like this:

namespace: label_values(kube_namespace_created{cluster="$cluster"},namespace)
job: label_values(kube_pod_info{namespace="$namespace", cluster="$cluster"},job)
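
One way to check whether a narrowed selector stays under the series limit is to send it to the same `/api/v1/series` endpoint that appears in the warning. The sketch below only builds the request URLs and sends nothing. The base URL, the `monitoring` namespace, and the `prod` cluster are hypothetical placeholders, and the exact request Grafana issues for a `label_values()` variable may differ between versions.

```python
from urllib.parse import urlencode


def series_url(base_url: str, selector: str, start: int, end: int) -> str:
    """Build a Prometheus-compatible /api/v1/series URL for one selector."""
    query = urlencode({"match[]": selector, "start": start, "end": end})
    return f"{base_url.rstrip('/')}/api/v1/series?{query}"


# Hypothetical endpoint; replace with the real query URL of the datasource.
BASE_URL = "http://vm.example:8428"

# Time range taken from the warning in the bug report.
START, END = 1713806220, 1713809880

# Unfiltered selector, equivalent to the one in the failing request.
broad_url = series_url(BASE_URL, "kube_pod_info", START, END)

# Selector narrowed the way the suggested job variable query does it,
# with hypothetical values substituted for $namespace and $cluster.
narrowed_url = series_url(
    BASE_URL,
    'kube_pod_info{namespace="monitoring", cluster="prod"}',
    START,
    END,
)

print(broad_url)
print(narrowed_url)
```

Fetching both URLs (for example with `curl`) and comparing the number of returned series gives a rough idea of how much the extra matchers reduce the result set.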

dotdc commented 6 months ago

Thank you for the bug report, @brokenjacobs. I will have a look at it by the end of the week!

dotdc commented 6 months ago

:tada: This issue has been resolved in version 1.1.0 :tada:

The release is available on GitHub release

Your semantic-release bot :package::rocket:

dotdc commented 6 months ago

@brokenjacobs This should be fixed by https://github.com/dotdc/grafana-dashboards-kubernetes/commit/75dd5a15c11c5d3608c538af2888881ea6ca51d5. Let me know if it's not the case.