[bug] Some dashboards are broken on high pods count

dotdc / grafana-dashboards-kubernetes

A set of modern Grafana dashboards for Kubernetes.

Apache License 2.0

[bug] Some dashboards are broken on high pods count #129

Closed maxpain closed 2 days ago

maxpain commented 2 weeks ago

Describe the bug

Hello. We have a lot of short-lived pods in our clusters. It's also a problem for frequent CronJobs.

https://github.com/user-attachments/assets/7b9e0606-a46f-45e9-af56-f9721c0f19ea

How to reproduce?

No response

Expected behavior

No response

Additional context

No response

EladAviczer commented 1 week ago

Have you tried increasing the CPU resources allocated to Prometheus?

maxpain commented 1 week ago

@EladAviczer did you watch the video?

EladAviczer commented 1 week ago

*victoriaMetrics

maxpain commented 1 week ago

> *victoriaMetrics

The problem is not in Prometheus/VictoriaMetrics, but in the grafana dashboard itself.

EladAviczer commented 1 week ago

The dashboard uses VictoriaMetrics to query the data, and you get a 422 Unprocessable Content error when calling the PromQL/MetricsQL query.

I'm not saying I'm 100% sure that it is a VictoriaMetrics problem, but it could be, and you should check it too. :)

maxpain commented 1 week ago

> The dashboard uses VictoriaMetrics to query the data, and you get a 422 Unprocessable Content error when calling the PromQL/MetricsQL query.

The problem is that there are a lot of pods (because a CronJob runs every minute), and this dashboard tries to pass the full array of pod names (1440 pods for the last 24 hours), which will fail on any installation (Prometheus or VictoriaMetrics).
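To make the scale concrete, here is a rough sketch of what happens when every pod is selected in a multi-value variable. It assumes the selection is expanded into one regex alternation; the pod names, the helper functions, and the metric in the query are made up for illustration and are not taken from the dashboards.

```python
# Sketch: size of a pod selector when a CronJob creates one pod per minute.
# Pod names and the query below are hypothetical, for illustration only.

MINUTES_PER_DAY = 24 * 60  # one run per minute -> 1440 pods in 24 hours


def pod_names(count: int) -> list[str]:
    """Generate hypothetical CronJob pod names."""
    return [f"my-cronjob-{i:05d}-abcde" for i in range(count)]


def expand_selector(pods: list[str]) -> str:
    """Join every selected pod into a single regex alternation."""
    return 'pod=~"(' + "|".join(pods) + ')"'


pods = pod_names(MINUTES_PER_DAY)
selector = expand_selector(pods)
# Doubled braces produce literal { } around the selector in the f-string.
query = f"sum(rate(container_cpu_usage_seconds_total{{{selector}}}[5m]))"

print(len(pods))   # number of pod names packed into one selector
print(len(query))  # total query length in characters
```

Even with short names, the query grows to tens of thousands of characters, and the dashboard sends a selector like this for every panel that filters on the variable.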

dotdc commented 6 days ago

Hi @maxpain,

The created_by variable was introduced to enable filtering on deployments, but if there are too many pods, you'll end up with a 422 Unprocessable Entity error, as you just experienced.

I’ll check if there’s a better solution, but removing the created_by variable might work better in your case.

hagen1778 commented 4 days ago

> I’ll check if there’s a better solution, but removing the created_by variable might work better in your case.

Could setting the "Custom all value" of this variable to a wildcard help?

Then, when All is selected, the dashboard won't try to send a request with thousands of options; it would just send pod=~".*" instead. This would help with any type of datasource.
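A minimal sketch of the difference, assuming the "All" selection is rendered either as the full list of pod names or as the custom all value when one is set. The function name, its parameters, and the pod names are hypothetical and only model the behaviour described above, not Grafana's actual implementation.

```python
from typing import Optional


def render_pod_selector(
    all_selected: bool,
    pods: list[str],
    custom_all_value: Optional[str] = None,
) -> str:
    """Render the pod matcher that would end up in the query."""
    if all_selected and custom_all_value is not None:
        # With a custom all value, "All" becomes one short regex.
        return f'pod=~"{custom_all_value}"'
    # Otherwise every pod name is listed explicitly.
    return 'pod=~"(' + "|".join(pods) + ')"'


# Hypothetical pod names: one per minute over 24 hours.
pods = [f"my-cronjob-{i:05d}" for i in range(1440)]

without_wildcard = render_pod_selector(True, pods)
with_wildcard = render_pod_selector(True, pods, custom_all_value=".*")

print(len(without_wildcard))  # grows with the number of pods
print(with_wildcard)          # pod=~".*"
```

The trade-off is that the wildcard matches every pod rather than only those returned by the variable's own query, so other label filters in the panel query would need to keep the result scoped.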

dotdc commented 2 days ago

:tada: This issue has been resolved in version 2.5.2 :tada:

The release is available on GitHub release

Your semantic-release bot :package::rocket:

dotdc commented 2 days ago

@maxpain Should be better with dashboards released in v2.5.3.

maxpain commented 1 day ago

thanks!