wendorf commented 1 year ago

In the cluster I'm trying kptop with, we are filtering which cadvisor metrics we scrape. To get kptop working correctly, I need to run kptop's commands, see what graphs populate and don't populate, check the source code to see which metrics the graphs are looking at, then update the list of scraped metrics.

I would love it if --verify-prometheus returned a list of the metrics that are missing, so I could more easily update the allowlist in my scrape config.

It was easy to determine that I needed machine_cpu_cores, since kptop nodes fails noisily:

# kptop nodes
No nodes found
Query did not return any data: machine_cpu_cores

However, for the dashboards, I need to dig through the logfile. It would be nice if all the failing queries were presented at once.

eslam-gomaa commented 1 year ago

@wendorf what about doing something like this?

kptop nodes --check-metrics

kptop pods --check-metrics

It should print a table with all the metrics used for pods or nodes (including the metrics used for graphs).

Metric	status	Comment
Metric_1	🟢 available
Metric_2	🟡 not_available	...
Metric_3	🟢 available

update:

I think this is better

--verify-prometheus --check-metrics 
# in addition to Prometheus connection verification output, 
# it will print a table with all the metrics that kptop uses and will show which metrics are missing.
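
For illustration only, here is a rough sketch of how such a check could work against the Prometheus HTTP API. It is not kptop's actual implementation: the function names (make_http_query, check_metrics, format_report) are hypothetical, and the metric list would have to come from kptop's source. It assumes a metric counts as available when an instant query for count(<metric>) returns a non-empty result.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterable, List


def make_http_query(base_url: str, timeout: float = 5.0) -> Callable[[str], List[dict]]:
    """Return a function that runs an instant query via /api/v1/query.

    base_url is an assumption, e.g. "http://localhost:9090".
    """
    def run(query: str) -> List[dict]:
        params = urllib.parse.urlencode({"query": query})
        url = f"{base_url.rstrip('/')}/api/v1/query?{params}"
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.load(resp)
        if payload.get("status") != "success":
            raise RuntimeError(f"query failed: {payload.get('error')}")
        return payload["data"]["result"]
    return run


def check_metrics(names: Iterable[str],
                  run_query: Callable[[str], List[dict]]) -> Dict[str, bool]:
    """Map each metric name to True if Prometheus currently has data for it."""
    # count(<metric>) yields an empty vector when the metric has no series,
    # and keeps the response small when it does.
    return {name: bool(run_query(f"count({name})")) for name in names}


def format_report(status: Dict[str, bool]) -> str:
    """Render a tab-separated table in the style proposed above."""
    lines = ["Metric\tstatus"]
    for name, available in status.items():
        label = "🟢 available" if available else "🟡 not_available"
        lines.append(f"{name}\t{label}")
    return "\n".join(lines)
```

With a reachable Prometheus, something like print(format_report(check_metrics(["machine_cpu_cores"], make_http_query("http://localhost:9090")))) would print one row per metric; the URL here is a placeholder. Passing the query function in as a parameter keeps the check easy to test without a live server.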

eslam-gomaa commented 1 year ago

Done, --verify-prometheus --check-metrics will be available in the next release "v0.0.4"

16

wendorf commented 1 year ago

Awesome! Thank you!

eslam-gomaa / kptop

`--verify-prometheus` should check metrics availability #15
