Problem
Producing metrics for a large number of instances takes too long (roughly 25 seconds for 100+ instances). As a result, Prometheus times out before cloud scanner returns metrics, and no data appears in the dashboard or in Prometheus itself.
Solution
As a short-term workaround, we can increase scrape_timeout in the Prometheus config. The default is 10 seconds; we could include an example that sets the timeout to 60 seconds. This also needs to be mentioned in the docs.
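A minimal sketch of what such an example could look like, assuming cloud scanner is scraped through a static target. The job name and the target address below are hypothetical placeholders, not values taken from the project. Note that Prometheus rejects a scrape_timeout larger than the scrape_interval, so the interval is set explicitly here to leave some headroom.

```yaml
# prometheus.yml (fragment)
scrape_configs:
  - job_name: "cloud-scanner"            # hypothetical job name
    scrape_interval: 120s                # must be >= scrape_timeout
    scrape_timeout: 60s                  # raised from the 10s default
    static_configs:
      - targets: ["cloud-scanner:8000"]  # hypothetical host:port, adjust to your deployment
```

Setting the timeout per job rather than in the global section keeps the default 10 second timeout for every other target.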
The long-term solution is to optimize how we gather data and return metrics, but that is a separate topic, tracked in #392.
Alternatives
Additional context or elements
This condition can be detected in the Prometheus UI by checking the Status / Targets page, which shows scrape duration details for each target.
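The same check could also be automated. The sketch below is an assumption rather than part of the proposed fix: it uses the scrape_duration_seconds and up series that Prometheus records automatically for every target, and the job label, alert names, and thresholds are hypothetical.

```yaml
# cloud-scanner-rules.yml (hypothetical rule file, loaded through rule_files)
groups:
  - name: cloud-scanner-scrape
    rules:
      - alert: CloudScannerSlowScrape    # hypothetical alert name
        # Fires when scrapes take close to the default 10s timeout.
        expr: scrape_duration_seconds{job="cloud-scanner"} > 8
        for: 5m
      - alert: CloudScannerScrapeFailing # hypothetical alert name
        # up is 0 when the last scrape failed, including on timeout.
        expr: up{job="cloud-scanner"} == 0
        for: 5m
```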