gesellix / couchdb-prometheus-exporter

CouchDB stats exporter for Prometheus
MIT License

Huge spike in CPU and other resources when couchdb-prometheus-exporter is set to databases=_all_dbs with 2600 databases #259

Open Sdas0000 opened 9 months ago

Sdas0000 commented 9 months ago

Since the frequency for metrics collection is set to 1 minute, couchdb-prometheus-exporter attempts to collect information for all 2600 databases every minute, which impacts the performance of the cluster. Is there a way to collect database information sequentially, or in batches of a configurable size? Could we add a parameter to control the frequency of database information collection (e.g. every 6 hours or 12 hours)?

gesellix commented 9 months ago

I think we'll have to change the collector to perform scrapes across the databases continuously (with a configurable frequency), just as you suggested in your last question. This might not be a quick fix, though; I'll have to check.

You might work around the issue by running multiple couchdb-prometheus-exporter instances and configuring each for only a subset of your databases. The Prometheus configuration would then have to scrape all those exporters, obviously. This is only a workaround.
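A rough sketch of what that could look like, assuming the `--databases` and `--telemetry.address` flags from the exporter's README (ports and database names are placeholders for your actual setup):

```sh
# Instance 1: serves metrics on :9984 for the first subset of databases
couchdb-prometheus-exporter \
  --couchdb.uri=http://couchdb:5984 \
  --databases=db-0001,db-0002,db-0003 \
  --telemetry.address=0.0.0.0:9984

# Instance 2: serves metrics on :9985 for another subset
couchdb-prometheus-exporter \
  --couchdb.uri=http://couchdb:5984 \
  --databases=db-0004,db-0005,db-0006 \
  --telemetry.address=0.0.0.0:9985
```

Prometheus would then list both :9984 and :9985 as targets of the same scrape job.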

gesellix commented 9 months ago

@Sdas0000 please have a look at the database.concurrent.requests parameter as introduced with https://github.com/gesellix/couchdb-prometheus-exporter/pull/46. It lets you limit the number of concurrent requests between the exporter and the CouchDB cluster, which might help in your environment.
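For illustration, a limit of 50 concurrent requests (the value is only an example; tune it to what your cluster tolerates):

```sh
# Cap the number of parallel requests the exporter sends to CouchDB
couchdb-prometheus-exporter \
  --couchdb.uri=http://couchdb:5984 \
  --databases=_all_dbs \
  --database.concurrent.requests=50
```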

Nevertheless I'm going to implement an option to decouple Prometheus' scrape interval (Prometheus -> Exporter) and the exporter's scrape interval (Exporter -> CouchDB). Beware that this might have the undesired effect of collecting stale metrics.

gesellix commented 9 months ago

I just released v30.9.0 with a new flag to perform scrapes at a configurable interval independent of Prometheus scrapes. Example: --scrape.interval=6h for an interval of 6 hours (default is 0s).
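A minimal example, collecting from CouchDB every 6 hours while Prometheus keeps scraping the exporter at its usual interval (between runs the exporter serves the most recently collected metrics, which is where the staleness caveat above comes from):

```sh
# Gather metrics from CouchDB every 6h, independent of Prometheus scrapes;
# the default of 0s keeps the previous scrape-on-request behaviour
couchdb-prometheus-exporter \
  --couchdb.uri=http://couchdb:5984 \
  --databases=_all_dbs \
  --scrape.interval=6h
```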

Please leave some feedback, and let me know whether you need more optimization for your setup. Thanks!

gesellix commented 8 months ago

Closing now. Feel free to leave feedback here or file another issue in case you still run into performance issues.

Sdas0000 commented 8 months ago

scrape.interval applies to all scraping at that interval. Our issue is _all_dbs (2600 databases) being scraped at the same time; we are looking for a database-level scraping interval.

gesellix commented 8 months ago

I think you should give the option described in https://github.com/gesellix/couchdb-prometheus-exporter/issues/259#issuecomment-1773848416 a try. It would allow you to define "buckets" of requests to your cluster. Did you have a look at that option?

Sdas0000 commented 8 months ago

We tried database.concurrent.requests=100, but that didn't help; we still see the same high CPU. What we are looking for is a scrape.interval specific to database-level metrics (like doc count, disk utilization, etc.), while the other metrics continue to be collected as usual. Alternatively, a parameter like "database scrape batch size" could scrape one batch at a time, picking up the next batch only after the first one finishes; that might use fewer resources. Basically, we need disk usage, doc count, etc. only a few times a day, but the other information we need continuously throughout the day.

gesellix commented 8 months ago

I think I need to reproduce the issue for myself... monitoring 2600 databases... and then try to make it work using fewer resources. For the time being I don't have a better suggestion than the one above (https://github.com/gesellix/couchdb-prometheus-exporter/issues/259#issuecomment-1773027402): deploying multiple exporter instances, each dedicated to a specific range of databases.