GoogleCloudPlatform / prometheus-test-infra

Apache License 2.0

Add cleanup logic to delete "old" nodes in the prombench cluster #17

Open saketjajoo opened 1 year ago

saketjajoo commented 1 year ago

Letting prombench benchmark a Prometheus instance continuously increases cloud costs, and for some PRs a long-running benchmark may not be needed at all (depending on what exactly the PR changes). Instead of letting the benchmark run indefinitely (until someone cancels it explicitly), we could add a K8s cron job that periodically checks for "old" nodes and deletes them. That way resources are not wasted and cloud costs stay under control.

The cron job would compare each node's age against a default time period, which could be overridden (for example via a flag or an environment variable).
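To make the age check concrete, here is a minimal sketch of the selection logic only, not of the actual cluster calls. It assumes the node creation timestamps have already been fetched from the cluster. The environment variable name `PROMBENCH_MAX_NODE_AGE_HOURS`, the 24-hour default, and both function names are hypothetical and not part of prombench.

```python
import os
from datetime import datetime, timedelta, timezone

# Hypothetical default and environment variable name; neither comes from prombench.
DEFAULT_MAX_AGE_HOURS = 24.0
MAX_AGE_ENV_VAR = "PROMBENCH_MAX_NODE_AGE_HOURS"


def max_node_age(env=None):
    """Read the age threshold from the environment, falling back to the default."""
    env = os.environ if env is None else env
    raw = env.get(MAX_AGE_ENV_VAR)
    hours = float(raw) if raw else DEFAULT_MAX_AGE_HOURS
    return timedelta(hours=hours)


def expired_nodes(created_at, now, max_age):
    """Return the names of nodes older than max_age, sorted by name.

    created_at maps node name -> timezone-aware creation timestamp.
    """
    return sorted(name for name, ts in created_at.items() if now - ts > max_age)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Made-up node names and ages, for illustration only.
    nodes = {
        "example-node-a": now - timedelta(hours=30),
        "example-node-b": now - timedelta(hours=2),
    }
    for name in expired_nodes(nodes, now, max_node_age()):
        print(f"would delete {name}")
```

Keeping the selection as a pure function separate from the deletion step would let the threshold logic be tested without a cluster; the cron job itself would then only need to list nodes, call this function, and delete what it returns.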

The only (potential) drawback is that the cron job lives on the main node forever (unless someone explicitly deletes it or runs /prombench cancel).

Example: https://github.com/GoogleCloudPlatform/prometheus-test-infra/commit/0c11b1249e288e590fc4361ecd3e46296a02242d