enix / x509-certificate-exporter

A Prometheus exporter to monitor x509 certificates expiration in Kubernetes clusters or standalone
MIT License
631 stars 64 forks source link

fix(deploy,chart): remove cpu limits for secretsExporter and hostPath… #306

Closed sebastiangaiser closed 2 months ago

sebastiangaiser commented 2 months ago

…sExporter

sebastiangaiser commented 2 months ago

@paullaffitte @npdgm I saw you in the history and assume that you're developers of this project. If this is the case, could you please mind reviewing my PR

npdgm commented 2 months ago

Hi @sebastiangaiser, thanks for opening this discussion!

First, know that I'm well aware of the two competing opinions on whether CPU limits are to be used on k8s or not. For example the article from Robusta and the responses it got, and talks. Personally I see good arguments on both sides and remove limits for apps on a case by case basis only (on Prometheus exporters, would most likely bump the limit but never remove it).

Regarding the exporter it's true that our default CPU limits are conservative and can slow down scrapping for clusters with lots of Secrets. It could be increased but the fact is hunting for a sensible default value that suits every cluster is an impossible task. I believe default values should be safe for users that don't want to dig into configuration too much, and most would not have to with the current ones. Tuning CPU limits for this exporter mostly depends on how many TLS Secrets a cluster has. It seems fair to expect a user to perform adjustments when it's needed for a given platform.

Now I recon this is also making an argument for dropping the limit! :smiley: So here are two complementary thoughts:

Having said that, it doesn't mean we can do nothing about default CPU limits. It could be increased, or even removed if GOMAXPROCS is handled an other way...

I would very much like to hear arguments on this change. Perhaps you had bad experiences with limits on the exporter? Would you mind sharing about your usage such as number of Secrets, and how much CPU is consumed when generating metrics? Did you tune API client throttling too? Scrape timeout?

Cheers

sebastiangaiser commented 2 months ago

@npdgm thank you for your very detailed explanation. At least I was not aware that x509-certificate-exporter is handling the GOMAXPROCS in that way. My problem were that the limits are to low but that's something I can solve on my own.

Is this somewhere documented already?

npdgm commented 2 months ago

You're welcome. I discussed this topic with coworkers who deploy this exporter a lot and we came to an agreement that default CPU limits are indeed too low. But there should be one by default. So I think we can raise it significantly and not cause issues for anyone. Will update the PR.

As for GOMAXPROCS, no it's not documented in this repository. That's a Go thing and I rarely see it mentionned by projects unless they do much processing. It's not that important, don't worry about it! In fact that's more of a problem the other way, when containers have low CPU limits and do not (auto)tune GOMAXPROCS. Which is the problem we had in the past. I mixed up thoses scenarios a bit in my explanation.

sebastiangaiser commented 2 months ago

Thank you 🚀

monkeynator commented 2 weeks ago

:tada: This PR is included in version 3.15.0 :tada:

The release is available on GitHub release

Your semantic-release bot :package::rocket: