Closed sebastiangaiser closed 2 months ago
@paullaffitte @npdgm I saw you in the history and assume you're developers of this project. If so, would you mind reviewing my PR?
Hi @sebastiangaiser, thanks for opening this discussion!
First, know that I'm well aware of the two competing opinions on whether CPU limits should be used on k8s or not; for example, the article from Robusta, the responses it got, and various talks. Personally I see good arguments on both sides and remove limits for apps on a case-by-case basis only (for Prometheus exporters, I would most likely bump the limit but never remove it).
Regarding the exporter, it's true that our default CPU limits are conservative and can slow down scraping for clusters with lots of Secrets. The limit could be increased, but hunting for a sensible default value that suits every cluster is an impossible task. I believe default values should be safe for users who don't want to dig into configuration too much, and most won't have to with the current ones. Tuning CPU limits for this exporter mostly depends on how many TLS Secrets a cluster has, so it seems fair to expect users to make adjustments when needed for a given platform.
Now I reckon this is also an argument for dropping the limit! :smiley: So here are two complementary thoughts:
1. Asking users to tune `GOMAXPROCS` on top of `resources.limits.cpu` is a lot to ask for and would seem obscure to many. That's why we followed in the steps of many Go projects and used automaxprocs to auto-tune `GOMAXPROCS` to the cgroup CPU quota, which itself is derived from `resources.limits.cpu` in k8s.
2. Removing the default CPU limit would let the Go runtime make bad decisions on large nodes, because automaxprocs can no longer do its job and will default to the number of hardware threads. We would then be back to searching for a sensible `GOMAXPROCS` default to hard-code, which is even harder to choose, explain, and document.

Having said that, it doesn't mean we can do nothing about the default CPU limit. It could be increased, or even removed if `GOMAXPROCS` is handled another way...
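To illustrate what automaxprocs does under the hood, here is a stdlib-only sketch that derives a `GOMAXPROCS` value from a cgroup v2 `cpu.max` string. This is a simplified illustration, not the library's actual code; the real library also handles cgroup v1, rounding options, and a configurable minimum:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cpuQuota parses a cgroup v2 "cpu.max" value ("<quota> <period>" or
// "max <period>") and returns a CPU count derived from the quota,
// similar to how automaxprocs picks a GOMAXPROCS value.
func cpuQuota(cpuMax string) (int, bool) {
	fields := strings.Fields(cpuMax)
	if len(fields) != 2 || fields[0] == "max" {
		return 0, false // no CPU limit configured
	}
	quota, err1 := strconv.ParseFloat(fields[0], 64)
	period, err2 := strconv.ParseFloat(fields[1], 64)
	if err1 != nil || err2 != nil || period == 0 {
		return 0, false
	}
	n := int(quota / period) // floor to whole CPUs
	if n < 1 {
		n = 1 // never go below one
	}
	return n, true
}

func main() {
	// resources.limits.cpu: "2" typically renders as "200000 100000" in cpu.max
	if n, ok := cpuQuota("200000 100000"); ok {
		fmt.Println("GOMAXPROCS would be set to", n)
	}
}
```

Without a limit, `cpu.max` reads `max <period>`, there is no quota to derive from, and the runtime falls back to the hardware thread count.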
I would very much like to hear arguments on this change. Perhaps you had bad experiences with limits on the exporter? Would you mind sharing about your usage such as number of Secrets, and how much CPU is consumed when generating metrics? Did you tune API client throttling too? Scrape timeout?
Cheers
@npdgm thank you for your very detailed explanation. I was not aware that x509-certificate-exporter handles `GOMAXPROCS` in that way. My problem was that the limits are too low, but that's something I can solve on my own.
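For anyone in the same situation, the limit can be raised at deploy time, e.g. through Helm values. This is a sketch assuming the chart's standard `resources` key; the numbers are illustrative, so pick values that fit your cluster:

```yaml
# values.yaml override (illustrative values)
resources:
  limits:
    cpu: "1"        # raise from the conservative default
    memory: 128Mi
  requests:
    cpu: 100m
    memory: 64Mi
```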
Is this somewhere documented already?
You're welcome. I discussed this topic with coworkers who deploy this exporter a lot and we came to an agreement that default CPU limits are indeed too low. But there should be one by default. So I think we can raise it significantly and not cause issues for anyone. Will update the PR.
As for `GOMAXPROCS`, no, it's not documented in this repository. That's a Go thing and I rarely see it mentioned by projects unless they do a lot of processing. It's not that important, don't worry about it! In fact it's more of a problem the other way around, when containers have low CPU limits and do not (auto)tune `GOMAXPROCS`. That is the problem we had in the past. I mixed up those scenarios a bit in my explanation.
Thank you 🚀
:tada: This PR is included in version 3.15.0 :tada:
The release is available on GitHub release
Your semantic-release bot :package::rocket: