Closed kokes closed 5 months ago
Note that I have not run the test suite, because it fails to run not just for this PR, but also for main (I guess my environment is somehow broken).
Can you please run it, @npdgm? Thank you
Oh never mind, I got the tests to run against a fresh kind cluster and they pass.
Hi! Thank you very much for your work! Your explanations are well detailed and the optimization is clever. I think it may be a bit risky from a maintainability perspective, in the sense that if we ever modify the `getLabels` function, we will have to remember to update `compareCertificates` as well. But I would say that the benefits are worth the risk. Anyway, I will try to find a way to tackle this issue later, but for now I will merge your work.
Thanks again! :rocket:
:tada: This PR is included in version 3.13.0 :tada:
The release is available on GitHub release
Your semantic-release bot :package::rocket:
Hey, similar to some other issues submitted recently, we faced OOM kills when running the exporter with a high number of k8s secrets. This PR seems to resolve #255 (the issue contains a reproducer).
While I thought the issue was the extremely memory-intensive method `getLabels`, that turned out to be only part of the problem. This method does allocate heavily, but the main problem is that the exporter itself is slow. And since a collection doesn't have a context, it is not cancelled, so many concurrent collections happen, all allocating, and the process slowly gets OOM killed as memory mounts.

The new codebase allows for much faster parsing of certs (or, more specifically, their deduplication). This then makes collection pretty much instant and no memory gets accumulated.
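To illustrate the difference between comparing certificates through freshly built label maps and comparing their fields directly, here is a minimal sketch. The `certRef` struct, its fields, the label names, and the helpers `labelsOf`, `sameByLabels`, and `sameInline` are all hypothetical stand-ins, not the exporter's actual types or functions.

```go
package main

import "fmt"

// certRef is a hypothetical, simplified stand-in for a parsed certificate;
// the exporter's real type has different fields.
type certRef struct {
	Path      string
	Subject   string
	Issuer    string
	SerialHex string
}

// labelsOf mimics a getLabels-style helper: it allocates a new map per call.
func labelsOf(c *certRef) map[string]string {
	return map[string]string{
		"filepath":      c.Path,
		"subject_CN":    c.Subject,
		"issuer_CN":     c.Issuer,
		"serial_number": c.SerialHex,
	}
}

// sameByLabels compares two certificates by building both label maps
// and checking them key by key (two map allocations per comparison).
func sameByLabels(a, b *certRef) bool {
	la, lb := labelsOf(a), labelsOf(b)
	if len(la) != len(lb) {
		return false
	}
	for k, v := range la {
		if w, ok := lb[k]; !ok || w != v {
			return false
		}
	}
	return true
}

// sameInline performs the equivalent comparison field by field,
// without allocating anything.
func sameInline(a, b *certRef) bool {
	return a.Path == b.Path &&
		a.Subject == b.Subject &&
		a.Issuer == b.Issuer &&
		a.SerialHex == b.SerialHex
}

func main() {
	a := &certRef{Path: "/a.pem", Subject: "svc", Issuer: "ca", SerialHex: "01"}
	b := &certRef{Path: "/a.pem", Subject: "svc", Issuer: "ca", SerialHex: "01"}
	fmt.Println(sameByLabels(a, b), sameInline(a, b))
}
```

The two helpers must agree on every input; if a label is ever added to the map-building helper, the inlined comparison has to be updated alongside it.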
Here's how this looked and how it looks now with the fix:
High level perf stats
Original memory profile (when just parsing certs):
New memory profile:
Benchmark comparison (before/after):
Changes made

- the `trimComponents` function now exits early in case we don't trim components (and thus doesn't allocate a slice of strings, do `path.Join`, etc.)
- no `fmt.Sprintf` in the metric construction, to save ourselves an allocation
- no `getLabels` calls in the deduplication path; all the comparisons are inlined without allocating a `map` (this is the meat of this PR)

I have some other changes drafted, but they have relatively low impact. Here are some numbers on the `trimComponents` part, which was perhaps the biggest offender after the `getLabels` stuff was sorted.
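The early exit in `trimComponents` can be sketched roughly as follows. This is a hypothetical re-implementation for illustration only: the signature, the parameter names `p` and `n`, and the handling of absolute paths are assumptions, and the exporter's real function may behave differently.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// trimComponents drops the first n leading components of p.
// Hypothetical sketch; not the exporter's actual implementation.
func trimComponents(p string, n int) string {
	// Fast path: nothing to trim, so return the input untouched and
	// skip the split and join allocations entirely.
	if n <= 0 {
		return p
	}

	// Slow path: split into components, skip the first n non-empty
	// ones, and join the remainder back together.
	parts := strings.Split(p, "/")
	kept := make([]string, 0, len(parts))
	for _, part := range parts {
		if part == "" {
			continue
		}
		if n > 0 {
			n--
			continue
		}
		kept = append(kept, part)
	}

	joined := path.Join(kept...)
	if strings.HasPrefix(p, "/") {
		// Preserve the leading slash of absolute paths.
		joined = "/" + joined
	}
	return joined
}

func main() {
	fmt.Println(trimComponents("/etc/ssl/certs/ca.pem", 0))
	fmt.Println(trimComponents("/etc/ssl/certs/ca.pem", 2))
}
```

When no trimming is configured, which is presumably the common case, the function returns before any slice is allocated, so the per-certificate cost drops to a single integer comparison.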