Open rfairburn opened 13 hours ago
Additional thought: as a practice, do we need to be passing in GOMEMLIMIT? If Go is not automatically detecting the right memory amount, it may keep trying to allocate because it thinks it can. Setting the limit might help improve garbage collection and prevent this.
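To make the GOMEMLIMIT idea concrete, here is a minimal sketch, not Fleet's actual startup code. As far as I know, the Go runtime does not read the container's cgroup memory limit on its own, and the soft memory limit defaults to effectively unlimited unless GOMEMLIMIT or `debug.SetMemoryLimit` (Go 1.19+) sets it. The sketch assumes cgroup v2 (limit exposed at `/sys/fs/cgroup/memory.max`). The helper name `softLimitFromCgroup` and the 90% headroom figure are hypothetical choices for illustration.

```go
package main

import (
	"fmt"
	"os"
	"runtime/debug"
	"strconv"
	"strings"
)

// Assumption: the container uses cgroup v2, which exposes its memory
// limit in this file. cgroup v1 uses a different path and is not handled.
const cgroupV2MemoryMax = "/sys/fs/cgroup/memory.max"

// softLimitFromCgroup (hypothetical helper) parses the contents of
// memory.max and returns percent% of the limit in bytes. It reports
// false when no limit is set ("max") or the value cannot be used.
func softLimitFromCgroup(raw string, percent int64) (int64, bool) {
	s := strings.TrimSpace(raw)
	if s == "" || s == "max" {
		return 0, false
	}
	limit, err := strconv.ParseInt(s, 10, 64)
	if err != nil || limit <= 0 || percent <= 0 || percent > 100 {
		return 0, false
	}
	// Divide first so very large limits cannot overflow int64.
	return limit / 100 * percent, true
}

func main() {
	// An explicitly provided GOMEMLIMIT is left untouched.
	if os.Getenv("GOMEMLIMIT") != "" {
		fmt.Println("GOMEMLIMIT set explicitly; leaving it alone")
		return
	}
	raw, err := os.ReadFile(cgroupV2MemoryMax)
	if err != nil {
		fmt.Println("no cgroup v2 memory limit found:", err)
		return
	}
	// 90 is an illustrative headroom percentage, not a recommendation.
	if limit, ok := softLimitFromCgroup(string(raw), 90); ok {
		debug.SetMemoryLimit(limit)
		fmt.Printf("soft memory limit set to %d bytes\n", limit)
	}
}
```

The same effect can be had without code changes by setting the environment variable on the container, for example `GOMEMLIMIT=5400MiB` for a 6GB allocation (again an illustrative value). Note that this is a soft limit: it makes the garbage collector work harder as the heap approaches it, but it cannot prevent an out-of-memory kill if the live heap genuinely exceeds the container's allocation.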
Fleet version: v4.56.0 (but also at least the previous 2 versions)
Web browser and operating system: N/A
💥 Actual behavior
The Fleet container crashed while allocated 6GB of memory.
Details:
The APM results show increased container CPU and request latency around the time of the issue. Around the same time, a number of HTTP requests retrieved hosts in batches of 100, with each response sending over 100MB of data. It is possible that these are related. See the private Slack thread https://fleetdm.slack.com/archives/C03EG80BM2A/p1726910791814149?thread_ts=1726909537.718839&cid=C03EG80BM2A for details (it contains customer data, so it is not pasted here).
🧑‍💻 Steps to reproduce
Unknown, but iterating over hosts (85-100k) that have a lot of software, and retrieving that software data, appears to be at least partially at play.
🕯️ More info (optional)
I haven't been able to isolate this to a specific API call. It does not currently appear to be related to uploading software installers.
Hopefully this can be reproduced in load testing, even if that means using 4GB instead of 6GB (we recommend 4GB per container in all cases).