fleetdm / fleet

Open-source platform for IT, security, and infrastructure teams. (Linux, macOS, Chrome, Windows, cloud, data center)
https://fleetdm.com

Fleet can run out of memory even with 6GB allocated #22291

rfairburn opened 13 hours ago

rfairburn commented 13 hours ago

Fleet version: v4.56.0 (also seen in at least the two previous versions)

Web browser and operating system: N/A


💥  Actual behavior

The Fleet container crashed after running out of memory, despite having 6GB allocated.

Details:

The APM results show increased container CPU and request latency around the time of the issue. During the same window there were a number of HTTP requests retrieving hosts in batches of 100, each of which sent over 100MB of data. These may be related. See the private Slack thread https://fleetdm.slack.com/archives/C03EG80BM2A/p1726910791814149?thread_ts=1726909537.718839&cid=C03EG80BM2A for details (it contains customer data, so it is not pasted here).

🧑‍💻  Steps to reproduce

Unknown, but iterating over a large number of hosts (85k-100k) that each have a lot of software, and retrieving that software, appears to be at least partially involved.
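To give a sense of scale, here is a back-of-the-envelope estimate built from the numbers above (85k-100k hosts, batches of 100, over 100MB per batch). `estimateScan` is a hypothetical helper, and it assumes every batch is about 100MB. Since the observed batches were *over* 100MB, the results are lower bounds.

```go
package main

import "fmt"

// scanEstimate summarizes one full paginated walk over all hosts.
type scanEstimate struct {
	Requests  int     // number of paged requests needed
	TotalGB   float64 // total response volume across all requests (decimal GB)
	PerHostMB float64 // average payload per host (decimal MB)
}

// estimateScan assumes every batch is roughly batchMB in size.
func estimateScan(hosts, perPage int, batchMB float64) scanEstimate {
	requests := (hosts + perPage - 1) / perPage // ceiling division
	return scanEstimate{
		Requests:  requests,
		TotalGB:   float64(requests) * batchMB / 1000,
		PerHostMB: batchMB / float64(perPage),
	}
}

func main() {
	for _, hosts := range []int{85000, 100000} {
		e := estimateScan(hosts, 100, 100)
		fmt.Printf("%d hosts: %d requests, ~%.0f GB total, ~%.1f MB per host\n",
			hosts, e.Requests, e.TotalGB, e.PerHostMB)
	}
	// Output:
	// 85000 hosts: 850 requests, ~85 GB total, ~1.0 MB per host
	// 100000 hosts: 1000 requests, ~100 GB total, ~1.0 MB per host
}
```

At roughly 1MB per host, a full walk moves 85-100GB through the server. If each batch response is fully buffered, a handful of overlapping requests would already amount to several hundred MB of live data.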

🕯️ More info (optional)

I haven't been able to isolate this to a specific API call. It does not currently appear to be related to software installer uploads.

Hopefully this can be reproduced in load testing, even if that means using 4GB instead of 6GB (we recommend 4GB per container in all cases).

rfairburn commented 13 hours ago

Additional thought: should we, as a practice, be setting GOMEMLIMIT? If Go is not automatically detecting the container's actual memory limit, the runtime may keep allocating because it thinks it can. Setting GOMEMLIMIT might make garbage collection kick in sooner and prevent this.
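There are two ways to apply the GOMEMLIMIT idea (Go 1.19+). The simplest is to set it in the container environment, e.g. `GOMEMLIMIT=3600MiB` for a container limited to 4096 MiB. Alternatively, the process can derive it at startup with `runtime/debug.SetMemoryLimit`. The sketch below takes the second route. It assumes cgroup v2, reads the limit from `/sys/fs/cgroup/memory.max`, and keeps 10% headroom. The helper names and the 0.9 fraction are my own assumptions, and this is not something Fleet is known to do today.

```go
package main

import (
	"fmt"
	"os"
	"runtime/debug"
	"strconv"
	"strings"
)

// cgroupMemoryMaxPath is the cgroup v2 file holding the container's memory
// limit. cgroup v1 uses a different file; this sketch assumes v2.
const cgroupMemoryMaxPath = "/sys/fs/cgroup/memory.max"

// parseMemoryMax parses the contents of memory.max. It returns ok=false when
// the file says "max" (no limit) or cannot be parsed.
func parseMemoryMax(s string) (limit int64, ok bool) {
	s = strings.TrimSpace(s)
	if s == "" || s == "max" {
		return 0, false
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil || n <= 0 {
		return 0, false
	}
	return n, true
}

// softLimit leaves headroom below the hard limit for memory the Go runtime
// does not manage (e.g. cgo allocations) and because the limit is soft.
func softLimit(hard int64, fraction float64) int64 {
	return int64(float64(hard) * fraction)
}

// applyMemoryLimit sets Go's soft memory limit from the cgroup limit, unless
// GOMEMLIMIT was already set explicitly in the environment.
func applyMemoryLimit(fraction float64) {
	if os.Getenv("GOMEMLIMIT") != "" {
		return // respect an explicit operator setting
	}
	raw, err := os.ReadFile(cgroupMemoryMaxPath)
	if err != nil {
		return // not a cgroup v2 container, or file unreadable
	}
	hard, ok := parseMemoryMax(string(raw))
	if !ok {
		return
	}
	limit := softLimit(hard, fraction)
	prev := debug.SetMemoryLimit(limit) // returns the previous limit
	fmt.Printf("memory limit set to %d bytes (was %d)\n", limit, prev)
}

func main() {
	applyMemoryLimit(0.9)
}
```

GOMEMLIMIT is a soft limit. It makes the GC run harder as total Go-managed memory nears the limit, but it does not stop allocation. So it would help if the growth is collectable garbage, and not if live data (such as several large in-flight responses) genuinely exceeds the container's memory.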