Closed by ciprian2k, 2 months ago
Hi! Thanks for opening this issue! So, it seems there might be a memleak when the rule triggers. Can you test the same with latest Falco 0.38.1? Thank you very much for reporting!
Also, in case it is still present, can you share the configuration too? Or you are using the default one?
So, after
```
Events detected: 7921227
Rule counts by severity:
   WARNING: 7921227
Triggered rules by rule name:
   Suspicious Command Args Detected: 7921227
```
I see a ~8 MB increase in resident memory:

```
160604 root 20 0 2471164 214944 193440 S 26,2 0,3 0:11.68 falco
160604 root 20 0 2479436 222784 193440 S 30,8 0,3 11:38.71 falco
```
We got a problem, Houston. But not that big, at least here.
EDIT: going to run with Valgrind's massif tool to check if we can easily spot the leak!
Ok, on second thought, considering that I am running

```shell
watch -n 0.1 "echo --lua-exec"
```

I'd expect around 10 events per second, i.e. ~36k events per hour. How could I reach 8 million events in ~30 minutes :rofl:
Hi @FedeDP,
Thanks for investigating my problem. I've tested now on Falco 0.38.1 and it has the same issue.
Digging more into the problem, I found out that the memory leak is because I have http_output enabled.
```yaml
http_output:
  enabled: true
  url: http://samplemywebsite.com/api/falco
```
This is the only difference in configuration vs the default one.
I confirm I can reproduce the memory leak. I used the exact rule and a pod running with `while true; do echo "-- lua-exec"; done`.
The memory usage increases until an OOM:
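A bounded sketch of the repro loop (capped at a few iterations here so it terminates; the actual repro uses `while true`):

```shell
# Bounded variant of the repro loop: each echo invocation's args match the
# very wide rule condition, so every run generates Falco output events.
# Capped at 5 iterations here so the snippet terminates.
for i in 1 2 3 4 5; do
  echo "-- lua-exec"
done
```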
```yaml
- containerID: containerd://bc51e480adba8a724c297ca9481c6d463c2f0cf556bf61bc37e1af77cf7d6686
  image: docker.io/falcosecurity/falco-no-driver:0.38.1
  imageID: docker.io/falcosecurity/falco-no-driver@sha256:a59cadbaf556c05296dfc8f522786b2138404814797ffbc9ee3b26b336d06903
  lastState:
    terminated:
      containerID: containerd://9e7c69c0b51f9c8a014a35a1b2adfa11277fc3a188e65f04e0f09ef4c2238b9e
      exitCode: 137
      finishedAt: "2024-07-02T11:02:50Z"
      reason: OOMKilled
      startedAt: "2024-07-02T10:37:23Z"
```
I will test without `http_output.enabled=true`.
I confirm the leak disappears once `http_output` is disabled.
Thank you both very much! I will give it a look and report back :)
Out of curiosity, which libcurl version are you using? The bundled one or the system one?
EDIT: Anyway, I am able to reproduce by enabling http output.
So, it seems like there is something wrong with the `curl_easy_perform` call here: https://github.com/falcosecurity/falco/blob/master/userspace/falco/outputs_http.cpp#L118, since commenting it out fixes the issue (well, http output does nothing then). I am still digging!
So, I tried to repro this with a minimal libcurl-only example but couldn't.
Then I remembered that our outputs queue is unbounded by default, which means it can grow indefinitely; the rule you provided does not specify any syscall, so it matches every syscall/action made by the process called with args `--lua-exec`. That's why it generates so many output events.
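For illustration, a rule with no syscall filter looks roughly like this. This is a hypothetical reconstruction (the exact rule from the report is not shown in this thread); the rule name, `desc`, and `output` text are placeholders:

```yaml
# Hypothetical sketch: without any evt.type filter, the condition is
# evaluated against every event produced by matching processes.
- rule: Suspicious Command Args Detected
  desc: Illustrative sketch of a very wide rule with no syscall filter
  condition: proc.args contains "-- lua-exec"
  output: Suspicious command args (proc=%proc.name args=%proc.args)
  priority: WARNING
```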
TL;DR: setting `outputs_queue.capacity` to e.g. 100 in the Falco config fixes the "issue".
But please mind that this is not a bug; it is by-design behavior, exacerbated by the very wide condition of the rule.
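The corresponding `falco.yaml` fragment would look like this (the capacity value of 100 is just the example from above; as noted, the queue is unbounded by default):

```yaml
# Bound the outputs queue so it cannot grow without limit; once the cap
# is reached, further output events are dropped instead of buffered.
outputs_queue:
  capacity: 100
```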
Hi, you are right, setting the `outputs_queue` capacity in the config resolves "my problem". I didn't know why the memory was increasing and really thought it was a memory leak.
Thank you again for your help, and sorry for the time spent on this matter.
No problem sir, thanks for asking! /milestone 0.39.0
Hi @FedeDP, sorry for bringing this up again. I hit the same issue on 0.38.0, but may I ask: if I set outputs_queue.capacity to some fixed value, does it mean Falco will drop some events when the cap is reached? If yes, do we have some other options to mitigate this OOM issue? The difference in our env is that we have a lot of incoming/outgoing network traffic.
> does it mean Falco will drop some events when the cap is reached

Yes, exactly.

> If yes, do we have some other options to mitigate this OOM issue?

Unfortunately no; though if your system is generating too many events, perhaps some rule is too noisy and should be made stricter.
Got it, thanks for answering. Do we have any metrics we can use to monitor drops once a fixed capacity is chosen? I read https://falco.org/docs/metrics/falco-metrics/ but I'm having a hard time understanding what each metric actually means, e.g. falcosecurity_scap_n_retrieve_evts_drops_total vs falcosecurity_scap_n_store_evts_drops_total, the difference between them, etc.
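(For reference, Falco's internal metrics are turned on via the `metrics` section of `falco.yaml`; a minimal sketch, assuming a Falco 0.38-era config, with interval and flags chosen here only as examples:)

```yaml
# Emit Falco's internal metrics periodically as output events,
# so drop counters can be scraped/monitored over time.
metrics:
  enabled: true
  interval: 1h
  output_rule: true
```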
Describe the bug
Falco memory usage keeps increasing until OOM
How to reproduce it
Create a custom rule "command_args.yaml"
Run echo multiple times and see memory increase until OOM
Screenshots
Environment
System info:
Cloud provider or hardware configuration:
OS:
Kernel:
Installation method: