falcosecurity / falco

Cloud Native Runtime Security
https://falco.org
Apache License 2.0

Missing event metadata #3246

Open imreczegledi-form3 opened 2 weeks ago

imreczegledi-form3 commented 2 weeks ago

Hi 👋

We are seeing false-positive alerts on events with empty metadata, similar to https://github.com/falcosecurity/falco/issues/3234 and https://github.com/falcosecurity/falco/issues/2700 (I hope I can help with those cases as well).

Missing event metadata

{"hostname":"minikube","output":"12:15:24.058348969: Warning Account Manipulation in SSH detected 
...
{"container.id":"host","container.image.repository":null,"container.image.tag":null,"container.name":"host","evt.res":"SUCCESS","evt.time":1718108124058348969,"evt.type":"openat","fd.name":"my_sshd_config","group.gid":4294967295,"group.name":"","k8s.ns.name":null,"k8s.pod.name":null,"proc.cmdline":"bash","proc.cwd":"","proc.exepath":"","proc.pcmdline":null,"proc.pid":11453,"proc.ppid":0,"proc.sid":-1,"user.loginname":"","user.loginuid":-1,"user.name":"","user.uid":4294967295}
...
}

The triggered Falco rule is Account Manipulation in SSH, but the issue is not rule-specific.

Based on my local tests, the root cause is a too-small bufSizePreset value. This buffer is crucial when Falco has to handle a "process flood" (e.g. a process spawning hundreds of child processes).

To simulate a "process flood" I created a small Go script that triggers the rule 1000 times in different child processes (on the host).

...
cmd = exec.CommandContext(ctx, "timeout", "5s", "tail", "-f", "/home/ubuntu/my_sshd_config")
...

This is how you can reproduce the issue.
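For reference, the relevant part of the script looks roughly like this (a simplified sketch with error handling stripped; spawning the children concurrently is my shortcut here to create the flood):

```go
package main

import (
	"context"
	"os/exec"
	"sync"
	"time"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	var wg sync.WaitGroup
	// Spawn many short-lived child processes that each open the watched file,
	// flooding Falco with clone/execve/openat events in a short time window.
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			cmd := exec.CommandContext(ctx, "timeout", "5s", "tail", "-f", "/home/ubuntu/my_sshd_config")
			// "timeout" exits with status 124 when it kills tail, so a
			// non-zero exit status is expected here and ignored.
			_ = cmd.Run()
		}()
	}
	wg.Wait()
}
```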

Test env

Results

| bufSizePreset | Logged events | Events with missing metadata | Ratio |
| --- | --- | --- | --- |
| 1 (1 MB) | 447 | 77 | 0.172 |
| 2 (2 MB) | 582 | 69 | 0.118 |
| 3 (3 MB) | 570 | 48 | 0.084 |
| 4 (4 MB, Falco default) | 738 | 41 | 0.055 |
| 5 (16 MB) | 998 | 0 | 0.000 |

As you can see above, as we increase the buffer size, the number of events without metadata decreases. With an appropriately sized buffer the issue disappears, and the Falco logs contain only fully enriched events.

bufSizePreset can be set to a value between 1 and 10.

Ideas

Looking forward to your answers and ideas (I might have missed something).

incertum commented 2 weeks ago

> As you can see above, as we increase the buffer size, the number of events without metadata decreases.

This is expected, as Falco builds up internal state to serve you all that information (see the source code: https://github.com/falcosecurity/libs/blob/master/userspace/libsinsp/parsers.cpp). If too many events are dropped kernel side, that state cannot be built and enrichment breaks. Perhaps the adaptive syscalls blog post (https://falco.org/blog/adaptive-syscalls-selection/) can provide more insights, and the base_syscalls feature may be of interest to you in general.
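To illustrate why kernel-side drops surface as empty fields (a toy model only, not the actual libsinsp code): the parsers maintain a process table keyed by PID, populated from clone/execve events; when those events are dropped, a later event from the same PID finds no entry and the proc.* fields come back empty, which is what your first log shows.

```go
package main

import "fmt"

// procInfo is a toy stand-in for the per-process state built from syscall events.
type procInfo struct {
	exepath string
	cmdline string
	ppid    int
}

// procTable maps PID -> process state learned from clone/execve events.
var procTable = map[int]procInfo{}

// onExecve records process metadata; if the execve event is dropped kernel
// side, the entry is simply never created.
func onExecve(pid int, info procInfo) { procTable[pid] = info }

// onOpenat enriches a file-open event with whatever state is available.
func onOpenat(pid int, fdName string) {
	info, ok := procTable[pid]
	if !ok {
		// The clone/execve event was dropped: enrichment falls back to empty
		// values, which is what the "missing metadata" alerts look like.
		fmt.Printf("pid=%d fd=%s exepath=%q ppid=%d (state missing)\n", pid, fdName, "", 0)
		return
	}
	fmt.Printf("pid=%d fd=%s exepath=%q ppid=%d\n", pid, fdName, info.exepath, info.ppid)
}

func main() {
	onExecve(12194, procInfo{exepath: "/usr/bin/tail", cmdline: "tail -f my_sshd_config", ppid: 12192})
	onOpenat(12194, "my_sshd_config") // enriched: its execve was seen
	onOpenat(11453, "my_sshd_config") // not enriched: its execve was dropped
}
```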

> A bufSizePreset-specific debug message (with logic that can measure buffer utilisation) would be very useful

Have you explored the internal automatic drop alerts or Falco metrics (https://falco.org/docs/metrics/falco-metrics/) as alternatives? Both expose drop counters from which you can infer how the buffer is holding up.
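If you want to keep quantifying this the way your table does, a small helper that reads Falco's JSON output (one alert per line, as in your examples) and counts alerts with an empty proc.exepath can approximate the missing-metadata ratio. Just a rough sketch; the choice of proc.exepath as the indicator field is an assumption:

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
)

// alert mirrors only the piece of Falco's JSON output needed here.
type alert struct {
	OutputFields map[string]interface{} `json:"output_fields"`
}

func main() {
	var total, missing int
	sc := bufio.NewScanner(os.Stdin)
	sc.Buffer(make([]byte, 0, 1024*1024), 1024*1024) // alert lines can be long
	for sc.Scan() {
		var a alert
		if err := json.Unmarshal(sc.Bytes(), &a); err != nil {
			continue // skip non-JSON lines
		}
		total++
		// Treat an empty proc.exepath as "metadata missing"; other fields
		// (proc.cwd, user.name, ...) could be checked the same way.
		if v, ok := a.OutputFields["proc.exepath"].(string); !ok || v == "" {
			missing++
		}
	}
	if total > 0 {
		fmt.Printf("alerts=%d missing_metadata=%d ratio=%.3f\n", total, missing, float64(missing)/float64(total))
	}
}
```

It can be fed from a captured JSON output file, or piped from Falco with JSON output enabled.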


Some more general info:

By the way, your example log shows "container.name":"host", so all container fields are expected to be null; see https://falco.org/docs/reference/rules/supported-fields/#field-class-container etc.

Re: user names and group names, is the host /etc dir mounted and available? We have had issues in the past with minikube support in general, as some mounts or parts of the setup are not like on an actual Kubernetes cluster. Perhaps some of it is also because of that. How do you use minikube? Which driver? See also https://falco.org/docs/install-operate/third-party/learning/

imreczegledi-form3 commented 2 weeks ago

Thanks, I will check the blog post regarding adaptive syscalls.


driver: modern-bpf

I don't think it's a minikube compatibility issue because, as you can see in the table above, the majority of the events are perfectly enriched, like:

{"hostname":"minikube","output":"13:21:03.940424837: Warning Account Manipulation in SSH detected ...
 "output_fields": {"container.id":"host","container.image.repository":null,"container.image.tag":null,"container.name":"host","evt.res":"SUCCESS","evt.time":1718198463940424837,"evt.type":"openat","fd.name":"/home/ubuntu/my_sshd_config","group.gid":1001,"group.name":"<NA>","k8s.ns.name":null,"k8s.pod.name":null,"proc.cmdline":"tail -f /home/ubuntu/my_sshd_config","proc.cwd":"","proc.exepath":"/usr/bin/tail","proc.pcmdline":"timeout 5s tail -f /home/ubuntu/my_sshd_config","proc.pid":12194,"proc.ppid":12192,"proc.sid":-1,"user.loginname":"docker","user.loginuid":1000,"user.name":"docker","user.uid":1000}}

so the root cause still seems to be around the state engine / dropped events.