falcosecurity / falco

Cloud Native Runtime Security
https://falco.org
Apache License 2.0

Missing event metadata #3246

Open imreczegledi-form3 opened 2 weeks ago

imreczegledi-form3 commented 2 weeks ago

Hi 👋

We are seeing false-positive alerts on events with empty metadata, similar to https://github.com/falcosecurity/falco/issues/3234 and https://github.com/falcosecurity/falco/issues/2700 (I hope I can help with those cases as well).

Missing event metadata

{"hostname":"minikube","output":"12:15:24.058348969: Warning Account Manipulation in SSH detected 
...
{"container.id":"host","container.image.repository":null,"container.image.tag":null,"container.name":"host","evt.res":"SUCCESS","evt.time":1718108124058348969,"evt.type":"openat","fd.name":"my_sshd_config","group.gid":4294967295,"group.name":"","k8s.ns.name":null,"k8s.pod.name":null,"proc.cmdline":"bash","proc.cwd":"","proc.exepath":"","proc.pcmdline":null,"proc.pid":11453,"proc.ppid":0,"proc.sid":-1,"user.loginname":"","user.loginuid":-1,"user.name":"","user.uid":4294967295}
...
}

The triggered Falco rule is Account Manipulation in SSH, but the issue is not rule-specific.

Based on my local tests, the root cause is a too-small bufSizePreset value. This buffer is crucial when Falco has to handle a "process flood" (e.g. a process spawning hundreds of child processes).

To simulate a "process flood" I created a small Go script that triggers the rule 1000 times in different child processes (on the host).

...
cmd = exec.CommandContext(ctx, "timeout", "5s", "tail", "-f", "/home/ubuntu/my_sshd_config")
...

This is how you can reproduce the issue.
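For reference, the relevant part of the script looks roughly like this (a simplified sketch with error handling stripped; spawning the children concurrently is my shortcut here to create the flood):

```go
package main

import (
	"context"
	"os/exec"
	"sync"
	"time"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	var wg sync.WaitGroup
	// Spawn many short-lived child processes that each open the watched file,
	// flooding Falco with clone/execve/openat events in a short time window.
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			cmd := exec.CommandContext(ctx, "timeout", "5s", "tail", "-f", "/home/ubuntu/my_sshd_config")
			// "timeout" exits with status 124 when it kills tail, so a
			// non-zero exit status is expected here and ignored.
			_ = cmd.Run()
		}()
	}
	wg.Wait()
}
```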

Test env

Results

| bufSizePreset | Logged events | Events with missing metadata | Ratio |
| --- | --- | --- | --- |
| 1 (1 MB) | 447 | 77 | 0.172 |
| 2 (2 MB) | 582 | 69 | 0.118 |
| 3 (3 MB) | 570 | 48 | 0.084 |
| 4 (4 MB, Falco default) | 738 | 41 | 0.055 |
| 5 (16 MB) | 998 | 0 | 0.000 |

As you can see above, as we increase the buffer size, the number of events without metadata decreases. With an appropriately sized buffer the issue disappears, and the Falco logs contain only fully enriched events.

bufSizePreset can be set to a value between 1 and 10.

Ideas

Looking forward to your answers and ideas (I might have missed something).

incertum commented 2 weeks ago

> As you can see above, as we increase the buffer size, the number of events without metadata decreases.

This is expected, as Falco builds up internal state to serve you all that information (see the source code: https://github.com/falcosecurity/libs/blob/master/userspace/libsinsp/parsers.cpp). If too many events are dropped kernel side, that state cannot be built and enrichment breaks. Perhaps the adaptive syscalls blog post (https://falco.org/blog/adaptive-syscalls-selection/) can provide more insights, and the base_syscalls feature may be of interest to you in general.
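To illustrate why kernel-side drops surface as empty fields (a toy model only, not the actual libsinsp code): the parsers maintain a process table keyed by PID, populated from clone/execve events; when those events are dropped, a later event from the same PID finds no entry and the proc.* fields come back empty, which is what your first log shows.

```go
package main

import "fmt"

// procInfo is a toy stand-in for the per-process state built from syscall events.
type procInfo struct {
	exepath string
	cmdline string
	ppid    int
}

// procTable maps PID -> process state learned from clone/execve events.
var procTable = map[int]procInfo{}

// onExecve records process metadata; if the execve event is dropped kernel
// side, the entry is simply never created.
func onExecve(pid int, info procInfo) { procTable[pid] = info }

// onOpenat enriches a file-open event with whatever state is available.
func onOpenat(pid int, fdName string) {
	info, ok := procTable[pid]
	if !ok {
		// The clone/execve event was dropped: enrichment falls back to empty
		// values, which is what the "missing metadata" alerts look like.
		fmt.Printf("pid=%d fd=%s exepath=%q ppid=%d (state missing)\n", pid, fdName, "", 0)
		return
	}
	fmt.Printf("pid=%d fd=%s exepath=%q ppid=%d\n", pid, fdName, info.exepath, info.ppid)
}

func main() {
	onExecve(12194, procInfo{exepath: "/usr/bin/tail", cmdline: "tail -f my_sshd_config", ppid: 12192})
	onOpenat(12194, "my_sshd_config") // enriched: its execve was seen
	onOpenat(11453, "my_sshd_config") // not enriched: its execve was dropped
}
```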

> A bufSizePreset-specific debug message (with logic that can measure buffer utilisation) would be very useful

Have you explored the internal automatic drop alerts or Falco metrics (https://falco.org/docs/metrics/falco-metrics/) as alternatives? Both expose drop counters from which you can infer how the buffer is holding up.
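If you want to keep quantifying this the way your table does, a small helper that reads Falco's JSON output (one alert per line, as in your examples) and counts alerts with an empty proc.exepath can approximate the missing-metadata ratio. Just a rough sketch; the choice of proc.exepath as the indicator field is an assumption:

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
)

// alert mirrors only the piece of Falco's JSON output needed here.
type alert struct {
	OutputFields map[string]interface{} `json:"output_fields"`
}

func main() {
	var total, missing int
	sc := bufio.NewScanner(os.Stdin)
	sc.Buffer(make([]byte, 0, 1024*1024), 1024*1024) // alert lines can be long
	for sc.Scan() {
		var a alert
		if err := json.Unmarshal(sc.Bytes(), &a); err != nil {
			continue // skip non-JSON lines
		}
		total++
		// Treat an empty proc.exepath as "metadata missing"; other fields
		// (proc.cwd, user.name, ...) could be checked the same way.
		if v, ok := a.OutputFields["proc.exepath"].(string); !ok || v == "" {
			missing++
		}
	}
	if total > 0 {
		fmt.Printf("alerts=%d missing_metadata=%d ratio=%.3f\n", total, missing, float64(missing)/float64(total))
	}
}
```

It can be fed from a captured JSON output file, or piped from Falco with JSON output enabled.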


Some more general info:

By the way, your example log shows "container.name":"host", so all container fields are expected to be null; see https://falco.org/docs/reference/rules/supported-fields/#field-class-container etc.

Re: user names and group names, is the host /etc dir mounted and available? We have had issues in the past with minikube support in general, as some mounts or parts of the setup are not like on an actual Kubernetes cluster. Perhaps some of it is also because of that. How do you use minikube? Which driver? See also https://falco.org/docs/install-operate/third-party/learning/

imreczegledi-form3 commented 2 weeks ago

Thanks, I will check the blog post regarding adaptive syscalls.


driver: modern-bpf

I don't think it's a minikube compatibility issue because, as you can see in the table above, the majority of the events are perfectly enriched, like:

{"hostname":"minikube","output":"13:21:03.940424837: Warning Account Manipulation in SSH detected ...
 "output_fields": {"container.id":"host","container.image.repository":null,"container.image.tag":null,"container.name":"host","evt.res":"SUCCESS","evt.time":1718198463940424837,"evt.type":"openat","fd.name":"/home/ubuntu/my_sshd_config","group.gid":1001,"group.name":"<NA>","k8s.ns.name":null,"k8s.pod.name":null,"proc.cmdline":"tail -f /home/ubuntu/my_sshd_config","proc.cwd":"","proc.exepath":"/usr/bin/tail","proc.pcmdline":"timeout 5s tail -f /home/ubuntu/my_sshd_config","proc.pid":12194,"proc.ppid":12192,"proc.sid":-1,"user.loginname":"docker","user.loginuid":1000,"user.name":"docker","user.uid":1000}}

so the root cause still seems to be around the state engine / dropped events.