falcosecurity / falco

Cloud Native Runtime Security
https://falco.org
Apache License 2.0

Falco "syscall event drop" #2657

Closed shalevpenker97 closed 8 months ago

shalevpenker97 commented 1 year ago

Describe the bug

When deploying Falco on Kubernetes, we see syscall event drops. It takes some time for a Falco pod to start dropping syscall events, but once it starts, it doesn't stop until the pod is restarted. Meanwhile, the syscall behavior of the workloads running on Kubernetes does not change.

How to reproduce it

Deploy Falco at scale with this configuration:

syscall_event_drops:
  # -- The messages are emitted when the percentage of dropped system calls
  # with respect the number of events in the last second
  # is greater than the given threshold (a double in the range [0, 1]).
  threshold: .1
  # -- Actions to be taken when system calls were dropped from the circular buffer.
  actions:
    - log
    - alert
  # -- Rate at which log/alert messages are emitted.
  rate: .03333
  # -- Max burst of messages emitted.
  max_burst: 1
  # -- Flag to enable drops for debug purposes.
  simulate_drops: false
# -- Buffer size .
syscall_buf_size_preset: 10
# -- Custom syscalls.
base_syscalls:
  custom_set: [clone, clone3, fork, vfork, execve, execveat, close]
  repair: false
# -- Number of cpus for buffer.
modern_bpf:
  cpus_for_each_syscall_buffer: 2
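
The syscall_event_drops comments above describe two separate mechanisms: a per-second ratio test of drops against threshold, and a rate limiter (rate, max_burst) on the resulting log/alert messages. Below is a simplified Python model of that behavior, not Falco's C++ implementation. It assumes that rate is expressed in messages per second, that the limiter is a standard token bucket that starts full, and that the inputs are per-second deltas of event and drop counts. TokenBucket, drop_ratio_exceeded and simulate are hypothetical names.

```python
class TokenBucket:
    """Token bucket: refills `rate` tokens per second, capped at `max_burst` tokens."""

    def __init__(self, rate: float, max_burst: float, now: float = 0.0) -> None:
        self.rate = rate
        self.max_burst = max_burst
        self.tokens = max_burst  # assumption: the bucket starts full
        self.last = now

    def claim(self, now: float) -> bool:
        # Refill proportionally to the elapsed time, never above capacity.
        self.tokens = min(self.max_burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def drop_ratio_exceeded(n_evts: int, n_drops: int, threshold: float) -> bool:
    """True if drops in the last interval are more than `threshold` (in [0, 1]) of events."""
    if n_drops == 0:
        return False
    return n_drops / max(n_evts, 1) > threshold


def simulate(samples, threshold=0.1, rate=0.03333, max_burst=1):
    """Return the timestamps (s) at which a drop message would be emitted.

    `samples` is a list of hypothetical (t, n_evts, n_drops) per-second deltas.
    """
    bucket = TokenBucket(rate, max_burst, now=samples[0][0] if samples else 0.0)
    emitted = []
    for t, n_evts, n_drops in samples:
        # Only intervals above the threshold try to emit; the bucket decides if they may.
        if drop_ratio_exceeded(n_evts, n_drops, threshold) and bucket.claim(t):
            emitted.append(t)
    return emitted


if __name__ == "__main__":
    sustained_20pct = [(t, 1000, 200) for t in range(120)]
    print(simulate(sustained_20pct))
```

Under these assumptions, sustained 20% drops with the reported settings (threshold: .1, rate: .03333, max_burst: 1) produce a message at t = 0, 31, 62 and 93 s over two minutes, i.e. roughly one message every 30 seconds, while 5% drops never cross the threshold and produce none.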

We expected the syscall event drops either to show up right away (not after 2 hours) or not to happen at all. As the image below shows, the Falco logs reported a high drop rate; after we restarted the pods at 17:00, it took another ~1.5 hours until the drops started again, at around 18:35.

[Screenshot 2023-06-26 at 13 11 47]

Environment

Falco version: 0.35.0

System info:

Mon Jun 26 09:59:35 2023: Falco version: 0.35.0 (x86_64)
Mon Jun 26 09:59:35 2023: Falco initialized with configuration file: /etc/falco/falco.yaml
Mon Jun 26 09:59:35 2023: Loading plugin 'k8saudit' from file /usr/share/falco/plugins/libk8saudit.so
Mon Jun 26 09:59:35 2023: Loading plugin 'json' from file /usr/share/falco/plugins/libjson.so
Mon Jun 26 09:59:35 2023: Loading rules from file /etc/falco/falco_rules.yaml
Mon Jun 26 09:59:35 2023: Loading rules from file /etc/falco/k8s_audit_rules.yaml
{
  "machine": "x86_64",
  "nodename": "falco-v9bh2",
  "release": "5.10.167-200.el7.x86_64",
  "sysname": "Linux",
  "version": "#1 SMP Sun Feb 12 13:08:57 UTC 2023"
}

Cloud provider or hardware configuration: on-prem deployment, 40-core server with 190 GB of memory

OS:

NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

Kernel: 5.10.149-200.el7.x86_64 #1 SMP Sun Oct 23 08:59:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Installation method: Kubernetes

Andreagit97 commented 1 year ago

That's interesting, thank you for reporting!

Side question: looking at your config, I saw this:

# -- Buffer size .
syscall_buf_size_preset: 10
# -- Custom syscalls.
base_syscalls:
  custom_set: [clone, clone3, fork, vfork, execve, execveat, close]
  repair: false

Are you using the -k option? It seems quite strange to see such a huge number of drops with just 7 syscalls enabled and buffers as large as yours :thinking:
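
To put "huge buffers" in numbers, here is a back-of-the-envelope Python sketch. It rests on three assumptions: (1) syscall_buf_size_preset N selects 2^(N-1) MB per buffer, so the default preset 4 is 8 MB and preset 10 is 512 MB, as I understand the option's documentation in falco.yaml; (2) the modern probe allocates one buffer per group of cpus_for_each_syscall_buffer CPUs, rounding up; (3) the node exposes 40 CPUs (the report says 40 cores; with hyper-threading the CPU count could be higher). preset_to_mb and total_buffer_mb are hypothetical helpers.

```python
import math


def preset_to_mb(preset: int) -> int:
    """Per-buffer size in MB for a syscall_buf_size_preset value.

    Assumption: presets 1..10 map to 1, 2, 4, ..., 512 MB, i.e. 2**(preset - 1) MB.
    """
    if not 1 <= preset <= 10:
        raise ValueError("this sketch only models presets 1..10")
    return 2 ** (preset - 1)


def total_buffer_mb(n_cpus: int, preset: int, cpus_for_each_syscall_buffer: int) -> int:
    """Total ring-buffer memory in MB.

    Assumption: one buffer per group of `cpus_for_each_syscall_buffer` CPUs,
    rounding up (e.g. 5 CPUs with 2 CPUs per buffer -> 3 buffers).
    """
    if n_cpus < 1 or cpus_for_each_syscall_buffer < 1:
        raise ValueError("this sketch expects positive CPU counts")
    n_buffers = math.ceil(n_cpus / cpus_for_each_syscall_buffer)
    return n_buffers * preset_to_mb(preset)


if __name__ == "__main__":
    reported = total_buffer_mb(n_cpus=40, preset=10, cpus_for_each_syscall_buffer=2)
    default = total_buffer_mb(n_cpus=40, preset=4, cpus_for_each_syscall_buffer=2)
    print(f"preset 10: {reported} MB, preset 4 (default): {default} MB")
```

Under these assumptions the reported settings allocate 20 buffers of 512 MB each, about 10 GB in total, versus about 160 MB with the default preset.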

shalevpenker97 commented 1 year ago

Hi, yes, I'm using the -k option:

- /usr/bin/falco
- --modern-bpf
- --cri
- /run/containerd/containerd.sock
- -K
- /var/run/secrets/kubernetes.io/serviceaccount/token
- -k
- https://$(KUBERNETES_SERVICE_HOST)
- --k8s-node
- $(FALCO_K8S_NODE_NAME)
- -pk

Andreagit97 commented 1 year ago

Oh OK, that's outside the initial scope of this issue, but if you want to drastically reduce drops, I suggest disabling it. We are working on fixing the k8s client; the current one doesn't work very well, sorry.

shalevpenker97 commented 1 year ago

I have disabled it, and the drops did not decrease.

Andreagit97 commented 1 year ago

Hey @shalevpenker97, would you mind collecting some metrics with the metrics config (https://github.com/falcosecurity/falco/blob/63ba15962bdda191a2049293da3c185dd441039a/falco.yaml#L742)? This way, we could try to understand which syscalls the drops come from and why. Thank you!
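
Once metrics are being emitted, one way to answer "which syscalls do the drops come from" is to diff two snapshots of cumulative drop counters taken one interval apart and rank the counters that grew. The sketch below does exactly that in Python. The counter names (drops_clone_fork, drops_execve, drops_close) are hypothetical placeholders, not Falco's actual metric field names; check the metrics output of your Falco version for the real ones.

```python
from typing import Dict, List, Tuple


def rank_drop_sources(prev: Dict[str, int], curr: Dict[str, int]) -> List[Tuple[str, int]]:
    """Rank counters by how much they grew between two cumulative snapshots.

    Counters missing from `prev` are treated as starting at 0. Counters that did
    not grow (or went backwards, e.g. because a pod restart reset them) are
    skipped. Ties are broken alphabetically for stable output.
    """
    grown = []
    for name, value in curr.items():
        delta = value - prev.get(name, 0)
        if delta > 0:
            grown.append((name, delta))
    return sorted(grown, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical snapshots taken one metrics interval apart.
    before = {"drops_clone_fork": 5, "drops_execve": 10, "drops_close": 0}
    after = {"drops_clone_fork": 105, "drops_execve": 10, "drops_close": 4000}
    for name, delta in rank_drop_sources(before, after):
        print(f"{name}: +{delta}")
```

With the sample snapshots above, the close-related counter would stand out (+4000) ahead of clone/fork (+100), while the unchanged execve counter is omitted.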

leogr commented 1 year ago

cross-linking https://github.com/falcosecurity/falco/issues/1403

Andreagit97 commented 10 months ago

Any update on https://github.com/falcosecurity/falco/issues/2657#issuecomment-1700948683?

Andreagit97 commented 8 months ago

I will close this since, without further information, it is a duplicate of https://github.com/falcosecurity/falco/issues/1403. Please feel free to reopen it if you have further details.