falcosecurity / falco

Cloud Native Runtime Security
https://falco.org
Apache License 2.0

Falco 0.35.1 segfaults when using runsc run directly, but not when using docker --runtime=runsc-falco #2897

Closed thundergolfer closed 9 months ago

thundergolfer commented 10 months ago

Describe the bug

Following https://gvisor.dev/docs/tutorials/falco/, I can get Falco working and detecting events. I first triggered the rule demonstrated in that guide, and then got cryptomining detection working with the default rules.

Unfortunately, when attempting exactly the same thing in our own container runtime (not Docker), where we use runsc run directly, Falco segfaults or fails with Error: stoull or Error: std::bad_alloc.

Here is an example log where I first run a Docker container running nbminer, and then attempt to run a container using our own runtime.

(modal) ubuntu@ip-10-1-8-45:~/modal$ sudo falco -v -r /etc/falco/falco_rules.local.yaml  -c /etc/falco/falco.yaml   --gvisor-config /etc/falco/pod-init.json
Wed Nov  1 15:25:38 2023: Falco version: 0.35.0 (x86_64)
Wed Nov  1 15:25:38 2023: Falco initialized with configuration file: /etc/falco/falco.yaml
Wed Nov  1 15:25:38 2023: Loading rules from file /etc/falco/falco_rules.local.yaml
Wed Nov  1 15:25:38 2023: gRPC server threadiness equals to 8
Wed Nov  1 15:25:38 2023: Starting health webserver with threadiness 8, listening on port 8765
Wed Nov  1 15:25:38 2023: Enabled event sources: syscall
Wed Nov  1 15:25:38 2023: Opening 'syscall' source with gVisor. Configuration path: /etc/falco/pod-init.json
Wed Nov  1 15:25:38 2023: Starting gRPC server at unix:///var/run/falco.sock
15:25:40.634960695: Critical Possible miner running (command=nbminer ./nbminer -a ethash -o stratum+tcp://cn.sparkpool.com:13333 -u 0x4296116d44a4a7259B52B1A756e19083e675062A.default -log pid=<NA> container=great_mccarthy (id=8cdaef990cd2) image=ubuntu)
15:25:41.699553922: Critical Possible miner running (command=nbminer ./nbminer -a ethash -o stratum+tcp://cn.sparkpool.com:13333 -u 0x4296116d44a4a7259B52B1A756e19083e675062A.default -log -RUN -reboot-times 0 pid=<NA> container=great_mccarthy (id=8cdaef990cd2) image=ubuntu)
Events detected: 2
Rule counts by severity:
   CRITICAL: 2
Triggered rules by rule name:
   Detect crypto miners using the Stratum protocol: 2
Wed Nov  1 15:25:53 2023: Shutting down gRPC server. Waiting until external connections are closed by clients
Wed Nov  1 15:25:53 2023: Waiting for the gRPC threads to complete
Wed Nov  1 15:25:53 2023: Draining all the remaining gRPC events
Wed Nov  1 15:25:53 2023: Shutting down gRPC server complete
Error: std::bad_alloc

How to reproduce it

Our container runtime is custom and closed-source, but I can provide details on the particulars as needed 🙂. Our runsc command looks like this:

sudo runsc -platform=systrap -systemd-cgroup -file-access=shared -host-uds=open -overlay2=none -directfs -dcache=0 --cpu-num-from-quota --pod-init-config=/opt/falco/gvisor-falco-config.json -debug-log=/tmp/runsc/ -debug -strace run -bundle tempgvisor/ <containerID>

Expected behaviour

Falco should not segfault when containers are started directly with runsc run, rather than through Docker with the gVisor runtime.

Environment

falco --version
Wed Nov  1 20:30:20 2023: Falco version: 0.36.1 (x86_64)
Wed Nov  1 20:30:20 2023: Falco initialized with configuration file: /etc/falco/falco.yaml
Falco version: 0.36.1
Libs version:  0.13.2
Plugin API:    3.1.0
Engine:        26
Driver:
  API version:    5.0.0
  Schema version: 2.0.0
  Default driver: 6.0.1+driver
{
  "machine": "x86_64",
  "nodename": "ip-10-1-8-45",
  "release": "5.15.0-1044-aws",
  "sysname": "Linux",
  "version": "#49~20.04.1-Ubuntu SMP Mon Aug 21 17:09:32 UTC 2023"
}
NAME="Ubuntu"
VERSION="20.04.3 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.3 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

thundergolfer commented 10 months ago

I tried upgrading Falco in the hope of fixing this issue, but ran into https://github.com/falcosecurity/falco/issues/2896

LucaGuerra commented 9 months ago

I believe this issue occurs because the container ID is not a hexadecimal string. Docker, k8s, etc. normally produce hexadecimal IDs. Can you try running a container with a hex ID to confirm?
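
One way to try this experiment is to generate a Docker-style ID and use it as the container ID. The following is a minimal sketch only: the helper names `is_hex_id` and `new_hex_id` are hypothetical, and the choice of a lowercase, 64-character hex string mirrors what Docker produces rather than any documented Falco requirement.

```python
import re
import secrets

# Assumption: a "hex ID" means one or more lowercase hexadecimal digits,
# as in Docker container IDs such as 8cdaef990cd2.
HEX_ID = re.compile(r"[0-9a-f]+")


def is_hex_id(container_id: str) -> bool:
    """Return True if the whole ID consists of lowercase hex digits."""
    return HEX_ID.fullmatch(container_id) is not None


def new_hex_id(num_bytes: int = 32) -> str:
    """Generate a random hex ID; 32 bytes yields 64 hex characters."""
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    print(is_hex_id("8cdaef990cd2"))    # ID taken from the log in the report
    print(is_hex_id("great_mccarthy"))  # a name, not a hex string
    print(new_hex_id())
```

The generated value could then be used in place of `<containerID>` in the runsc run command from the report.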

thundergolfer commented 9 months ago

Hey @LucaGuerra, we opted to ingest the gVisor tracepoints directly, so we no longer have Falco set up in our system to test.

LucaGuerra commented 9 months ago

Thanks for the detailed report anyway! After an experiment with runsc, I believe the issue you had was due to non-hex container IDs. I'm going to close this issue and open a more focused one in the libs repo. Thanks again!

LucaGuerra commented 8 months ago

This will be fixed in the upcoming Falco 0.37.0

/milestone 0.37.0