falcosecurity / falco

Cloud Native Runtime Security
https://falco.org
Apache License 2.0

Following the documentation for minikube deployment doesn't work #1941

Closed ChaosInTheCRD closed 1 year ago

ChaosInTheCRD commented 2 years ago

Describe the bug

Following the documentation for creating a "Falco Learning Environment" unfortunately does not work. After using helm to deploy Falco, I get this error in the pod logs:

Tue Mar 15 15:25:42 2022: Runtime error: Kernel module does not support PPM_IOCTL_GET_API_VERSION. Exiting.

Workaround

I went on to try a handful of other virtual machine drivers for minikube, to no avail. I resorted to the Kubernetes Slack, where I got help from @terylt.

As it turns out, there is a script that runs at startup to try to install the kernel module / ebpf probe necessary to get Falco running in the relevant environment. The bash script seems to do some guesswork to determine what operating system it is in, then tries to decide the correct approach to install the kernel module / ebpf probe. This does not currently work correctly for minikube.

To get the script to pass through the logic linked here (and hence correctly determine that it is in a minikube VM), the daemonset must be modified like so, either after a helm template or with a kubectl edit after deployment:

      containers:
        - name: falco
...
          volumeMounts:
...
            - mountPath: /host/etc/VERSION
              name: etc-fs
              readOnly: true
...
      volumes:
...
        - name: etc-fs
          hostPath:
            path: /etc/VERSION

...

The daemonset also needs to have eBPF enabled, as otherwise it continues to fail. This can either be done by setting the env var on the falco pod in the manifest:

          env:
          - name: FALCO_BPF_PROBE

or by enabling eBPF in the helm values file:

ebpf:
  # Enable eBPF support for Falco
  enabled: true

Once these two steps have been taken, the pod should reach a READY state:

NAME                READY   STATUS    RESTARTS   AGE
falco-falco-blfrb   1/1     Running   0          24m

How to reproduce it

Follow the documentation for creating a learning environment with minikube.

Expected behaviour

I feel others' thoughts might be mixed on this, but to me it doesn't seem unreasonable to ask the user to specify the environment they are deploying to (e.g. GKE, minikube, kind etc.) in the form of an env var or a command-line argument. This way, there is no need for a script to exist or be maintained when, inevitably, situations arise that break the mechanisms by which the script tries to decipher the environment it is in.

Environment

terylt commented 2 years ago

Thanks for filing the issue Tom!

Just to add to Tom's issue. I think the bug is in the falco-driver-loader script in the if statement here: https://github.com/falcosecurity/falco/blob/a5d3663c75f4a1553edb974da07e2b44af7eb3e4/scripts/falco-driver-loader#L104

Falco uses the /etc/VERSION file to detect it is running in minikube. Unfortunately, it never gets to that check in the if statement above because I think minikube instances also have the os-release file as well, so it pops out of that if statement without detecting it's running in minikube. The old sysdig-probe-loader script dealt with this by breaking out the /etc/VERSION check into a separate if statement, which did work, but I'm not sure of the other implications of that.

Also, I think minikube requires ebpf to be enabled, but it doesn't look like that is documented in the docs.
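
A simplified sketch of the detection-order problem described above, for illustration only. This is not the actual falco-driver-loader code: the function names `detect_chained` and `detect_separate`, the `buildroot` ID, the version string, and the temporary fake host root are all assumptions made for the example.

```shell
#!/bin/sh
# Illustrative model only, NOT the real falco-driver-loader logic.
# Function names and file contents below are hypothetical.

# Chained form: /etc/VERSION is only consulted when os-release is absent.
detect_chained() {
    if [ -f "$1/etc/os-release" ]; then
        # Source os-release in a subshell so ID does not leak into our environment.
        ( . "$1/etc/os-release"; echo "${ID:-unknown}" )
    elif [ -f "$1/etc/VERSION" ]; then
        echo "minikube"
    else
        echo "unknown"
    fi
}

# Separate-check form: /etc/VERSION is tested on its own, before anything else.
detect_separate() {
    if [ -f "$1/etc/VERSION" ]; then
        echo "minikube"
        return
    fi
    detect_chained "$1"
}

# Fake host root containing both files, as suspected for minikube instances.
fake_root="$(mktemp -d)"
mkdir -p "${fake_root}/etc"
echo 'ID=buildroot' > "${fake_root}/etc/os-release"
echo 'v1.25.2' > "${fake_root}/etc/VERSION"

echo "chained:  $(detect_chained "$fake_root")"
echo "separate: $(detect_separate "$fake_root")"
rm -rf "$fake_root"
```

With both files present, the chained form reports the os-release ID and never reaches the /etc/VERSION branch, while the separate check reports minikube. Checking /etc/VERSION first may still have other implications on hosts that are not minikube but happen to ship that file.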

leogr commented 2 years ago

Minikube ships its own Falco pre-built driver (see https://github.com/kubernetes/minikube/pull/6560) since it is impossible to build the driver on the fly for Minikube because it doesn't provide a compiler or kernel-headers.

Unfortunately, the last update in Minikube was two years ago :point_down: https://github.com/kubernetes/minikube/tree/master/deploy/iso/minikube-iso/package/falco-module

The driver version they ship should work up to Falco 0.30.x, but not with 0.31.1. We should open a PR in minikube to fix it.

juju4 commented 2 years ago

This does not affect only minikube. I have it on Ubuntu 18.04

# cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.6 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.6 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
# uname -a
Linux HOST 4.15.0-167-generic #175-Ubuntu SMP Wed Jan 5 01:56:07 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
# journalctl -xe -u falco
Mar 16 09:54:19 HOST systemd[1]: Started Falco: Container Native Runtime Security.
-- Subject: Unit falco.service has finished start-up
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- Unit falco.service has finished starting up.
--
-- The start-up result is RESULT.
Mar 16 09:54:19 HOST falco[37102]: Falco version 0.31.1 (driver version b7eb0dd65226a8dc254d228c8d950d07bf3521d2)
Mar 16 09:54:19 HOST falco[37102]: Wed Mar 16 09:54:19 2022: Falco version 0.31.1 (driver version b7eb0dd65226a8dc254d228c8d950d07bf3521d2)
Mar 16 09:54:19 HOST falco[37102]: Wed Mar 16 09:54:19 2022: Falco initialized with configuration file /etc/falco/falco.yaml
Mar 16 09:54:19 HOST falco[37102]: Wed Mar 16 09:54:19 2022: Loading rules from file /etc/falco/falco_rules.yaml:
Mar 16 09:54:19 HOST falco[37102]: Falco initialized with configuration file /etc/falco/falco.yaml
Mar 16 09:54:19 HOST falco[37102]: Loading rules from file /etc/falco/falco_rules.yaml:
Mar 16 09:54:20 HOST falco[37102]: Loading rules from file /etc/falco/falco_rules.local.yaml:
Mar 16 09:54:20 HOST falco[37102]: Wed Mar 16 09:54:20 2022: Loading rules from file /etc/falco/falco_rules.local.yaml:
Mar 16 09:54:20 HOST falco[37102]: Loading rules from file /etc/falco/k8s_audit_rules.yaml:
Mar 16 09:54:20 HOST falco[37102]: Wed Mar 16 09:54:20 2022: Loading rules from file /etc/falco/k8s_audit_rules.yaml:
Mar 16 09:54:21 HOST falco[37102]: Rules match ignored syscall: warning (ignored-evttype):
Mar 16 09:54:21 HOST falco[37102]:          loaded rules match the following events: access,brk,close,cpu_hotplug,drop,epoll_wait,eventfd,fcntl,fstat,fstat64,futex,getcwd,getdents,getd
Mar 16 09:54:21 HOST falco[37102]:          but these events are not returned unless running falco with -A
Mar 16 09:54:21 HOST falco[37102]: Runtime error: Kernel module does not support PPM_IOCTL_GET_API_VERSION. Exiting.
Mar 16 09:54:21 HOST falco[37102]: Wed Mar 16 09:54:21 2022: Runtime error: Kernel module does not support PPM_IOCTL_GET_API_VERSION. Exiting.
Mar 16 09:54:21 HOST systemd[1]: falco.service: Main process exited, code=exited, status=1/FAILURE
Mar 16 09:54:21 HOST systemd[1]: falco.service: Failed with result 'exit-code'.
Mar 16 09:54:21 HOST systemd[1]: falco.service: Received 0B IP traffic, sent 0B IP traffic
# grep -C2 falco /var/log/apt/history.log

Start-Date: 2022-03-12  13:01:57
Upgrade: dotnet-runtime-3.1:amd64 (3.1.22-1, 3.1.23-1), dotnet-host:amd64 (6.0.2-1, 6.0.3-1), dotnet-hostfxr-3.1:amd64 (3.1.22-1, 3.1.23-1), sosreport:amd64 (4.1-1ubuntu0.18.04.3, 4.3-1ubuntu0.18.04.1), dotnet-runtime-deps-3.1:amd64 (3.1.22-1, 3.1.23-1), falco:amd64 (0.31.0, 0.31.1)
End-Date: 2022-03-12  13:02:40

Downgrading to 0.31.0 restores functionality

# apt-get install falco=0.31.0
# systemctl status falco
● falco.service - Falco: Container Native Runtime Security
   Loaded: loaded (/lib/systemd/system/falco.service; disabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system.control/falco.service.d
           └─50-CPUQuota.conf, 50-CPUShares.conf, 50-MemoryLimit.conf
   Active: active (running) since Wed 2022-03-16 10:34:02 UTC; 4s ago
 Main PID: 57182 (falco)
       IP: 0B in, 0B out
    Tasks: 3 (limit: 4915)
   CGroup: /system.slice/falco.service
           └─57182 /usr/bin/falco --pidfile=/var/run/falco.pid

Mar 16 10:34:02 HOST falco[57182]: Wed Mar 16 10:34:02 2022: Loading rules from file /etc/falco/falco_rules.yaml:
Mar 16 10:34:02 HOST falco[57182]: Falco initialized with configuration file /etc/falco/falco.yaml
Mar 16 10:34:02 HOST falco[57182]: Loading rules from file /etc/falco/falco_rules.yaml:
Mar 16 10:34:03 HOST falco[57182]: Loading rules from file /etc/falco/falco_rules.local.yaml:
Mar 16 10:34:03 HOST falco[57182]: Wed Mar 16 10:34:03 2022: Loading rules from file /etc/falco/falco_rules.local.yaml:
Mar 16 10:34:03 HOST falco[57182]: Loading rules from file /etc/falco/k8s_audit_rules.yaml:
Mar 16 10:34:03 HOST falco[57182]: Wed Mar 16 10:34:03 2022: Loading rules from file /etc/falco/k8s_audit_rules.yaml:
Mar 16 10:34:04 HOST falco[57182]: Rules match ignored syscall: warning (ignored-evttype):
Mar 16 10:34:04 HOST falco[57182]:          loaded rules match the following events: access,brk,close,cpu_hotplug,drop,epoll_wait,eventfd,fcntl,fstat,fstat64,futex,getcwd,getdents,getd
Mar 16 10:34:04 HOST falco[57182]:          but these events are not returned unless running falco with -A

# apt-mark hold falco
falco set on hold.

ChaosInTheCRD commented 2 years ago

@leogr would it be preferable to just align all minikube installs to the ebpf probe and have the init script download that, as I have done in my workaround? I am unsure as to what the "most supported" method of setting up falco is wrt ebpf/kernel driver though.

leogr commented 2 years ago

@ChaosInTheCRD ebpf and kmod have function parity, they are also almost equivalent in performance. Specifically for Minikube, upgrading the driver version directly in their repo would be preferable, so everything will work seamlessly (driver incompatibility does not happen on each Falco release, it's a rare occurrence). If I find some spare time, I'll try to investigate more and eventually open a PR in Minikube to fix that.

However, your solution is a valid alternative. Indeed, after looking at it again, I realized that /host/etc/VERSION must be mounted anyway since it is consumed by falco-driver-loader. For this reason, I believe we have to fix the helm chart. Would you like to open a PR in https://github.com/falcosecurity/charts ?

Btw, I hope we will fix both cases. Having Minikube work with both the kmod and ebpf is for sure the best option.

leogr commented 2 years ago

@juju4

> This does not affect only minikube. I have it on Ubuntu 18.04

Your issue is slightly different. You should be able to install the new driver version by running:

falco-driver-loader --clean

Then

falco-driver-loader

Let me know if that works. Perhaps we will have to improve the documentation regarding that.

juju4 commented 2 years ago

No, it didn't fix the issue, but it made me understand the problem.

# apt-mark unhold falco
Canceled hold on falco.
# apt-get upgrade falco
# systemctl restart falco
# systemctl status falco
 falco.service - Falco: Container Native Runtime Security
   Loaded: loaded (/lib/systemd/system/falco.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system.control/falco.service.d
           └─50-CPUQuota.conf, 50-CPUShares.conf, 50-MemoryLimit.conf
   Active: activating (auto-restart) (Result: exit-code) since Wed 2022-03-16 22:31:08 UTC; 1s ago
  Process: 27908 ExecStart=/usr/bin/falco --pidfile=/var/run/falco.pid (code=exited, status=1/FAILURE)
 Main PID: 27908 (code=exited, status=1/FAILURE)
       IP: 0B in, 0B out

Mar 16 22:31:08 vps58732 systemd[1]: falco.service: Received 0B IP traffic, sent 0B IP traffic
root@vps58732:~# falco-driver-loader --clean
* Running falco-driver-loader for: falco version=0.31.1, driver version=b7eb0dd65226a8dc254d228c8d950d07bf3521d2
* Running falco-driver-loader with: driver=module, clean=yes
* Unloading falco module failed
* Removing falco failed
root@vps58732:~# falco-driver-loader 
* Running falco-driver-loader for: falco version=0.31.1, driver version=b7eb0dd65226a8dc254d228c8d950d07bf3521d2
* Running falco-driver-loader with: driver=module, compile=yes, download=yes
* Unloading falco module, if present 
* falco module still loaded, waited 5s (max wait 60s)
* falco module still loaded, waited 10s (max wait 60s)
^C

So it is a kernel module unloading/loading issue. I found the cause: on this system, boot ends by disabling kernel module loading. After doing multiple modprobe calls, including one for falco, rc.local runs:

echo 1 > /proc/sys/kernel/modules_disabled

I did a reboot to confirm, and the new version is working afterwards. Strangely, I have not seen this issue after past upgrades, and this setting is not new. Sorry for the mix-up. Yes, it is a different issue.
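
The condition described above can be checked before running falco-driver-loader. A minimal sketch, assuming a Linux host: the helper name `modules_locked` is hypothetical, and its optional path argument exists only so the check can be exercised against a test file. Once `kernel.modules_disabled` is set to 1, the kernel neither loads nor unloads modules until the next reboot, which matches the behaviour seen here.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the kernel has module loading disabled.
# The default path is the standard sysctl file; the argument is for testing.
modules_locked() {
    flag_file="${1:-/proc/sys/kernel/modules_disabled}"
    [ -r "$flag_file" ] && [ "$(cat "$flag_file")" = "1" ]
}

if modules_locked; then
    echo "kernel.modules_disabled=1: the falco module cannot be swapped until reboot" >&2
else
    echo "module loading is enabled (or the flag is not readable here)"
fi
```

If the flag is set, a reboot is needed so that the new driver is loaded before rc.local locks modules again.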

leogr commented 2 years ago

Oh yeah, there are cases where falco-driver-loader can't unload the driver. Consequently, it cannot install the new driver version. We should improve our documentation about that.

This issue happened with Falco 0.31.1 because we introduced a mechanism to detect incompatible driver versions, and this specific Falco version requires a new driver version. Combining these two factors produced the issue, which does not usually happen.

eelkonio commented 2 years ago

We also experience this issue: Runtime error: Kernel module does not support PPM_IOCTL_GET_API_VERSION. Exiting.

We build the driver through driverkit-builder, load the module successfully and start Falco. When using falco 0.31.1 it croaks with the above message. When using Falco 0.31.0 it works fine.

This happens on nodes where it replaces the old Falco instances (0.29.1) but also on clean nodes that did not have any modules loaded before this new version (0.31.1) started. Unloading modules cannot be the cause there. Version 0.31.0 does not show this problem.

leogr commented 2 years ago

> We build the driver through driverkit-builder, load the module successfully and start Falco. When using falco 0.31.1 it croaks with the above message. When using Falco 0.31.0 it works fine.

Note that 0.31.1 has a newer driver (ie. kernel module) version than 0.31.0.

> This happens on nodes where it replaces the old Falco instances (0.29.1) but also on clean nodes that did not have any modules loaded before this new version (0.31.1) started. Unloading modules cannot be the cause there. Version 0.31.0 does not show this problem.

AFAIK, the problem arises when an old driver is already installed and loaded. Basically, when one previously installed an older version and then installs 0.31.1, the old driver remains loaded and the error "Runtime error: Kernel module does not support PPM_IOCTL_GET_API_VERSION. Exiting." is returned.

The workaround is to uninstall the old driver manually before installing 0.31.1.

PS

Falco 0.32.0 (not yet released) will come with a fix that forces the driver uninstallation when upgrading to a newer version.
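
The manual workaround above depends on knowing whether an old driver is still loaded. A minimal sketch of that check, assuming the module is named `falco` as in the falco-driver-loader output earlier in this thread: the helper name `module_loaded` is hypothetical, and its optional second argument exists only so the check can be run against a test file instead of /proc/modules.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when a module name appears in the first
# column of /proc/modules (one loaded module per line).
module_loaded() {
    modules_file="${2:-/proc/modules}"
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "$modules_file" 2>/dev/null
}

if module_loaded falco; then
    echo "An old falco module is loaded; unload it (for example with 'rmmod falco' as root) before upgrading."
else
    echo "No falco module is currently loaded."
fi
```

The sketch only reports the state; it deliberately does not unload anything, since removal can fail while a Falco process still holds the module.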

leogr commented 2 years ago

/milestone 0.32.0

jasondellaluce commented 2 years ago

/remove-milestone 0.32.0 /milestone 0.33.0

leogr commented 2 years ago

> /remove-milestone 0.32.0 /milestone 0.33.0

Hey @jasondellaluce

This issue should actually be fixed by 0.32.0.

I know @alacuku had the same issue and is testing. Has 0.32.0 worked for you on minikube? Please let us know :)

alacuku commented 2 years ago

Hi @leogr, here are my findings on Falco and Minikube.

The issue is still present even with Falco 0.32.0. That's because the latest version of minikube, v1.25.2, ships with version 85c88952b018fdbce2464222c3303229f5bfcfad of the Falco kernel module. It works fine with Falco 0.31.0 but not with later versions of Falco. Minikube developers have already bumped the version of Falco to 0.31.1 (https://github.com/kubernetes/minikube/commit/69fb8c243256d407402d754bfa562a38aa794129), but we need to wait for the next release of Minikube for that.

There are two options in order to use the latest version of Falco with Minikube:

  1. We start to offer prebuilt driver modules for the Minikube kernels;
  2. The end users build their own Minikube iso image with the latest Falco driver module.

jrabbit commented 2 years ago

Seeing this still with 0.32.0 and Ubuntu 20.04.2 LTS on 5.4.0-1038-aws #40-Ubuntu kernels. It seems to have worked on a few of our nodes but not in any easily observed pattern.

leogr commented 2 years ago

> Seeing this still with 0.32.0 and Ubuntu 20.04.2 LTS on 5.4.0-1038-aws #40-Ubuntu kernels. It seems to have worked on a few of our nodes but not in any easily observed pattern.

Hey @jrabbit Could you share more detail pls?

leogr commented 2 years ago

Meanwhile, I've opened a PR in minikube to bump Falco to 0.32.0 :point_right: https://github.com/kubernetes/minikube/pull/14329

jrabbit commented 2 years ago

> > Seeing this still with 0.32.0 and Ubuntu 20.04.2 LTS on 5.4.0-1038-aws #40-Ubuntu kernels. It seems to have worked on a few of our nodes but not in any easily observed pattern.

> Hey @jrabbit Could you share more detail pls?

So we're on a full 1.19 K8s cluster, so it might be a different bug, but we're having trouble unloading the falco kernel mod, either with the script or manually. Strangely, it seemed like falco kept spawning binaries (which then access the module, blocking rmmod) when the pods were unscheduled? (Maybe this is a k8s quirk I'm not used to?) Let me validate whether the nodes that succeeded were ones that were freshly spun up (and thus wouldn't have falco kernel mods to remove) and get back to you w/ that.

e: So the nodes that do have working pods aren't new, the others are in restart loops with errors in dkms and Mon Jun 13 15:54:57 2022: Runtime error: Kernel module does not support PPM_IOCTL_GET_API_VERSION. Exiting. at the end. May help to know the nodes are kops provisioned and have similar os package state. Also installed via helm w/ latest which may complicate things?

leogr commented 2 years ago

> > > Seeing this still with 0.32.0 and Ubuntu 20.04.2 LTS on 5.4.0-1038-aws #40-Ubuntu kernels. It seems to have worked on a few of our nodes but not in any easily observed pattern.

> > Hey @jrabbit Could you share more detail pls?

> So we're on a full 1.19 K8s cluster so it might be a different bug but we're having trouble getting the script or manually to unload the falco kernel mod. Strangely it seemed like falco kept spawning binaries (which then access the module, blocking rmmod) when the pods were unscheduled? (Maybe this is a k8s quirk i'm not used to?). Let me validate if the nodes that succeeded were ones that were freshly spun up (and thus wouldn't have falco kernel mods to remove) and get back to you w/ that.

Not sure what is going on. The module is installed on the host, so it is still present after pods get unscheduled. The bug was that 0.31.1 was not able to upgrade the module. The 0.32.0 fixed the issue.

> e: So the nodes that do have working pods aren't new, the others are in restart loops with errors in dkms and Mon Jun 13 15:54:57 2022: Runtime error: Kernel module does not support PPM_IOCTL_GET_API_VERSION. Exiting. at the end. May help to know the nodes are kops provisioned and have similar os package state. Also installed via helm w/ latest which may complicate things?

For pods in the restart loop, I guess for some reason the driver is not found on our DBG and the falco-driver-loader script can't build it on the fly. Could you provide some logs?

Anyway, I think yours is a different problem. It would be better to open a dedicated issue.

chukmunnlee commented 2 years ago

Got falco working following @ChaosInTheCRD's workaround, but I also had to delete the SKIP_DRIVER_LOADER environment variable in the falco container.

leogr commented 2 years ago

cc @alacuku Should this be fixed now? Should we have to update the documentation somewhere?

alacuku commented 2 years ago

> cc @alacuku Should this be fixed now? Should we have to update the documentation somewhere?

Yes, it should be fixed now. I'm waiting for the 0.33.0 Falco release in order to update the docs and eventually close the issue.

leogr commented 1 year ago

> > cc @alacuku Should this be fixed now? Should we have to update the documentation somewhere?

> Yes, it should be fixed now. I'm waiting for 0.33.0 falco release in order to update the docs and eventually close the issue.

@alacuku Thank you! :pray:

Please put "Fixes https://github.com/falcosecurity/falco/issues/1941" in the falco-website PR you will open, so that we both track this and automatically close it once you are done with the docs.