falcosecurity / falco

Cloud Native Runtime Security
https://falco.org
Apache License 2.0

Facing error in Falco helm installation on AWS EKS #2169

Closed ap-mx-git closed 1 year ago

ap-mx-git commented 2 years ago

Facing the same issue on EKS 1.22 with Falco 0.32.2. Logs attached for reference:

* Setting up /usr/src links from host
* Running falco-driver-loader for: falco version=0.32.2, driver version=2.0.0+driver
* Running falco-driver-loader with: driver=module, compile=yes, download=yes

================ Cleaning phase ================

* 1. Check if kernel module 'falco' is still loaded:
- OK! There is no 'falco' module loaded.

* 2. Check all versions of kernel module 'falco' in dkms:
- There are some versions of 'falco' module in dkms.

* 3. Removing all the following versions from dkms:
2.0.0+driver

- Removing 2.0.0+driver...

------------------------------
Deleting module version: 2.0.0+driver
completely from the DKMS tree.
------------------------------
Done.

- OK! Removing '2.0.0+driver' succeeded.

[SUCCESS] Cleaning phase correctly terminated.

================ Cleaning phase ================

* Looking for a falco module locally (kernel 5.4.204-113.362.amzn2.aarch64)
* Trying to download a prebuilt falco module from https://download.falco.org/driver/2.0.0%2Bdriver/aarch64/falco_amazonlinux2_5.4.204-113.362.amzn2.aarch64_1.ko
curl: (22) The requested URL returned error: 404 
Unable to find a prebuilt falco module

Please provide steps to resolve this.

Originally posted by @ap-mx-git in https://github.com/falcosecurity/falco/issues/1803#issuecomment-1219522445

ap-mx-git commented 2 years ago

/kind bug

Andreagit97 commented 2 years ago

Hi @ap-mx-git! The problem here is that we have no pre-built driver for this kernel version, as you can see here: https://download.falco.org/?prefix=driver/2.0.0%2Bdriver/aarch64/ :( The good news is that we have a kernel crawler that searches for new kernels and periodically extends this list, so you will probably have to wait until the crawler finds this version.

Just out of curiosity, are you able to build the drivers locally? I can't tell from the logs since they are truncated :/
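For anyone who wants to check whether a prebuilt module has been published for their kernel yet, here is a minimal sketch that composes the URL the driver loader tries. The layout is inferred only from the log lines in this thread, not from the falco-driver-loader script itself, and the function name `falco_driver_url` is hypothetical.

```shell
# Compose the download URL for a prebuilt Falco kernel module.
# Assumption: the URL layout matches what the logs in this thread show.
falco_driver_url() {
  driver_version=$1   # e.g. 3.0.1+driver
  arch=$2             # e.g. x86_64 or aarch64
  target=$3           # e.g. amazonlinux2
  kernel_release=$4   # e.g. the output of `uname -r`
  kernel_version=$5   # e.g. 1
  # The '+' in the driver version is percent-encoded in the URL path.
  encoded_version=$(printf '%s' "$driver_version" | sed 's/+/%2B/g')
  printf 'https://download.falco.org/driver/%s/%s/falco_%s_%s_%s.ko\n' \
    "$encoded_version" "$arch" "$target" "$kernel_release" "$kernel_version"
}
```

Passing the result to `curl -fsI` should exit non-zero for as long as the module is missing (the 404 seen in the logs), and zero once the crawler has picked the kernel up.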

igoritos22 commented 2 years ago

@Andreagit97 is there an estimate for when the kernel crawler will pick up these new versions? We are facing the same issue with kernel 5.4.204-113.362.amzn2.aarch64.

Andreagit97 commented 2 years ago

@igoritos22 We are working on that, we will come back very soon with some news :)

cc @FedeDP

sawanverma commented 1 year ago

We are facing the same issue with kernel 5.4.209-116.367.amzn2.x86_64. We tried the suggested command to install the kernel headers, yum install kernel-devel.

After that Falco runs, but it is not able to capture any of the events from the pod that the AWS blog post suggests triggering:

https://aws.amazon.com/blogs/containers/implementing-runtime-security-in-amazon-eks-using-cncf-falco/

In a demo nginx pod I ran the following:

touch /etc/2
cat /etc/shadow > /dev/null 2>&1

This is on EKS 1.22 with Falco 0.32.2.

Kindly suggest a remedy, as this is urgent for us.

keshavbaweja-git commented 1 year ago

Faced same issue as @sawanverma on Falco 0.32.2, EKS 1.21, Amazon Linux 2, Kernel Version 5.4.214-120.368.amzn2.x86_64

Installed Kernel headers on all nodes in the cluster

rpm --import https://falco.org/repo/falcosecurity-3672BA8F.asc
curl -s -o /etc/yum.repos.d/falcosecurity.repo https://falco.org/repo/falcosecurity-rpm.repo
yum -y install kernel-devel-$(uname -r)

Falco pods transition from CrashLoopBackoff to Running after the installation of kernel headers.
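A quick way to confirm on a node that headers matching the running kernel are actually present is sketched below. It assumes the conventional `/lib/modules/<release>/build` link that kernel module builds rely on; the function name `has_kernel_headers` and the optional root argument are hypothetical and only there to make the check easy to try out.

```shell
# Return 0 if kernel headers for the given release appear to be installed.
# Assumption: headers are reachable through /lib/modules/<release>/build.
has_kernel_headers() {
  release=$1
  root=${2:-}   # optional alternative root directory, mainly for testing
  [ -e "$root/lib/modules/$release/build/Makefile" ]
}
```

On a node this would be called as `has_kernel_headers "$(uname -r)"`. A non-zero exit status suggests that the kernel-devel package for the running kernel is missing or belongs to a different release.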

However, no log event is generated by Falco pods for commands below inside nginx container

touch /etc/2
cat /etc/shadow > /dev/null 2>&1

Andreagit97 commented 1 year ago

Hi @sawanverma @keshavbaweja-git, this is very strange. Right now the simplest explanation that comes to mind is a buffering problem. Could you try deploying Falco with the following option?

helm install falco -f values.yaml falcosecurity/falco --set tty=true

instead of the simple

helm install falco -f values.yaml falcosecurity/falco

Or you can directly set it to true in your YAML file here :point_down: https://github.com/falcosecurity/charts/blob/c860e72ab9bd65f5711268ef6468a947c3a99d13/falco/values.yaml#L124

keshavbaweja-git commented 1 year ago

Hi @Andreagit97, I uninstalled and reinstalled Falco with tty=true, and also turned off JSON output by setting json_output=false. Still no sign of events being logged.

Andreagit97 commented 1 year ago

uhm ok... Could you provide us with the Falco logs?

keshavbaweja-git commented 1 year ago

falco-czqkh.log falco-sskf9.log falco-znm97.log

k logs falco-sskf9 --all-containers > falco-sskf9.log
k logs falco-znm97 --all-containers > falco-znm97.log
k logs falco-czqkh --all-containers > falco-czqkh.log

Andreagit97 commented 1 year ago

Thank you! Everything looks fine :thinking: We will release Falco 0.33 in the next few days; let's see if the issue shows up again. We will try it on an EKS cluster! Thank you for reporting it.

Andreagit97 commented 1 year ago

Hi @keshavbaweja-git, one more thing that could help us understand what is going on: could you deploy Falco with all the loggers enabled (in DEBUG mode), as in the following command?

helm install falco -f values.yaml falcosecurity/falco --set tty=true --set extra.args='{-o,libs_logger.enabled=true,-o,log_level=debug}'

bagweraj commented 1 year ago

This is also failing on EKS with Kubernetes version v1.23.9-eks-ba74326 and Falco 0.33.0, 0.32.2, and 0.31.1. Here is what I see in the container:

-------------------------------------------------------------------------
* Setting up /usr/src links from host
* Running falco-driver-loader for: falco version=0.33.0, driver version=3.0.1+driver, arch=x86_64, kernel release=5.4.217-126.408.amzn2.x86_64, kernel version=1
* Running falco-driver-loader with: driver=module, compile=yes, download=yes

================ Cleaning phase ================

* 1. Check if kernel module 'falco' is still loaded:
- OK! There is no 'falco' module loaded.

* 2. Check all versions of kernel module 'falco' in dkms:
- There are some versions of 'falco' module in dkms.

* 3. Removing all the following versions from dkms:
3.0.1+driver

- Removing 3.0.1+driver...

------------------------------
Deleting module version: 3.0.1+driver
completely from the DKMS tree.
------------------------------
Done.

- OK! Removing '3.0.1+driver' succeeded.

[SUCCESS] Cleaning phase correctly terminated.

================ Cleaning phase ================

* Looking for a falco module locally (kernel 5.4.217-126.408.amzn2.x86_64)
* Filename 'falco_amazonlinux2_5.4.217-126.408.amzn2.x86_64_1.ko' is composed of:
 - driver name: falco
 - target identifier: amazonlinux2
 - kernel release: 5.4.217-126.408.amzn2.x86_64
 - kernel version: 1
* Trying to download a prebuilt falco module from https://download.falco.org/driver/3.0.1%2Bdriver/x86_64/falco_amazonlinux2_5.4.217-126.408.amzn2.x86_64_1.ko
curl: (22) The requested URL returned error: 404 
Unable to find a prebuilt falco module
* Trying to dkms install falco module with GCC /usr/bin/gcc
DIRECTIVE: MAKE="'/tmp/falco-dkms-make'"

Creating symlink /var/lib/dkms/falco/3.0.1+driver/source ->
                 /usr/src/falco-3.0.1+driver

DKMS: add completed.
* Running dkms build failed, couldn't find /var/lib/dkms/falco/3.0.1+driver/build/make.log (with GCC /usr/bin/gcc)
* Trying to dkms install falco module with GCC /usr/bin/gcc-8
DIRECTIVE: MAKE="'/tmp/falco-dkms-make'"
* Running dkms build failed, couldn't find /var/lib/dkms/falco/3.0.1+driver/build/make.log (with GCC /usr/bin/gcc-8)
* Trying to dkms install falco module with GCC /usr/bin/gcc-6
DIRECTIVE: MAKE="'/tmp/falco-dkms-make'"
* Running dkms build failed, couldn't find /var/lib/dkms/falco/3.0.1+driver/build/make.log (with GCC /usr/bin/gcc-6)
* Trying to dkms install falco module with GCC /usr/bin/gcc-5
DIRECTIVE: MAKE="'/tmp/falco-dkms-make'"
* Running dkms build failed, couldn't find /var/lib/dkms/falco/3.0.1+driver/build/make.log (with GCC /usr/bin/gcc-5)
* Trying to load a system falco module, if present
Consider compiling your own falco driver and loading it or getting in touch with the Falco community
Error from server (BadRequest): container "falco" in pod "falco-lcvg4" is waiting to start: PodInitializing

bonzo71 commented 1 year ago

Hi team, is there any update on this? Just installed a fresh chart 2.3.0 - published 14th Nov (with Falco 0.33) and facing the same issue as above...

Thanks

adelca commented 1 year ago

Same issue. It broke yesterday when I upgraded my cluster to Kubernetes 1.22 and the nodes were re-created with a new kernel: 5.4.219-126.411.amzn2.x86_64. It used to work with my previous nodes' kernel, 5.4.209-116.367.amzn2.x86_64.

I know we need to wait for the kernel to be listed here for the falco-driver-loader initContainer to work: https://download.falco.org/?prefix=driver/2.0.0%2Bdriver/x86_64/

I also know I could recreate my cluster and force my nodes to use a specific AMI (or even install the headers manually on the nodes), but I really don't want to go down that path.

Is there a way to tell Falco (via the Helm chart) to use a specific driver? Would that work with kernel versions as close as the two listed above?

alacuku commented 1 year ago

> Faced same issue as @sawanverma on Falco 0.32.2, EKS 1.21, Amazon Linux 2, Kernel Version 5.4.214-120.368.amzn2.x86_64
>
> Installed Kernel headers on all nodes in the cluster
>
> rpm --import https://falco.org/repo/falcosecurity-3672BA8F.asc
> curl -s -o /etc/yum.repos.d/falcosecurity.repo https://falco.org/repo/falcosecurity-rpm.repo
> yum -y install kernel-devel-$(uname -r)
>
> Falco pods transition from CrashLoopBackoff to Running after the installation of kernel headers.
>
> However, no log event is generated by Falco pods for commands below inside nginx container
>
> touch /etc/2
> cat /etc/shadow > /dev/null 2>&1

Hi @keshavbaweja-git, I tried to reproduce your issue on EKS running Falco 0.32.2:

❯ k get nodes -o wide
NAME                                            STATUS   ROLES    AGE   VERSION                INTERNAL-IP      EXTERNAL-IP     OS-IMAGE         KERNEL-VERSION                 CONTAINER-RUNTIME
ip-192-168-53-0.eu-south-1.compute.internal     Ready    <none>   59m   v1.21.14-eks-fb459a0   192.168.53.0     15.160.75.6     Amazon Linux 2   5.4.219-126.411.amzn2.x86_64   docker://20.10.17
ip-192-168-88-163.eu-south-1.compute.internal   Ready    <none>   59m   v1.21.14-eks-fb459a0   192.168.88.163   15.161.97.129   Amazon Linux 2   5.4.219-126.411.amzn2.x86_64   docker://20.10.17

And it works fine. The events are logged as they should.

I installed Falco by running:

helm install falco falcosecurity/falco --set tty=true --version 2.0.18

Please make sure to check the logs of the Falco instance that is running on the same node as the container where you are running your commands.
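Since Falco runs one pod per node, the advice above comes down to finding the Falco pod that shares a node with the workload pod. A minimal sketch follows, assuming Falco is installed in the current namespace and carries the label `app.kubernetes.io/name=falco` (an assumption; adjust the selector to whatever labels your chart version sets). The function name `falco_pod_for` is hypothetical.

```shell
# Print the Falco pod(s) scheduled on the same node as a given workload pod.
# Assumptions: Falco lives in the current namespace and matches the default
# label selector below; both may differ in your installation.
falco_pod_for() {
  target_pod=$1
  target_ns=${2:-default}
  falco_selector=${3:-app.kubernetes.io/name=falco}
  # Look up the node the workload pod is running on.
  node=$(kubectl get pod "$target_pod" -n "$target_ns" -o jsonpath='{.spec.nodeName}') || return 1
  [ -n "$node" ] || return 1
  # List the Falco pods scheduled on that node.
  kubectl get pods -l "$falco_selector" --field-selector "spec.nodeName=$node" -o name
}
```

With a demo pod named nginx, `kubectl logs "$(falco_pod_for nginx)" --all-containers` would then show the logs of the Falco instance that should have seen the test commands.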

poiana commented 1 year ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

poiana commented 1 year ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh with /remove-lifecycle rotten.

Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle rotten

Andreagit97 commented 1 year ago

Any news here?

alacuku commented 1 year ago

The original issue was related to missing pre-built kernel modules. It isn't relevant anymore I think.

poiana commented 1 year ago

Rotten issues close after 30d of inactivity.

Reopen the issue with /reopen.

Mark the issue as fresh with /remove-lifecycle rotten.

Provide feedback via https://github.com/falcosecurity/community.

/close

poiana commented 1 year ago

@poiana: Closing this issue.

In response to [this](https://github.com/falcosecurity/falco/issues/2169#issuecomment-1555977786):

>Rotten issues close after 30d of inactivity.
>
>Reopen the issue with `/reopen`.
>
>Mark the issue as fresh with `/remove-lifecycle rotten`.
>
>Provide feedback via https://github.com/falcosecurity/community.
>/close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.