falcosecurity / falco

Cloud Native Runtime Security
https://falco.org
Apache License 2.0
rule match causes crash "Error: rule id or priority out of bounds" in stats_manager.cpp #3059

Closed pitrh closed 8 months ago

pitrh commented 8 months ago

Running Falco version 0.37.0 (x86_64), hitting a rule causes the Falco pod to crash with "Error: rule id or priority out of bounds".

Set up observation:

$ kubectl -n falco logs falco-68ddd -f

Initial output is

Defaulted container "falco" out of: falco, falcoctl-artifact-follow, falco-driver-loader (init), falcoctl-artifact-install (init)
Wed Feb  7 11:59:02 2024: Falco version: 0.37.0 (x86_64)
Wed Feb  7 11:59:02 2024: Falco initialized with configuration file: /etc/falco/falco.yaml
Wed Feb  7 11:59:02 2024: System info: Linux version 5.4.0-170-generic (buildd@lcy02-amd64-059) (gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.2)) #188-Ubuntu SMP Wed Jan 10 09:51:01 UTC 2024
Wed Feb  7 11:59:02 2024: Loading rules from file /etc/falco/falco_rules.yaml
Wed Feb  7 11:59:02 2024: Loading rules from file /etc/falco/rules.d/exceptions.yaml
Wed Feb  7 11:59:02 2024: The chosen syscall buffer dimension is: 8388608 bytes (8 MBs)
Wed Feb  7 11:59:02 2024: Starting health webserver with threadiness 6, listening on 0.0.0.0:8765
Wed Feb  7 11:59:02 2024: Loaded event sources: syscall
Wed Feb  7 11:59:02 2024: Enabled event sources: syscall
Wed Feb  7 11:59:02 2024: Opening 'syscall' source with BPF probe. BPF probe path: /root/.falco/falco-bpf.o

Then shell into a pod running on the same node,

[Thu Feb 08 09:53:18] peter@peters-mbp:~/Downloads$ kubectl -n keycloak-alpha exec -it devops-keycloak-postgres-0 -- /bin/bash
[... banners deleted for brevity ...]

Copy a binary to somewhere it should not be and execute it (thereby hitting a rule):

root@devops-keycloak-postgres-0:/home/postgres# cd
root@devops-keycloak-postgres-0:~# cp /bin/cat .
root@devops-keycloak-postgres-0:~# ./cat
.bash_history  .bashrc        .config/       .profile       cat
root@devops-keycloak-postgres-0:~# ./cat .bashrc
# ~/.bashrc: executed by bash(1) for non-login shells.
# see /usr/share/doc/bash/examples/startup-files (in the package bash-doc)
[ ... rest of output elided ...]

This produces the following log output:

{"hostname":"falco-68ddd","output":"07:05:02.351487517: Warning Sensitive file opened for reading by non-trusted program (file=/etc/shadow gparent=patroni ggparent=runsv gggparent=runsvdir evt_type=openat user=postgres user_uid=101 user_loginuid=-1 process=vacuumdb proc_exepath=/usr/bin/perl parent=post_init.sh command=vacuumdb /usr/bin/vacuumdb -aZ terminal=0 container_id=c74f6d1d11ad container_image=ghcr.io/zalando/spilo-15 container_image_tag=3.0-p1 container_name=k8s_postgres_devops-keycloak-postgres-0_keycloak-alpha_f24f3be9-4fbf-430b-8171-94898964d2fd_0 k8s_ns=keycloak-alpha k8s_pod_name=devops-keycloak-postgres-0)","priority":"Warning","rule":"Read sensitive file untrusted","source":"syscall","tags":["T1555","container","filesystem","host","maturity_stable","mitre_credential_access"],"time":"2024-02-08T07:05:02.351487517Z", "output_fields": {"container.id":"c74f6d1d11ad","container.image.repository":"ghcr.io/zalando/spilo-15","container.image.tag":"3.0-p1","container.name":"k8s_postgres_devops-keycloak-postgres-0_keycloak-alpha_f24f3be9-4fbf-430b-8171-94898964d2fd_0","evt.time":1707375902351487517,"evt.type":"openat","fd.name":"/etc/shadow","k8s.ns.name":"keycloak-alpha","k8s.pod.name":"devops-keycloak-postgres-0","proc.aname[2]":"patroni","proc.aname[3]":"runsv","proc.aname[4]":"runsvdir","proc.cmdline":"vacuumdb /usr/bin/vacuumdb -aZ","proc.exepath":"/usr/bin/perl","proc.name":"vacuumdb","proc.pname":"post_init.sh","proc.tty":0,"user.loginuid":-1,"user.name":"postgres","user.uid":101}}
Events detected: 1
Rule counts by severity:
   WARNING: 1
Triggered rules by rule name:
   Read sensitive file untrusted: 1
Error: rule id or priority out of bounds
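
An alert line this long is easier to triage once a few fields are pulled out. The following is a minimal sketch, assuming the alert arrives as one JSON object per line as in the log above; `summarize_alert` is a hypothetical helper name, and the sample string is a shortened copy of the alert that keeps only some of its keys.

```python
import json

# Shortened copy of the alert line from the log above; only a few of the
# original keys are kept so the example stays readable.
ALERT_LINE = (
    '{"hostname":"falco-68ddd","priority":"Warning",'
    '"rule":"Read sensitive file untrusted","source":"syscall",'
    '"time":"2024-02-08T07:05:02.351487517Z",'
    '"output_fields":{"fd.name":"/etc/shadow","proc.name":"vacuumdb",'
    '"proc.pname":"post_init.sh","k8s.pod.name":"devops-keycloak-postgres-0"}}'
)


def summarize_alert(line: str) -> dict:
    """Return the fields most useful for triage from one JSON alert line."""
    alert = json.loads(line)
    fields = alert.get("output_fields", {})
    return {
        "rule": alert["rule"],
        "priority": alert["priority"],
        "file": fields.get("fd.name"),
        "process": fields.get("proc.name"),
        "parent": fields.get("proc.pname"),
        "pod": fields.get("k8s.pod.name"),
    }


summary = summarize_alert(ALERT_LINE)
print(summary["process"], "read", summary["file"])
```

The summary makes it quick to check which process and file actually triggered the rule.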

And the pod has restarted:

[Thu Feb 08 09:54:10] peter@peters-mbp:~$ kubectl -n falco get pods -o wide
NAME                                  READY   STATUS    RESTARTS      AGE   IP              NODE          NOMINATED NODE   READINESS GATES
falco-5spm7                           2/2     Running   0             20h   10.233.65.31    kubemanet41   <none>           <none>
falco-68ddd                           2/2     Running   1 (32s ago)   20h   10.233.67.244   kubemanet44   <none>           <none>
falco-b9zld                           2/2     Running   0             20h   10.233.68.147   kubemanet45   <none>           <none>
falco-falcosidekick-bbd4bdf6c-9mmr7   1/1     Running   0             24h   10.233.67.110   kubemanet44   <none>           <none>
falco-falcosidekick-bbd4bdf6c-g4xq2   1/1     Running   0             22h   10.233.66.204   kubemanet42   <none>           <none>
falco-fz6sf                           2/2     Running   2 (12m ago)   20h   10.233.69.40    kubemanet43   <none>           <none>
falco-m6726                           2/2     Running   0             20h   10.233.66.68    kubemanet42   <none>           <none>
falco-wr4ff                           2/2     Running   0             20h   10.233.64.49    kubemanet40   <none>           <none>

Expected behaviour

Falco should report the rule match, but keep running.

Environment

[Thu Feb 08 09:54:41] peter@peters-mbp:~$ kubectl version
Client Version: v1.29.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.25.15

evr-pehan@kubemanet44:~$ uname -a
Linux kubemanet44 5.4.0-170-generic #188-Ubuntu SMP Wed Jan 10 09:51:01 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Kubernetes

Additional context

The messages all apparently come from stats_manager.cpp. My C++ is not strong enough to supply a fix, but I suspect that key variables are initialized to 0 and later tested for being non-zero without ever being assigned anything else, and that this may be our culprit here.
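
For readers unfamiliar with this kind of error, here is a rough illustration of how a message like "rule id or priority out of bounds" can arise in general. It is a sketch under stated assumptions, not Falco's code and not a diagnosis of this issue: `RuleStats`, `by_rule_id`, and `on_event` are hypothetical names, and the idea of counters indexed by a numeric rule id is an assumption made only for the example.

```python
class RuleStats:
    """Hypothetical per-rule counters indexed by numeric rule id."""

    def __init__(self, loaded_rule_ids):
        # One slot per rule id that was registered at load time.
        size = max(loaded_rule_ids, default=-1) + 1
        self.by_rule_id = [0] * size

    def on_event(self, rule_id: int) -> None:
        # An id that was never registered has no slot, so fail loudly
        # instead of indexing past the end of the list.
        if not 0 <= rule_id < len(self.by_rule_id):
            raise IndexError("rule id or priority out of bounds")
        self.by_rule_id[rule_id] += 1


stats = RuleStats(loaded_rule_ids=[0, 1, 2])
stats.on_event(1)  # registered rule: counted
try:
    stats.on_event(7)  # id the counters never saw at load time
except IndexError as err:
    print("Error:", err)
```

In a sketch like this, the check only fires when the set of ids used at event time differs from the set registered at load time, so a mismatch between the two is one place to look.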

Andreagit97 commented 8 months ago

hey @pitrh thank you for reporting! we will take a look ASAP!

FedeDP commented 8 months ago

Hi! Would you mind sharing the content of your /etc/falco/rules.d/exceptions.yaml?

pitrh commented 8 months ago

@FedeDP follows here:

root@falco-68ddd:/# cat /etc/falco/rules.d/exceptions.yaml
- list: known_drop_and_execute_containers
  items: [ harbor.fiskeridirektoratet.no/saga/saga-pdf, harbor.fiskeridirektoratet.no/aqua-portal/aqua-portal-pdfcreator, ghcr.io/renovatebot/renovate, bitnami/redis ]
  override:
    items: append
- list: known_memfd_execution_binaries
  items: [ timeout ]
  override:
    items: append
- rule: Drop and execute new binary in container
  condition: and (not proc.aname contains playwright.sh or not proc.pname in (playwright.sh, node, chrome) or not k8s.pod.name contains renovate)
  override:
    condition: append
- list: known_shell_spawn_cmdlines
  items: [ '"sh -c /health/ping_readiness_local_and_master.sh 1"' ]
  override:
    items: append

Andreagit97 commented 8 months ago

We tried to reproduce with your repro

root@devops-keycloak-postgres-0:/home/postgres# cd
root@devops-keycloak-postgres-0:~# cp /bin/cat .
root@devops-keycloak-postgres-0:~# ./cat
.bash_history  .bashrc        .config/       .profile       cat
root@devops-keycloak-postgres-0:~# ./cat .bashrc

but the log you posted seems to be triggered by another process

{"hostname":"falco-68ddd","output":"07:05:02.351487517: Warning Sensitive file opened for reading by non-trusted program (file=/etc/shadow gparent=patroni ggparent=runsv gggparent=runsvdir evt_type=openat user=postgres user_uid=101 user_loginuid=-1 process=vacuumdb proc_exepath=/usr/bin/perl parent=post_init.sh command=vacuumdb /usr/bin/vacuumdb -aZ terminal=0 container_id=c74f6d1d11ad container_image=ghcr.io/zalando/spilo-15 container_image_tag=3.0-p1 container_name=k8s_postgres_devops-keycloak-postgres-0_keycloak-alpha_f24f3be9-4fbf-430b-8171-94898964d2fd_0 k8s_ns=keycloak-alpha k8s_pod_name=devops-keycloak-postgres-0)","priority":"Warning","rule":"Read sensitive file untrusted","source":"syscall","tags":["T1555","container","filesystem","host","maturity_stable","mitre_credential_access"],"time":"2024-02-08T07:05:02.351487517Z", "output_fields": {"container.id":"c74f6d1d11ad","container.image.repository":"ghcr.io/zalando/spilo-15","container.image.tag":"3.0-p1","container.name":"k8s_postgres_devops-keycloak-postgres-0_keycloak-alpha_f24f3be9-4fbf-430b-8171-94898964d2fd_0","evt.time":1707375902351487517,"evt.type":"openat","fd.name":"/etc/shadow","k8s.ns.name":"keycloak-alpha","k8s.pod.name":"devops-keycloak-postgres-0","proc.aname[2]":"patroni","proc.aname[3]":"runsv","proc.aname[4]":"runsvdir","proc.cmdline":"vacuumdb /usr/bin/vacuumdb -aZ","proc.exepath":"/usr/bin/perl","proc.name":"vacuumdb","proc.pname":"post_init.sh","proc.tty":0,"user.loginuid":-1,"user.name":"postgres","user.uid":101}}
process=vacuumdb 

At the moment we are not able to reproduce it. Could you provide us with your Helm installation command, just so we can check what you enable when you install Falco?

pitrh commented 8 months ago

This setup is done via Terraform, which runs based on the following files (sorry about the formatting, not my forte it seems):

[Thu Feb 08 11:54:52] peter@peters-mbp:~/fdir/tf-dockyard$ cat modules/falco/falco.tf
# https://artifacthub.io/packages/helm/falcosecurity/falco/3.0.0
# https://github.com/falcosecurity/charts
resource "helm_release" "falco" {
  name       = "falco"
  repository = "https://falcosecurity.github.io/charts"
  chart      = "falco"

  namespace = var.falco_namespace
  version   = var.falco_chart_version

  values = [
    templatefile("${path.module}/falco.yaml", {
      slack_webhook = var.falco_slack_webhook
      environment   = var.environment
    }),
  ]
}

`[Thu Feb 08 11:56:51] peter@peters-mbp:~/fdir/tf-dockyard$ cat modules/falco/falco.yaml
image:
  registry: "harbor.fiskeridirektoratet.no/dockerhub-proxy"
#########################
# Scenario requirements
#########################

# Sensors dislocation configuration (scenario requirement)
# FDIR: Speed up deployment
controller:
  kind: daemonset
  daemonset:
    updateStrategy:
      type: RollingUpdate
      rollingUpdate:
        maxUnavailable: 40%

driver:
  enabled: true
  kind: ebpf
  ebpf:
    # -- Constrain Falco with capabilities instead of running a privileged container.
    # This option is only supported with the eBPF driver and a kernel >= 5.8.
    # Ensure the eBPF driver is enabled (i.e., setting the `driver.kind` option to `ebpf`).
    leastPrivileged: false

######################
# falco.yaml config
######################

falco:
  json_output: true

  # When using json output, whether or not to include the "output" property
  # itself (e.g. "File below a known binary directory opened for writing
  # (user=root ....") in the json output.
  json_include_output_property: true

  # Send information logs to stderr and/or syslog Note these are *not* security
  # notification logs! These are just Falco lifecycle (and possibly error) logs.
  log_stderr: true
  log_syslog: true

  # Minimum log level to include in logs. Note: these levels are
  # separate from the priority field of rules. This refers only to the
  # log level of Falco's internal logging. Can be one of "emergency",
  # "alert", "critical", "error", "warning", "notice", "info", "debug".
  log_level: info

  # Minimum rule priority level to load and run. All rules having a
  # priority more severe than this level will be loaded/run.  Can be one
  # of "emergency", "alert", "critical", "error", "warning", "notice",
  # "info", "debug".
  priority: warning

  stdout_output:
    enabled: true

  http_output:
    enabled: true

########################
# Falco integrations   #
########################

# -- For configuration values, see https://github.com/falcosecurity/charts/blob/master/falcosidekick/values.yaml
falcosidekick:
  enabled: true

  webui:
    enabled: false
    replicaCount: 1
    ingress:
      enabled: false
    redis:
      enabled: true
      storageSize: "4Gi"

  config:
    slack:
      footer: "See tf-dockyard/modules/falco/falco.yaml for configuration."
      # Slack WebhookURL (ex: https://hooks.slack.com/services/XXXX/YYYY/ZZZZ), if not empty, Slack output is enabled
      webhookurl: ${slack_webhook}
      # all (default), text, fields
      outputformat: "fields"
      # minimum priority of event for using this output, order is emergency|alert|critical|error|warning|notice|informational|debug or "" (default)
      minimumpriority: "error"
      # a Go template to format Slack Text above Attachment, displayed in addition to the output from `SLACK_OUTPUTFORMAT`, see [Slack Message Formatting](#slack-message-formatting) in the README for details. If empty, no Text is displayed before Attachment.
      messageformat: 'Alert triggered in the ${environment} cluster:'

###########################
# Extras and customization #
############################
# See default rules at https://github.com/falcosecurity/rules/blob/main/rules/falco_rules.yaml
customRules:
  exceptions.yaml: |-
    - list: known_drop_and_execute_containers
      items: [ harbor.fiskeridirektoratet.no/saga/saga-pdf, harbor.fiskeridirektoratet.no/aqua-portal/aqua-portal-pdfcreator, ghcr.io/renovatebot/renovate, bitnami/redis ]
      override:
        items: append
    - list: known_memfd_execution_binaries
      items: [ timeout ]
      override:
        items: append
    - rule: Drop and execute new binary in container
      condition: and (not proc.aname contains playwright.sh or not proc.pname in (playwright.sh, node, chrome) or not k8s.pod.name contains renovate)
      override:
        condition: append
    - list: known_shell_spawn_cmdlines
      items: [ '"sh -c /health/ping_readiness_local_and_master.sh 1"' ]
      override:
        items: append`

[Thu Feb 08 11:58:09] peter@peters-mbp:~/fdir/tf-dockyard$ cat modules/falco/variables.tf
variable "falco_namespace" {}
variable "falco_chart_version" {}
variable "falco_slack_webhook" {
  description = "Webhook for slack notifications"
}
variable "environment" {
  description = "Environment name"
  type        = string
}
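
The `priority: warning` setting in the falco.yaml above decides which rules are loaded at all. A minimal sketch of that filter follows, using the priority order listed in the config comment. `is_loaded` is a hypothetical helper, and treating a rule at exactly the minimum level as loaded is an assumption, though it is consistent with the Warning alert that fired in this thread.

```python
# Order taken from the comment in falco.yaml above, most severe first.
PRIORITIES = ["emergency", "alert", "critical", "error",
              "warning", "notice", "info", "debug"]


def is_loaded(rule_priority: str, minimum: str = "warning") -> bool:
    """True if a rule at rule_priority passes the minimum priority filter."""
    rank = {name: i for i, name in enumerate(PRIORITIES)}
    # A lower index means more severe; keep rules at or above the minimum.
    return rank[rule_priority.lower()] <= rank[minimum.lower()]


for name in ("Critical", "Warning", "Notice"):
    print(name, is_loaded(name))
```

With this configuration, rules at notice, info, or debug priority would be filtered out at load time, so only part of the default ruleset is active.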

pitrh commented 8 months ago

> We tried to reproduce with your repro [...] but the log you posted seems to be triggered by another process [...] process=vacuumdb

The vacuumdb process as the trigger here is weird, but it might indicate that some unusual combination of factors causes this to happen (possibly exclusively) in our environment.

Andreagit97 commented 8 months ago

Thank you for the quick answer!

pitrh commented 8 months ago

> Thank you for the quick answer!

Oh, here's hoping the information provided is useful in resolving the issue :)

FedeDP commented 8 months ago

I opened the PR with the fix: https://github.com/falcosecurity/falco/pull/3060

FedeDP commented 8 months ago

/milestone 0.37.1

Andreagit97 commented 8 months ago

This shouldn't be closed until 0.37.1 is out

FedeDP commented 8 months ago

Hey, I just released Falco 0.37.1-rc1, the first RC for the 0.37.1 bug fix release. Can you try with it? Simply use the 0.37.1-rc1 image tag ;)

luringens commented 8 months ago

Thank you, me and @pitrh will try it tomorrow morning 🙂

pitrh commented 8 months ago

> Hey, I just released Falco 0.37.1-rc1, the first RC for the 0.37.1 bug fix release. Can you try with it? Simply use the 0.37.1-rc1 image tag ;)

We just deployed with that and ran a simple test. The expected alert came, and the falco pod did not crash.

So I think we have a fix :)

FedeDP commented 8 months ago

Epic :D Thanks for reporting and quickly testing :) /close

poiana commented 8 months ago

@FedeDP: Closing this issue.
