falcosecurity / plugins

Falco plugins registry

K8smeta does not populate its events even though it is configured correctly and there are no errors #514

Open fjellvannet opened 2 weeks ago

fjellvannet commented 2 weeks ago

Describe the bug

I set up k8smeta and the k8s-metacollector according to this command (line 273 in Falco's official Helm chart):

helm install falco falcosecurity/falco \
    --namespace falco \
    --create-namespace \
    --set collectors.kubernetes.enabled=true
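For reference, a quick way to confirm that both Falco and the metacollector came up after the install; the service name falco-k8s-metacollector matches the plugin config shown further below, and pod names will differ:

# List the Falco DaemonSet pods and the metacollector service in the falco namespace
kubectl -n falco get pods
kubectl -n falco get svc falco-k8s-metacollector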

I create and add a custom syscall rule that often triggers in my deployment and use a k8smeta field, k8smeta.pod.name to be precise. I would expect this field to be populated, but it returns N/A. Sorry for this bug report being very long; I just included a lot of context :)

How to reproduce it

Deploy Falco with the following command using its Helm chart: helm upgrade --install falco falcosecurity/falco --namespace falco --create-namespace -f falco-values.yaml

falco-values.yaml has the following contents:

falco:
  rules_file:
    - /etc/falco/rules.d

driver:
  kind: ebpf

collectors:
  kubernetes:
    enabled: false

falcosidekick:
  enabled: true
  webui:
    enabled: true

customRules:
  rules-k8smeta.yaml: |-
    - macro: k8s_containers
      condition: >
        (container.image.repository in (gcr.io/google_containers/hyperkube-amd64,
        gcr.io/google_containers/kube2sky,
        docker.io/sysdig/sysdig, sysdig/sysdig,
        fluent/fluentd-kubernetes-daemonset, prom/prometheus,
        falco_containers,
        falco_no_driver_containers,
        ibm_cloud_containers,
        velero/velero,
        quay.io/jetstack/cert-manager-cainjector, weaveworks/kured,
        quay.io/prometheus-operator/prometheus-operator,
        registry.k8s.io/ingress-nginx/kube-webhook-certgen, quay.io/spotahome/redis-operator,
        registry.opensource.zalan.do/acid/postgres-operator, registry.opensource.zalan.do/acid/postgres-operator-ui,
        rabbitmqoperator/cluster-operator, quay.io/kubecost1/kubecost-cost-model,
        docker.io/bitnami/prometheus, docker.io/bitnami/kube-state-metrics, mcr.microsoft.com/oss/azure/aad-pod-identity/nmi)
        or (k8s.ns.name = "kube-system"))

    - macro: never_true
      condition: (evt.num=0)

    - macro: container
      condition: (container.id != host)

    - macro: k8s_api_server
      condition: (fd.sip.name="kubernetes.default.svc.cluster.local")

    - macro: user_known_contact_k8s_api_server_activities
      condition: (never_true)

    - rule: Custom Contact K8S API Server From Container
      desc: >
        Detect attempts to communicate with the K8S API Server from a container by non-profiled users. Kubernetes APIs play a 
        pivotal role in configuring the cluster management lifecycle. Detecting potential unauthorized access to the API server 
        is of utmost importance. Audit your complete infrastructure and pinpoint any potential machines from which the API server 
        might be accessible based on your network layout. If Falco can't operate on all these machines, consider analyzing the 
        Kubernetes audit logs (typically drained from control nodes, and Falco offers a k8saudit plugin) as an additional data 
        source for detections within the control plane.
      condition: >
        evt.type=connect and evt.dir=< 
        and (fd.typechar=4 or fd.typechar=6) 
        and container 
        and k8s_api_server 
        and not k8s_containers 
        and not user_known_contact_k8s_api_server_activities
      output: Custom Unexpected connection to K8s API Server from container (connection=%fd.name lport=%fd.lport rport=%fd.rport fd_type=%fd.type fd_proto=%fd.l4proto evt_type=%evt.type user=%user.name user_uid=%user.uid user_loginuid=%user.loginuid process=%proc.name proc_exepath=%proc.exepath parent=%proc.pname command=%proc.cmdline k8s_podname=%k8smeta.pod.name orig_podname=%k8s.pod.name terminal=%proc.tty %container.info)
      priority: NOTICE
      tags: [maturity_stable, container, network, k8s, mitre_discovery, T1565]

The included custom rule is a copy of the upstream Contact K8S API Server From Container rule (renamed with a Custom prefix) together with all its dependencies. The only modification is that two new fields, k8s_podname=%k8smeta.pod.name and orig_podname=%k8s.pod.name, are added to the output. The orig_podname field is populated - it shows the same value as k8s.pod.name in the output. However, k8s_podname remains N/A, and I would expect this field to be populated whenever the same value is available in k8s.pod.name, which is said to be kept only for backwards-compatibility purposes (line 250 in Falco's official Helm chart).

Expected behaviour

I would expect that if k8s.pod.name is populated with a value, k8smeta.pod.name should also be populated.

Screenshots

(Screenshot: k8smeta fields not available.) Checking out the events in the UI, we see that the k8s_podname field remains N/A while orig_podname gets the same value as k8s.pod.name.

(Screenshot: metacollector is running.) falco-k8s-metacollector is running in the same namespace as the Falco pods and the UI.

(Screenshot: k8smeta plugin is installed.) Logs from the artifact-install container show that k8smeta is indeed installed correctly.
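For reference, those logs can be pulled with a command along these lines; the init container name falcoctl-artifact-install is an assumption based on the chart's usual naming, and <falco-pod-name> must be replaced with a real pod:

# Show the artifact-install logs and check that the k8smeta artifact was pulled
kubectl -n falco logs <falco-pod-name> -c falcoctl-artifact-install | grep -i k8smeta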

Environment

Additional context

/etc/falco/falco.yaml pulled from one of the Falco pods:

base_syscalls:
  custom_set: []
  repair: false
buffered_outputs: false
config_files:
- /etc/falco/config.d
engine:
  ebpf:
    buf_size_preset: 4
    drop_failed_exit: false
    probe: ${HOME}/.falco/falco-bpf.o
  kind: ebpf
falco_libs:
  thread_table_size: 262144
file_output:
  enabled: false
  filename: ./events.txt
  keep_alive: false
grpc:
  bind_address: unix:///run/falco/falco.sock
  enabled: false
  threadiness: 0
grpc_output:
  enabled: false
http_output:
  ca_bundle: ""
  ca_cert: ""
  ca_path: /etc/falco/certs/
  client_cert: /etc/falco/certs/client/client.crt
  client_key: /etc/falco/certs/client/client.key
  compress_uploads: false
  echo: false
  enabled: true
  insecure: false
  keep_alive: false
  mtls: false
  url: http://falco-falcosidekick:2801
  user_agent: falcosecurity/falco
json_include_output_property: true
json_include_tags_property: true
json_output: true
libs_logger:
  enabled: false
  severity: debug
load_plugins:
- k8smeta
log_level: info
log_stderr: true
log_syslog: true
metrics:
  convert_memory_to_mb: true
  enabled: false
  include_empty_values: false
  interval: 1h
  kernel_event_counters_enabled: true
  libbpf_stats_enabled: true
  output_rule: true
  resource_utilization_enabled: true
  rules_counters_enabled: true
  state_counters_enabled: true
output_timeout: 2000
outputs:
  max_burst: 1000
  rate: 0
outputs_queue:
  capacity: 0
plugins:
- init_config: null
  library_path: libk8saudit.so
  name: k8saudit
  open_params: http://:9765/k8s-audit
- library_path: libcloudtrail.so
  name: cloudtrail
- init_config: ""
  library_path: libjson.so
  name: json
- init_config:
    collectorHostname: falco-k8s-metacollector.falco.svc
    collectorPort: 45000
    nodeName: ${FALCO_K8S_NODE_NAME}
  library_path: libk8smeta.so
  name: k8smeta
priority: debug
program_output:
  enabled: false
  keep_alive: false
  program: 'jq ''{text: .output}'' | curl -d @- -X POST https://hooks.slack.com/services/XXX'
rule_matching: first
rules_file:
- /etc/falco/rules.d
stdout_output:
  enabled: true
syscall_event_drops:
  actions:
  - log
  - alert
  max_burst: 1
  rate: 0.03333
  simulate_drops: false
  threshold: 0.1
syscall_event_timeouts:
  max_consecutives: 1000
syslog_output:
  enabled: true
time_format_iso_8601: false
watch_config_files: true
webserver:
  enabled: true
  k8s_healthz_endpoint: /healthz
  listen_port: 8765
  prometheus_metrics_enabled: false
  ssl_certificate: /etc/falco/falco.pem
  ssl_enabled: false
  threadiness: 0

As far as I can see, k8smeta and the k8s-metacollector are configured correctly here in the config as well. I experimented with changing the port or hostname of the metacollector, and then I got errors; the same happened when I turned on SSL without fixing the certificates. This screenshot from the falco container log also confirms that k8smeta is running - it says that it received at least one event from the k8s-metacollector, indicating that their connection should be OK.
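As a quick sanity check on the collector endpoint the plugin dials (hostname and port taken from the init_config above):

# The k8smeta plugin connects to falco-k8s-metacollector.falco.svc:45000
kubectl -n falco get svc falco-k8s-metacollector
kubectl -n falco get endpoints falco-k8s-metacollector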

(Screenshot: k8smeta plugin is healthy - no errors in logs.) Also here it looks as if the k8smeta plugin is healthy. When I removed collectors.kubernetes.enabled=true, Falco would no longer start, claiming that I used an invalid value in my rule in rules-k8smeta.yaml (the invalid value being k8smeta.pod.name), which is another indication that k8smeta is likely set up correctly.
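Another way to confirm that the plugin registered its fields is to list them from inside a running Falco pod; falco --list prints all supported fields, and the container name falco plus the placeholder pod name are assumptions here:

kubectl -n falco exec <falco-pod-name> -c falco -- falco --list | grep k8smeta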

alacuku commented 1 week ago

Hi @fjellvannet, unfortunately, I'm not able to reproduce your issue. It works on my side.

I installed Falco:

helm install falco falcosecurity/falco \
    --namespace falco \
    --create-namespace \
    --set collectors.kubernetes.enabled=true

I added the custom rule as you did.

And here is the output of Falco:

Thu Jul 4 15:17:12 2024: [info] [k8smeta] The plugin received at least one event from the k8s-metacollector
15:17:14.901142503: Notice Custom Unexpected connection to K8s API Server from container (connection=10.16.1.11:50383->10.0.0.1:80 lport=50383 rport=80 fd_type=ipv4 fd_proto=udp evt_type=connect user=root user_uid=0 user_loginuid=-1 process=curl proc_exepath=/usr/bin/curl parent=zsh command=curl kubernetes.default k8s_podname=tmp-shell orig_podname=tmp-shell terminal=34816 container_id=82d0121584ee container_image=docker.io/nicolaka/netshoot container_image_tag=latest container_name=tmp-shell k8s_ns=tmp-namespace k8s_pod_name=tmp-shell)
15:17:14.901195293: Notice Custom Unexpected connection to K8s API Server from container (connection=10.16.1.11:57012->10.0.0.1:80 lport=57012 rport=80 fd_type=ipv4 fd_proto=tcp evt_type=connect user=root user_uid=0 user_loginuid=-1 process=curl proc_exepath=/usr/bin/curl parent=zsh command=curl kubernetes.default k8s_podname=tmp-shell orig_podname=tmp-shell terminal=34816 container_id=82d0121584ee container_image=docker.io/nicolaka/netshoot container_image_tag=latest container_name=tmp-shell k8s_ns=tmp-namespace k8s_pod_name=tmp-shell)
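Judging from that output (netshoot image, pod tmp-shell in namespace tmp-namespace, curl against kubernetes.default), the test traffic was presumably generated with something like the following; the pod and namespace names are taken from the output and otherwise arbitrary:

kubectl create namespace tmp-namespace
kubectl -n tmp-namespace run tmp-shell --rm -it --image=nicolaka/netshoot -- zsh
# inside the pod:
curl kubernetes.default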
fjellvannet commented 1 week ago

I just reinstalled everything in the same way and still have the error.

Did you try on microk8s specifically? I don't know if it is part of the problem.

To give Falco access to the containerd socket (which microk8s puts in a snap directory rather than the default location), I had to create the empty files /run/containerd/containerd.sock and /run/containerd/containerd.sock.ttrpc and add these lines to /etc/fstab:

/var/snap/microk8s/common/run/containerd.sock /run/containerd/containerd.sock none bind 0 0
/var/snap/microk8s/common/run/containerd.sock.ttrpc /run/containerd/containerd.sock.ttrpc none bind 0 0

This makes the microk8s containerd socket accessible to Falco in the default location. The hack fixes the k8s.pod.name field, but not k8smeta.pod.name.
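For completeness, the setup described above amounts to roughly the following, assuming the two fstab lines from the previous block are already in place:

# Create the empty mount targets, then apply the bind mounts from /etc/fstab
sudo mkdir -p /run/containerd
sudo touch /run/containerd/containerd.sock /run/containerd/containerd.sock.ttrpc
sudo mount -a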

What kind of cluster did you use to test?

alacuku commented 1 week ago

Hey @fjellvannet, I used a kubeadm cluster. Can you share the instructions on how to create your environment?

fjellvannet commented 1 week ago

Start with a vanilla Ubuntu 24.04 server amd64 machine.

Install the microk8s snap: sudo snap install microk8s --classic

Add your user to the microk8s group to control microk8s without sudo: sudo usermod -aG microk8s "$USER"

Install / set up the kubeconfig for kubectl / helm etc.; microk8s config prints it out (see the example below).
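For example, writing it to the default kubeconfig location (the path is an assumption; adjust if you merge configs):

mkdir -p ~/.kube
microk8s config > ~/.kube/config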

Install the kube-prometheus-stack, as grafana constantly triggers my custom rule: helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack -n kube-metrics --create-namespace

Create falco-rules-k8smeta.yaml with the following content:

customRules:
  rules-k8smeta.yaml: |-
    - macro: k8s_containers
      condition: >
        (container.image.repository in (gcr.io/google_containers/hyperkube-amd64,
        gcr.io/google_containers/kube2sky,
        docker.io/sysdig/sysdig, sysdig/sysdig,
        fluent/fluentd-kubernetes-daemonset, prom/prometheus,
        falco_containers,
        falco_no_driver_containers,
        ibm_cloud_containers,
        velero/velero,
        quay.io/jetstack/cert-manager-cainjector, weaveworks/kured,
        quay.io/prometheus-operator/prometheus-operator,
        registry.k8s.io/ingress-nginx/kube-webhook-certgen, quay.io/spotahome/redis-operator,
        registry.opensource.zalan.do/acid/postgres-operator, registry.opensource.zalan.do/acid/postgres-operator-ui,
        rabbitmqoperator/cluster-operator, quay.io/kubecost1/kubecost-cost-model,
        docker.io/bitnami/prometheus, docker.io/bitnami/kube-state-metrics, mcr.microsoft.com/oss/azure/aad-pod-identity/nmi)
        or (k8s.ns.name = "kube-system"))

    - macro: never_true
      condition: (evt.num=0)

    - macro: container
      condition: (container.id != host)

    - macro: k8s_api_server
      condition: (fd.sip.name="kubernetes.default.svc.cluster.local")

    - macro: user_known_contact_k8s_api_server_activities
      condition: (never_true)

    - rule: Custom Contact K8S API Server From Container
      desc: >
        Detect attempts to communicate with the K8S API Server from a container by non-profiled users. Kubernetes APIs play a 
        pivotal role in configuring the cluster management lifecycle. Detecting potential unauthorized access to the API server 
        is of utmost importance. Audit your complete infrastructure and pinpoint any potential machines from which the API server 
        might be accessible based on your network layout. If Falco can't operate on all these machines, consider analyzing the 
        Kubernetes audit logs (typically drained from control nodes, and Falco offers a k8saudit plugin) as an additional data 
        source for detections within the control plane.
      condition: >
        evt.type=connect and evt.dir=< 
        and (fd.typechar=4 or fd.typechar=6) 
        and container 
        and k8s_api_server 
        and not k8s_containers 
        and not user_known_contact_k8s_api_server_activities
      output: Custom Unexpected connection to K8s API Server from container (connection=%fd.name lport=%fd.lport rport=%fd.rport fd_type=%fd.type fd_proto=%fd.l4proto evt_type=%evt.type user=%user.name user_uid=%user.uid user_loginuid=%user.loginuid process=%proc.name proc_exepath=%proc.exepath parent=%proc.pname command=%proc.cmdline k8s_podname=%k8smeta.pod.name orig_podname=%k8s.pod.name terminal=%proc.tty %container.info)
      priority: NOTICE
      tags: [maturity_stable, container, network, k8s, mitre_discovery, T1565]

Deploy falco using Helm and make sure the custom rule is evaluated before the default rule:

helm upgrade --install falco falcosecurity/falco \
    --namespace falco \
    --create-namespace \
    --set collectors.kubernetes.enabled=true \
    --set falco.rules_file="{/etc/falco/rules.d}" \
  -f falco-rules-k8smeta.yaml
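To confirm that the rules_file override made it into the pod, something like the following should show /etc/falco/rules.d as the only rules location; the container name falco and the placeholder pod name are assumptions:

kubectl -n falco exec <falco-pod-name> -c falco -- grep -A 2 '^rules_file' /etc/falco/falco.yaml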

Create the following bash script, which adjusts the path of the volume that mounts the containerd socket into Falco. Microk8s runs its own containerd instance, and its socket is stored in /var/snap/microk8s/common/run/containerd.sock, where Falco cannot find it without help. The script modifies the containerd-socket volume in the falco DaemonSet so that the socket from the snap is mounted correctly. I have not found a way to make this adaptation directly in the Helm chart; it would make life much easier if it could be set there directly. The bind mount I presented in my previous comment has the same effect, but for safe reproducibility I think the patch script is better.

#!/bin/bash

# Name of the Falco DaemonSet installed by the Helm chart
DAEMONSET_NAME="falco"

# Find the index of the 'containerd-socket' volume
INDEX=$(kubectl -n falco get daemonset "$DAEMONSET_NAME" -o json | jq '.spec.template.spec.volumes | map(.name) | index("containerd-socket")')

# Check if the volume was found
if [ "$INDEX" = "null" ]; then
    echo "Volume 'containerd-socket' not found."
    exit 1
fi

# Construct the JSON Patch
PATCH="[{\"op\": \"replace\", \"path\": \"/spec/template/spec/volumes/$INDEX/hostPath/path\", \"value\": \"/var/snap/microk8s/common/run\"}]"

# Apply the patch (in the same namespace as the DaemonSet)
kubectl -n falco patch daemonset "$DAEMONSET_NAME" --type='json' -p="$PATCH"
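To check that the patch was applied and the rollout has finished, roughly:

# Wait for the patched DaemonSet to roll out, then print the new hostPath of the containerd-socket volume
kubectl -n falco rollout status daemonset/falco
kubectl -n falco get daemonset falco -o jsonpath='{.spec.template.spec.volumes[?(@.name=="containerd-socket")].hostPath.path}'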

When the daemonset has updated and the pod has restarted, enjoy:

falco 09:30:43.245032810: Notice Custom Unexpected connection to K8s API Server from container (connection=10.1.10.89:41658->10.152.183.1:443 lport=41658 rport=443 fd_type=ipv4 fd_proto=tcp evt_type=connect user=<NA> user_uid=472 user_loginuid=-1 process=python proc_exepath=/usr/local/bin/python3.12 parent=python command=python -u /app/sidecar.py k8s_podname=<NA> orig_podname=kube-prometheus-stack-grafana-86844f6b47-t8cg2 terminal=0 container_id=5db2c3be25ce container_image=quay.io/kiwigrid/k8s-sidecar container_image_tag=1.26.1 container_name=grafana-sc-dashboard k8s_ns=kube-metrics k8s_pod_name=kube-prometheus-stack-grafana-86844f6b47-t8cg2)

In the k8smeta field k8s_podname, which I added, the value is <NA>.

If I have set up something wrong here or forgotten something according to the documentation, please tell me :) I could not figure it out myself, at least, because there are no error messages indicating a problem.