Cannot run falco "Least Privileged", pmu_fd: operation not permitted

jemag commented 1 year ago

Describe the bug

Falco container fails to start with error: Error: pmu_fd: Operation not permitted

See Logs section below for more info.

How to reproduce it

Install falco on AKS (1.25.6) using helm chart version 3.1.3 and following values:

podAnnotations: 
container.apparmor.security.beta.kubernetes.io/falco: unconfined
tty: false
controller:
kind: daemonset
driver:
enabled: true
kind: ebpf
ebpf:
  hostNetwork: true
  leastPrivileged: true
collectors:
enabled: true

Notice the error in the logs and the crashing pods

Expected behaviour

Falco runs without error

Logs

Logs of the starting pod, Falco:

Tue Apr 11 21:41:52 2023: Falco version: 0.34.1 (x86_64)
Tue Apr 11 21:41:52 2023: Falco initialized with configuration file: /etc/falco/falco.yaml
Tue Apr 11 21:41:52 2023: Loading rules from file /etc/falco/falco_rules.yaml
Tue Apr 11 21:41:52 2023: Loading rules from file /etc/falco/rules.d/custom.local.yaml
Tue Apr 11 21:41:52 2023: The chosen syscall buffer dimension is: 8388608 bytes (8 MBs)
Tue Apr 11 21:41:52 2023: gRPC server threadiness equals to 4
Tue Apr 11 21:41:52 2023: Starting health webserver with threadiness 4, listening on port 8765
Tue Apr 11 21:41:52 2023: Enabled event sources: syscall
Tue Apr 11 21:41:52 2023: Opening capture with BPF probe. BPF probe path: /root/.falco/falco-bpf.o
Tue Apr 11 21:41:52 2023: Starting gRPC server at unix:///run/falco/falco.sock
Tue Apr 11 21:41:52 2023: An error occurred in an event source, forcing termination...
Tue Apr 11 21:41:52 2023: Shutting down gRPC server. Waiting until external connections are closed by clients
Tue Apr 11 21:41:52 2023: Waiting for the gRPC threads to complete
Tue Apr 11 21:41:52 2023: Draining all the remaining gRPC events
Tue Apr 11 21:41:52 2023: Shutting down gRPC server complete
Error: pmu_fd: Operation not permitted
Events detected: 0
Rule counts by severity:
Triggered rules by rule name:

Logs of the driver loader

* Setting up /usr/src links from host
* Running falco-driver-loader for: falco version=0.34.1, driver version=4.0.0+driver, arch=x86_64, kernel release=5.15.0-1034-azure, kernel version=41
* Running falco-driver-loader with: driver=bpf, compile=yes, download=yes
* Mounting debugfs
mount: /sys/kernel/debug: cannot mount nodev read-only.
* Filename 'falco_ubuntu-azure_5.15.0-1034-azure_41.o' is composed of:
 - driver name: falco
 - target identifier: ubuntu-azure
 - kernel release: 5.15.0-1034-azure
 - kernel version: 41
* Trying to download a prebuilt eBPF probe from https://download.falco.org/driver/4.0.0%2Bdriver/x86_64/falco_ubuntu-azure_5.15.0-1034-azure_41.o
* Skipping compilation, eBPF probe is already present in /root/.falco/4.0.0+driver/x86_64/falco_ubuntu-azure_5.15.0-1034-azure_41.o
* eBPF probe located in /root/.falco/4.0.0+driver/x86_64/falco_ubuntu-azure_5.15.0-1034-azure_41.o
* Success: eBPF probe symlinked to /root/.falco/falco-bpf.o

Environment

Falco version: Falco version: 0.34.1 (x86_64)
Cloud provider or hardware configuration: AKS (1.25.6)
OS: PRETTY_NAME="Ubuntu 22.04.2 LTS" NAME="Ubuntu" VERSION_ID="22.04" VERSION="22.04.2 LTS (Jammy Jellyfish)" VERSION_CODENAME=jammy ID=ubuntu ID_LIKE=debian HOME_URL="https://www.ubuntu.com/" SUPPORT_URL="https://help.ubuntu.com/" BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/" PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy" UBUNTU_CODENAME=jammy
Kernel: 5.15.0-1034-azure #41-Ubuntu SMP Fri Feb 10 19:59:45 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Installation method: Helm chart version version: 3.1.3

Additional context

I tested with both no AppArmor configuration and with unconfined given the notice mentioned in the doc : https://falco.org/docs/getting-started/running/#docker-least-privileged. However, both configurations provide the same error.

Also, falco works fine if using leastPrivileged: false, here is an example of normal logs:

Tue Apr 11 22:04:35 2023: Falco version: 0.34.1 (x86_64)
Tue Apr 11 22:04:35 2023: Falco initialized with configuration file: /etc/falco/falco.yaml
Tue Apr 11 22:04:35 2023: Loading rules from file /etc/falco/falco_rules.yaml
Tue Apr 11 22:04:35 2023: The chosen syscall buffer dimension is: 8388608 bytes (8 MBs)
Tue Apr 11 22:04:35 2023: Starting health webserver with threadiness 4, listening on port 8765
Tue Apr 11 22:04:35 2023: Enabled event sources: syscall
Tue Apr 11 22:04:35 2023: Opening capture with BPF probe. BPF probe path: /root/.falco/falco-bpf.o
22:05:19.262671245: Notice Unexpected connection to K8s API Server from container (command=python -u /app/sidecar.py pid=9171 k8s.ns=monitoring k8s.pod=kube-prometheus-stack-grafana-684679fcd6-gpn2b container=541747452f5f image=quay.io/kiwigrid/k8s-sidecar:1.19.2 connection=10.176.16.103:44470->192.168.0.1:443)

Andreagit97 commented 1 year ago

ei @jemag thank you for reporting this, it seems a problem related to the syscall __NR_perf_event_open the CAP_PERFMON could be not enough in some cases, I would suggest you the following changes:

containerSecurityContext:
  capabilities:
    add:
    - SYS_ADMIN
    - SYS_RESOURCE

  driver:
    enabled: true
    kind: ebpf
    ebpf:
      hostNetwork: true
      leastPrivileged: false

So disable leastPrivileged and set the SYS_ADMIN and SYS_RESOURCE capabilities by hand, this is always least privileged but with a richer set of capabilities

In the meanwhile we will try to understand when PERFMON is not enough :thinking:

Andreagit97 commented 1 year ago

@tspearconquest if i remember well you faced a similar issue

tspearconquest commented 1 year ago

Yes, I will try it with SYS_ADMIN, SYS_RESOURCE and SYS_PERFMON and see if I can get it running. Thanks!

Andreagit97 commented 1 year ago

Yes, I will try it with SYS_ADMIN, SYS_RESOURCE and SYS_PERFMON and see if I can get it running. Thanks!

SYS_PERFMON shouldn't be necessary SYS_ADMIN, SYS_RESOURCE are enough, please ensure to set leastPrivileged: false to try it, in this way we are sure they are not overwritten in some way :)

tspearconquest commented 1 year ago

After reviewing my thread on slack, I can confirm that the old probe doesn't seem to work in AKS K8s 1.24 (Ubuntu 18.04) with SYS_ADMIN and SYS_RESOURCE unless falco runs as root. Admittedly, I hadn't tried this in a year, so I will give it another try today.

However, since CAP_PERFMON and CAP_BPF aren't available in this kernel, we're unable to use leastPrivileged: true anyway.

OP here is on Ubuntu 22.04 because their cluster is on AKS K8s 1.25, and Ubuntu 22.04 has a proper kernel, thus its clear that the old probe is not working with CAP_BPF and CAP_PERFMON by themselves as you mentioned in Slack.

I suspect that OP may have success with your modified setup, and I will post back on it for my cluster shortly. Thanks @Andreagit97!

jemag commented 1 year ago

Thanks for the information @Andreagit97. I have tested it with SYS_ADMIN and SYS_RESOURCE and it does work. I have created the following PR https://github.com/falcosecurity/charts/pull/480 to the helm chart if one of you could take a look.

tspearconquest commented 1 year ago

Hi @Andreagit97 I tried and its not working for me.

As you requested my values.yaml file in slack, I am posting it below.

values:

  collectors:
    crio:
      enabled: false
  controller:
    annotations:
      ignore-check.kube-linter.io/docker-sock: Falco requires access to the docker socket
      ignore-check.kube-linter.io/host-network: Falco requires access to the host network
      ignore-check.kube-linter.io/no-read-only-root-fs: Falco driver loader requires a writable rootfs
      ignore-check.kube-linter.io/privilege-escalation-container: Falco requires access to privilege escalation functionality until Kernel 5.8
      ignore-check.kube-linter.io/privileged-container: Falco requires access to privileged system operations until Kernel 5.8
      ignore-check.kube-linter.io/run-as-non-root: Falco requires root access until Kernel 5.8
      ignore-check.kube-linter.io/sensitive-host-mounts: Falco requires access to sensitive host mounts
      ignore-check.kube-linter.io/unset-cpu-requirements: We don't set cpu limits
    daemonset:
      updateStrategy:
        rollingUpdate:
          maxUnavailable: "100%"
  image:
    pullPolicy: Always
    registry: registry.gitlab.com
    repository: my-private/falco/falco-no-driver
  imagePullSecrets:
  - name: gitlab-registry
  namespaceOverride: falco
  podAnnotations:
    container.apparmor.security.beta.kubernetes.io/falco: "runtime/default"
    container.apparmor.security.beta.kubernetes.io/falco-driver-loader: "runtime/default"
    container.apparmor.security.beta.kubernetes.io/falco-exporter: "runtime/default"
    container.apparmor.security.beta.kubernetes.io/falco-socket-permissions: "runtime/default"
    container.apparmor.security.beta.kubernetes.io/falcoctl-artifact-follow: "runtime/default"
    container.apparmor.security.beta.kubernetes.io/falcoctl-artifact-install: "runtime/default"
  podLabels:
    app: falco
  # -- Set securityContext for the pods
  # These security settings are overriden by the ones specified for the specific
  # containers when there is overlap.
  podSecurityContext:
    fsGroup: 55532
    seccompProfile:
      type: RuntimeDefault
  # Note that `containerSecurityContext`:
  #  - will not apply to init containers, if any;
  #  - takes precedence over other automatic configurations (see below).
  #
  # Based on the `driver` configuration the auto generated settings are:
  # 1) driver.enabled = false:
  #    securityContext: {}
  #
  # 2) driver.enabled = true and (driver.kind = module || driver.kind = modern-bpf):
  #    securityContext:
  #     privileged: true
  #
  # 3) driver.enabled = true and driver.kind = ebpf:
  #    securityContext:
  #     privileged: true
  #
  # 4) driver.enabled = true and driver.kind = ebpf and driver.ebpf.leastPrivileged = true
  #    securityContext:
  #     capabilities:
  #      add:
  #      - BPF
  #      - SYS_RESOURCE
  #      - PERFMON
  #      - SYS_PTRACE
  #
  # -- Set securityContext for the Falco container.For more info see the "falco.securityContext" helper in "pod-template.tpl"
  containerSecurityContext:
    capabilities:
      add:
      - SYS_ADMIN
      - SYS_RESOURCE
      drop:
      - ALL
    privileged: true
    readOnlyRootFilesystem: true
    runAsGroup: 55532
    runAsUser: 55532
    seccompProfile:
      type: RuntimeDefault
  scc:
    # -- Create OpenShift's Security Context Constraint.
    create: false
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: null
      memory: 1024Mi
  tolerations:
  - key: CriticalInfra
    operator: Exists
  healthChecks:
    livenessProbe:
      initialDelaySeconds: 30
    readinessProbe:
      initialDelaySeconds: 0
  services:
  - name: k8saudit-webhook
    ports:
    - port: 8765 # See plugin open_params
      protocol: TCP
  - name: exporter
    clusterIP: None
    ports:
    - name: metrics
      port: 9376
      protocol: TCP
  mounts:
    volumeMounts:
##    - mountPath: /root/.falco
##      name: root-falco-fs
##      readOnly: true
    - mountPath: /host/proc
      name: proc-fs
      readOnly: true
    - mountPath: /sys/kernel/debug
      name: debugfs
      readOnly: true
    - mountPath: /host/var/run/docker.sock
      name: docker-socket
      readOnly: true
    - mountPath: /host/run/containerd/containerd.sock
      name: containerd-socket
      readOnly: true
    - mountPath: /etc/falco/falco.yaml
      name: falco-yaml
      readOnly: true
      subPath: falco.yaml
##    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
##      name: falco
##      readOnly: true
    volumes:
    - emptyDir: {}
      name: falcoctl-tmp
    - emptyDir: {}
      name: grpc-socket-dir
    - configMap:
        defaultMode: 291
        items:
        - key: falcoctl.yaml
          path: falcoctl.yaml
        name: falco-falcoctl
      name: falcoctl-config-volume
    - configMap:
        defaultMode: 291
        items:
        - key: falco.yaml
          path: falco.yaml
        name: falco
      name: falco-yaml
    - hostPath:
        path: /boot
        type: Directory
      name: boot-fs
    - hostPath:
        path: /lib/modules
        type: Directory
      name: lib-modules
    - hostPath:
        path: /usr
        type: Directory
      name: usr-fs
    - hostPath:
        path: /etc
        type: Directory
      name: etc-fs
    - hostPath:
        path: /sys/kernel/debug
        type: Directory
      name: debugfs
    - hostPath:
        path: /run/containerd/containerd.sock
        type: Socket
      name: containerd-socket
    - hostPath:
        path: /proc
        type: Directory
      name: proc-fs
##    - name: falco
##      projected:
##        defaultMode: 291
##        sources:
##        - configMap:
##            name: kube-root-ca.crt
##        - downwardAPI:
##            items:
##            - fieldRef:
##                apiVersion: v1
##                fieldPath: metadata.namespace
##              path: namespace
##        - serviceAccountToken:
##            path: token
  driver:
    kind: ebpf
    ebpf:
      # -- Needed to enable eBPF JIT at runtime for performance reasons.
      # Can be skipped if eBPF JIT is enabled from outside the container
      hostNetwork: true
      # -- Constrain Falco with capabilities instead of running a privileged container.
      # This option is only supported with the eBPF driver and a kernel >= 5.8.
      # Ensure the eBPF driver is enabled (i.e., setting the `driver.kind` option to `ebpf`).
      leastPrivileged: false
    # -- Configuration for the Falco init container.
    loader:
      initContainer:
        image:
          pullPolicy: Always
          registry: registry.gitlab.com
          repository: my-private/falco/falco-driver-loader
        resources:
          limits:
            memory: 256Mi
          requests:
            cpu: 100m
            memory: 128Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          privileged: false
          runAsGroup: 55532
          seccompProfile:
            type: RuntimeDefault
  falcoctl:
    image:
      pullPolicy: Always
      registry: registry.gitlab.com
      repository: my-private/falco/falcoctl
    artifact:
      install:
        resources:
          limits:
            memory: 100Mi
          requests:
            cpu: 100m
            memory: 50Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          privileged: false
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 55532
          runAsGroup: 55532
          seccompProfile:
            type: RuntimeDefault
      follow:
        resources:
          limits:
            memory: 100Mi
          requests:
            cpu: 100m
            memory: 50Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          privileged: false
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 55532
          runAsGroup: 55532
          seccompProfile:
            type: RuntimeDefault
    config:
      # -- List of indexes that falcoctl downloads and uses to locate and download artiafcts. For more info see:
      # https://github.com/falcosecurity/falcoctl/blob/main/proposals/20220916-rules-and-plugin-distribution.md#index-file-overview
      indexes:
      - name: falcosecurity
        url: https://falcosecurity.github.io/falcoctl/index.yaml
      # -- Configuration used by the artifact commands.
      artifact:
        # -- List of artifact types that falcoctl will handle. If the configured refs resolves to an artifact whose type is not contained
        # in the list it will refuse to downloade and install that artifact.
        allowedTypes:
          - rulesfile
        install:
          # -- List of artifacts to be installed by the falcoctl init container.
          # We do not recommend installing (or following) plugins for security reasons since they are executable objects.
          refs: [falco-rules:0, k8saudit-rules:0.5]
        follow:
          # -- List of artifacts to be followed by the falcoctl sidecar container.
          # We do not recommend installing (or following) plugins for security reasons since they are executable objects.
          refs: [falco-rules:0, k8saudit-rules:0.5]
  ######################
  # falco.yaml config  #
  ######################
  falco:
  # File(s) or Directories containing Falco rules, loaded at startup.
  # The name "rules_file" is only for backwards compatibility.
  # If the entry is a file, it will be read directly. If the entry is a directory,
  # every file in that directory will be read, in alphabetical order.
  #
  # falco_rules.yaml ships with the falco package and is overridden with
  # every new software version. falco_rules.local.yaml is only created
  # if it doesn't exist. If you want to customize the set of rules, add
  # your customizations to falco_rules.local.yaml.
  #
  # The files will be read in the order presented here, so make sure if
  # you have overrides they appear in later files.
    # -- The location of the rules files that will be consumed by Falco.
    rules_file:
      - /etc/falco/falco_rules.yaml
      - /etc/falco/falco_rules.local.yaml
      - /etc/falco/k8s_audit_rules.yaml
      - /etc/falco/rules.d
    #
    # To learn more about the supported formats for
    # init_config/open_params for the cloudtrail plugin, see the README at
    # https://github.com/falcosecurity/plugins/blob/master/plugins/cloudtrail/README.md.
    # -- Plugins configuration. Add here all plugins and their configuration. Please
    # consult the plugins documentation for more info. Remember to add the plugins name in
    # "load_plugins: []" in order to load them in Falco.
    plugins:
      - name: k8saudit
        library_path: libk8saudit.so
        init_config:
          ""
        # maxEventSize: 262144
        # maxEventBytes: 1048576
        # webhookMaxBatchSize: 12582912
        # sslCertificate: /etc/falco/falco.pem
        open_params: "http://:8765/k8s-audit"
      - name: cloudtrail
        library_path: libcloudtrail.so
        # see docs for init_config and open_params:
        # https://github.com/falcosecurity/plugins/blob/master/plugins/cloudtrail/README.md
      - name: json
        library_path: libjson.so
        init_config: ""
    # Setting this list to empty ensures that the above plugins are *not*
    # loaded and enabled by default. If you want to use the above plugins,
    # set a meaningful init_config/open_params for the cloudtrail plugin
    # and then change this to:
    # load_plugins: [cloudtrail, json]
    # -- Add here the names of the plugins that you want to be loaded by Falco. Please make sure that
    # plugins have been configured under the "plugins" section before adding them here.
    # Please make sure to configure the falcoctl tool to download and install the very same plugins
    # you are loading here. You should add the references in the falcoctl.config.artifact.install.refs array
    # for each plugin you are loading.
    load_plugins: [k8saudit, json]
    # -- Watch config file and rules files for modification.
    # When a file is modified, Falco will propagate new config,
    # by reloading itself.
    watch_config_files: true
    # -- If true, the times displayed in log messages and output messages
    # will be in ISO 8601. By default, times are displayed in the local
    # time zone, as governed by /etc/localtime.
    time_format_iso_8601: true
    # -- If "true", print falco alert messages and rules file
    # loading/validation results as json, which allows for easier
    # consumption by downstream programs. Default is "false".
    json_output: true
    # -- When using json output, whether or not to include the "output" property
    # itself (e.g. "File below a known binary directory opened for writing
    # (user=root ....") in the json output.
    json_include_output_property: true
    # -- When using json output, whether or not to include the "tags" property
    # itself in the json output. If set to true, outputs caused by rules
    # with no tags will have a "tags" field set to an empty array. If set to
    # false, the "tags" field will not be included in the json output at all.
    json_include_tags_property: true
    # -- Send information logs to stderr. Note these are *not* security
    # notification logs! These are just Falco lifecycle (and possibly error) logs.
    log_stderr: true
    # -- Send information logs to syslog. Note these are *not* security
    # notification logs! These are just Falco lifecycle (and possibly error) logs.
    log_syslog: false
    # -- Minimum log level to include in logs. Note: these levels are
    # separate from the priority field of rules. This refers only to the
    # log level of falco's internal logging. Can be one of "emergency",
    # "alert", "critical", "error", "warning", "notice", "info", "debug".
    log_level: info
    # -- Minimum rule priority level to load and run. All rules having a
    # priority more severe than this level will be loaded/run.  Can be one
    # of "emergency", "alert", "critical", "error", "warning", "notice",
    # "informational", "debug".
    priority: debug
    # -- Whether or not output to any of the output channels below is
    # buffered. Defaults to false
    buffered_outputs: false
    # Falco uses a shared buffer between the kernel and userspace to pass
    # system call information. When Falco detects that this buffer is
    # full and system calls have been dropped, it can take one or more of
    # the following actions:
    #   - ignore: do nothing (default when list of actions is empty)
    #   - log: log a DEBUG message noting that the buffer was full
    #   - alert: emit a Falco alert noting that the buffer was full
    #   - exit: exit Falco with a non-zero rc
    #
    # Notice it is not possible to ignore and log/alert messages at the same time.
    #
    # The rate at which log/alert messages are emitted is governed by a
    # token bucket. The rate corresponds to one message every 30 seconds
    # with a burst of one message (by default).
    #
    # The messages are emitted when the percentage of dropped system calls
    # with respect the number of events in the last second
    # is greater than the given threshold (a double in the range [0, 1]).
    #
    # For debugging/testing it is possible to simulate the drops using
    # the `simulate_drops: true`. In this case the threshold does not apply.
    syscall_event_drops:
      threshold: .1
      actions:
        - log
        - alert
      rate: .03333
      max_burst: 10
      simulate_drops: false
    # Falco uses a shared buffer between the kernel and userspace to receive
    # the events (eg., system call information) in userspace.
    #
    # Anyways, the underlying libraries can also timeout for various reasons.
    # For example, there could have been issues while reading an event.
    # Or the particular event needs to be skipped.
    # Normally, it's very unlikely that Falco does not receive events consecutively.
    #
    # Falco is able to detect such uncommon situation.
    #
    # Here you can configure the maximum number of consecutive timeouts without an event
    # after which you want Falco to alert.
    # By default this value is set to 1000 consecutive timeouts without an event at all.
    # How this value maps to a time interval depends on the CPU frequency.
    syscall_event_timeouts:
      # -- Maximum number of consecutive timeouts without an event
      # after which you want Falco to alert.
      max_consecutives: 1000
    # --- [Description]
    #
    # This is an index that controls the dimension of the syscall buffers.
    # The syscall buffer is the shared space between Falco and its drivers where all the syscall events
    # are stored.
    # Falco uses a syscall buffer for every online CPU, and all these buffers share the same dimension.
    # So this parameter allows you to control the size of all the buffers!
    #
    # --- [Usage]
    #
    # You can choose between different indexes: from `1` to `10` (`0` is reserved for future uses).
    # Every index corresponds to a dimension in bytes:
    #
    # [(*), 1 MB, 2 MB, 4 MB, 8 MB, 16 MB, 32 MB, 64 MB, 128 MB, 256 MB, 512 MB]
    #   ^    ^     ^     ^     ^     ^      ^      ^       ^       ^       ^
    #   |    |     |     |     |     |      |      |       |       |       |
    #   0    1     2     3     4     5      6      7       8       9       10
    #
    # As you can see the `0` index is reserved, while the index `1` corresponds to
    # `1 MB` and so on.
    #
    # These dimensions in bytes derive from the fact that the buffer size must be:
    # (1) a power of 2.
    # (2) a multiple of your system_page_dimension.
    # (3) greater than `2 * (system_page_dimension)`.
    #
    # According to these constraints is possible that sometimes you cannot use all the indexes, let's consider an
    # example to better understand it:
    # If you have a `page_size` of 1 MB the first available buffer size is 4 MB because 2 MB is exactly
    # `2 * (system_page_size)` -> `2 * 1 MB`, but this is not enough we need more than `2 * (system_page_size)`!
    # So from this example is clear that if you have a page size of 1 MB the first index that you can use is `3`.
    #
    # Please note: this is a very extreme case just to let you understand the mechanism, usually the page size is something
    # like 4 KB so you have no problem at all and you can use all the indexes (from `1` to `10`).
    #
    # To check your system page size use the Falco `--page-size` command line option. The output on a system with a page
    # size of 4096 Bytes (4 KB) should be the following:
    #
    # "Your system page size is: 4096 bytes."
    #
    # --- [Suggestions]
    #
    # Before the introduction of this param the buffer size was fixed to 8 MB (so index `4`, as you can see
    # in the default value below).
    # You can increase the buffer size when you face syscall drops. A size of 16 MB (so index `5`) can reduce
    # syscall drops in production-heavy systems without noticeable impact. Very large buffers however could
    # slow down the entire machine.
    # On the other side you can try to reduce the buffer size to speed up the system, but this could
    # increase the number of syscall drops!
    # As a final remark consider that the buffer size is mapped twice in the process' virtual memory so a buffer of 8 MB
    # will result in a 16 MB area in the process virtual memory.
    # Please pay attention when you use this parameter and change it only if the default size doesn't fit your use case.
    # -- This is an index that controls the dimension of the syscall buffers.
    syscall_buf_size_preset: 4
    ############## [EXPERIMENTAL] Modern BPF probe specific ##############
    # Please note: these configs regard only the modern BPF probe. They
    # are experimental so they could change over releases.
    #
    # `cpus_for_each_syscall_buffer`
    #
    # --- [Description]
    #
    # This is an index that controls how many CPUs you want to assign to a single
    # syscall buffer (ring buffer). By default, every syscall buffer is associated to
    # 2 CPUs, so the mapping is 1:2. The modern BPF probe allows you to choose different
    # mappings, for example, 1:1 would mean a syscall buffer for each CPU.
    #
    # --- [Usage]
    #
    # You can choose between different indexes: from `0` to `MAX_NUMBER_ONLINE_CPUs`.
    # `0` is a special value and it means a single syscall buffer shared between all
    # your online CPUs. `0` has the same effect as `MAX_NUMBER_ONLINE_CPUs`, the rationale
    # is that `0` allows you to create a single buffer without knowing the number of online
    # CPUs on your system.
    # Let's consider an example to better understand it:
    #
    # Consider a system with 7 online CPUs:
    #
    #          CPUs     0  X  2  3  X  X  6  7  8  9   (X means offline CPU)
    #
    # - `1` means a syscall buffer for each CPU so 7 buffers
    #
    #          CPUs     0  X  2  3  X  X  6  7  8  9   (X means offline CPU)
    #                   |     |  |        |  |  |  |
    #       BUFFERs     0     1  2        3  4  5  6
    #
    # - `2` (Default value) means a syscall buffer for each CPU pair, so 4 buffers
    #
    #          CPUs     0  X  2  3  X  X  6  7  8  9   (X means offline CPU)
    #                   |     |  |        |  |  |  |
    #       BUFFERs     0     0  1        1  2  2  3
    #
    # Please note that we need 4 buffers, 3 buffers are associated with CPU pairs, the last
    # one is mapped with just 1 CPU since we have an odd number of CPUs.
    #
    # - `0` or `MAX_NUMBER_ONLINE_CPUs` mean a syscall buffer shared between all CPUs, so 1 buffer
    #
    #          CPUs     0  X  2  3  X  X  6  7  8  9   (X means offline CPU)
    #                   |     |  |        |  |  |  |
    #       BUFFERs     0     0  0        0  0  0  0
    #
    # Moreover you can combine this param with `syscall_buf_size_preset`
    # index, for example, you could create a huge single syscall buffer
    # shared between all your online CPUs of 512 MB (so `syscall_buf_size_preset=10`).
    #
    # --- [Suggestions]
    #
    # We chose index `2` (so one syscall buffer for each CPU pair) as default because the modern bpf probe
    # follows a different memory allocation strategy with respect to the other 2 drivers (bpf and kernel module).
    # By the way, you are free to find the preferred configuration for your system.
    # Considering a fixed `syscall_buf_size_preset` and so a fixed buffer dimension:
    # - a lower number of buffers can speed up your system (lower memory footprint)
    # - a too lower number of buffers could increase contention in the kernel causing an
    #   overall slowdown of the system.
    # If you don't have huge events throughputs and you are not experimenting with tons of drops
    # you can try to reduce the number of buffers to have a lower memory footprint
    modern_bpf:
      # -- [MODERN PROBE ONLY] This is an index that controls how many CPUs you want to assign to a single syscall buffer.
      cpus_for_each_syscall_buffer: 2
    ############## [EXPERIMENTAL] Modern BPF probe specific ##############
    # Falco continuously monitors outputs performance. When an output channel does not allow
    # to deliver an alert within a given deadline, an error is reported indicating
    # which output is blocking notifications.
    # The timeout error will be reported to the log according to the above log_* settings.
    # Note that the notification will not be discarded from the output queue; thus,
    # output channels may indefinitely remain blocked.
    # An output timeout error indeed indicate a misconfiguration issue or I/O problems
    # that cannot be recovered by Falco and should be fixed by the user.
    #
    # The "output_timeout" value specifies the duration in milliseconds to wait before
    # considering the deadline exceed.
    #
    # With a 2000ms default, the notification consumer can block the Falco output
    # for up to 2 seconds without reaching the timeout.
    # -- Duration in milliseconds to wait before considering the output timeout deadline exceed.
    output_timeout: 2000
    # A throttling mechanism implemented as a token bucket limits the
    # rate of Falco notifications. One rate limiter is assigned to each event
    # source, so that alerts coming from one can't influence the throttling
    # mechanism of the others. This is controlled by the following options:
    #  - rate: the number of tokens (i.e. right to send a notification)
    #    gained per second. When 0, the throttling mechanism is disabled.
    #    Defaults to 0.
    #  - max_burst: the maximum number of tokens outstanding. Defaults to 1000.
    #
    # With these defaults, the throttling mechanism is disabled.
    # For example, by setting rate to 1 Falco could send up to 1000 notifications
    # after an initial quiet period, and then up to 1 notification per second
    # afterward. It would gain the full burst back after 1000 seconds of
    # no activity.
    outputs:
      rate: 1
      max_burst: 1000
    # Where security notifications should go.
    # Multiple outputs can be enabled.
    syslog_output:
      # -- Enable syslog output for security notifications.
      enabled: false
    stdout_output:
      # -- Enable stdout output for security notifications.
      enabled: true
    # Falco contains an embedded webserver that exposes a healthy endpoint that can be used to check if Falco is up and running.
    # By default the endpoint is /healthz
    #
    # The ssl_certificate is a combination SSL Certificate and corresponding
    # key contained in a single file. You can generate a key/cert as follows:
    #
    # $ openssl req -newkey rsa:2048 -nodes -keyout key.pem -x509 -days 365 -out certificate.pem
    # $ cat certificate.pem key.pem > falco.pem
    # $ sudo cp falco.pem /etc/falco/falco.pem
    webserver:
      # -- Enable Falco embedded webserver.
      enabled: true
      # -- Number of threads depending on the number of online cores.
      threadiness: 0
      # -- Port where Falco embedded webserver listen to connections.
      listen_port: 8765
      # -- Endpoint where Falco receives the audit logs.
      k8s_audit_endpoint: /k8s-audit
      # -- Endpoint where Falco exposes the health status.
      k8s_healthz_endpoint: /healthz
      # -- Enable SSL on Falco embedded webserver.
      ssl_enabled: false
      # -- Certificate bundle path for the Falco embedded webserver.
      ssl_certificate: /etc/falco/falco.pem
    # Falco supports running a gRPC server with two main binding types
    # 1. Over the network with mandatory mutual TLS authentication (mTLS)
    # 2. Over a local unix socket with no authentication
    # By default, the gRPC server is disabled, with no enabled services (see grpc_output)
    # please comment/uncomment and change accordingly the options below to configure it.
    # Important note: if Falco has any troubles creating the gRPC server
    # this information will be logged, however the main Falco daemon will not be stopped.
    # gRPC server over network with (mandatory) mutual TLS configuration.
    # This gRPC server is secure by default so you need to generate certificates and update their paths here.
    # By default the gRPC server is off.
    # You can configure the address to bind and expose it.
    # By modifying the threadiness configuration you can fine-tune the number of threads (and context) it will use.
    # grpc:
    #   enabled: true
    #   bind_address: "0.0.0.0:5060"
    #   # when threadiness is 0, Falco sets it by automatically figuring out the number of online cores
    #   threadiness: 0
    #   private_key: "/etc/falco/certs/server.key"
    #   cert_chain: "/etc/falco/certs/server.crt"
    #   root_certs: "/etc/falco/certs/ca.crt"
    # -- gRPC server using an unix socket
    grpc:
      enabled: true
      # bind_address: "unix:///run/falco/falco.sock"
      # threadiness: 0
    # gRPC output service.
    # By default it is off.
    # By enabling this all the output events will be kept in memory until you read them with a gRPC client.
    # Make sure to have a consumer for them or leave this disabled.
    grpc_output:
      enabled: true
    # Container orchestrator metadata fetching params
    metadata_download:
      max_mb: 100
      chunk_wait_us: 1000
      watch_freq_sec: 1

tspearconquest commented 1 year ago

Falco log:

2023-04-13T05:53:32+0000: Falco version: 0.34.1 (x86_64)
2023-04-13T05:53:32+0000: Falco initialized with configuration file: /etc/falco/falco.yaml
2023-04-13T05:53:32+0000: Loading plugin 'k8saudit' from file /usr/share/falco/plugins/libk8saudit.so
2023-04-13T05:53:32+0000: Loading plugin 'json' from file /usr/share/falco/plugins/libjson.so
2023-04-13T05:53:32+0000: Loading rules from file /etc/falco/falco_rules.yaml
2023-04-13T05:53:32+0000: Loading rules from file /etc/falco/k8s_audit_rules.yaml
2023-04-13T05:53:32+0000: Loading rules from file /etc/falco/rules.d/falco_rules.local.yaml
2023-04-13T05:53:32+0000: The chosen syscall buffer dimension is: 8388608 bytes (8 MBs)
2023-04-13T05:53:32+0000: gRPC server threadiness equals to 8
2023-04-13T05:53:32+0000: Starting health webserver with threadiness 8, listening on port 8765
2023-04-13T05:53:32+0000: Enabled event sources: k8s_audit, syscall
2023-04-13T05:53:32+0000: Opening capture with plugin 'k8saudit'
2023-04-13T05:53:32+0000: Starting gRPC server at unix:///run/falco/falco.sock
2023-04-13T05:53:32+0000: Opening capture with BPF probe. BPF probe path: /home/falco/.falco/falco-bpf.o
2023-04-13T05:53:32+0000: An error occurred in an event source, forcing termination...
2023-04-13T05:53:32+0000: Shutting down gRPC server. Waiting until external connections are closed by clients
2023-04-13T05:53:32+0000: Waiting for the gRPC threads to complete
2023-04-13T05:53:32+0000: Draining all the remaining gRPC events
2023-04-13T05:53:32+0000: Shutting down gRPC server complete
Error: setrlimit failed: Operation not permitted
listen tcp :8765: bind: address already in use
Events detected: 0
Rule counts by severity:
Triggered rules by rule name:

Andreagit97 commented 1 year ago

Uhm thank you for this, this is actually a little bit different from the usual error

Error: setrlimit failed: Operation not permitted

It seems an issue related to CAP_SYS_RESOURCE, btw I will take a more in-depth look :eye:

Andreagit97 commented 1 year ago

This seems pretty strange, probably drop is removing all the capabilities

  containerSecurityContext:
    capabilities:
      add:
      - SYS_ADMIN
      - SYS_RESOURCE
      drop:
      - ALL

Could you try to run Falco with the following helm chart:

diff --git a/falco/values.yaml b/falco/values.yaml
index be01dab..acf89e0 100644
--- a/falco/values.yaml
+++ b/falco/values.yaml
@@ -76,7 +76,11 @@ podSecurityContext: {}
 #      - SYS_PTRACE
 #
 # -- Set securityContext for the Falco container.For more info see the "falco.securityContext" helper in "pod-template.tpl"
-containerSecurityContext: {}
+containerSecurityContext:
+  capabilities:
+    add:
+    - SYS_ADMIN
+    - SYS_RESOURCE

 scc:
   # -- Create OpenShift's Security Context Constraint.
@@ -177,7 +181,7 @@ driver:
   # Always set it to false when using Falco with plugins.
   enabled: true
   # -- Tell Falco which driver to use. Available options: module (kernel driver), ebpf (eBPF probe), modern-bpf (modern eBPF probe).
-  kind: module
+  kind: ebpf
   # -- Configuration section for ebpf driver.
   ebpf:
     # -- Path where the eBPF probe is located. It comes handy when the probe have been installed in the nodes using tools other than the init

tspearconquest commented 1 year ago

Thanks. Unfortunately I'm constrained by Gatekeeper here and must drop all privileges, however I can confirm the following:

I am already using the ebpf driver, and
According to Synk drop: ["ALL"] with add: ["SYS_ADMIN", "SYS_RESOURCE"] should work without issues.

Andreagit97 commented 1 year ago

Uhm ok, I understood but it seems like you are first adding some capabilities and then removing them all. Also reading the link you shared, it is not so clear if the order of drop and add counts :thinking:

In the case where you do have to allow capabilities it is good practice to first drop all default capabilities and only then add only the ones you need.

Maybe you have just to invert their order, first drop, and then add (?)

BTW I'm pretty sure that in the end, the pod is running without capabilities if I try to run Falco without capabilities I face the same error because I miss CAP_SYS_RESOURCE.

tspearconquest commented 1 year ago

Maybe you have just to invert their order, first drop, and then add (?)

Maybe, but I thought that yaml could be sorted in any way as I typically find most manifests are sorted alphabetically. I will give it a try with the order swapped and report back.

Andreagit97 commented 1 year ago

Related to the initial issue here with CAP_PERFMON and CAP_BPF:

Recent kernels (>=5.8) introduced new capabilities to split CAP_SYS_ADMIN into several capabilities (for example CAP_PERFMON and CAP_BPF). So if you have a recent enough kernel you should be able to run the old probe and the modern probe with new capabilities. Unfortunately, this is not as simple as it sounds... There is a second variable to take into consideration: kernel.perf_event_paranoid. Reading the manual it seems that perf_event_paranoid influences only the behavior of unprivileged users, if you have the right capabilities all should work fine

-1 - not paranoid at all
0  - disallow raw tracepoint access for unprivileged users
1  - disallow CPU events for unprivileged users
2  - disallow kernel profiling for unprivileged users

But under the hood, some distros like Debian and Ubuntu introduce other perf_event_paranoid levels, see Ubuntu here: https://kernel.ubuntu.com/git/ubuntu-stable/ubuntu-stable-jammy.git/tree/kernel/events/core.c#n11991

        err = security_perf_event_open(&attr, PERF_SECURITY_OPEN);
    if (err)
        return err;

    if (perf_paranoid_any() && !capable(CAP_SYS_ADMIN))
        return -EACCES;

where perf_paranoid_any() is

static inline bool perf_paranoid_any(void)
{
    return sysctl_perf_event_paranoid > 2;
}

As you can easily notice if your kernel.perf_event_paranoid is >2 CAP_PERFMON will be not enough, you will need CAP_SYS_ADMIN! that's the reason why the old probe could work with CAP_PERFMON + CAP_BPF only on some nodes, probably in these nodes kernel.perf_event_paranoid is <= 2

Supposing that kernel 5.8 is the first one with CAP_BPF and CAP_PERFMON available (this could be not true it really depends on you kernel patches/backports):

old probe:

(Kernels < 5.8) -> needs CAP_SYS_ADMIN
(Kernels >= 5.8) and kernel.perf_event_paranoid <= 2 -> can use CAP_BPF and CAP_PERFMON
(Kernels >= 5.8) and kernel.perf_event_paranoid > 2 -> it really depends on your distro but usually needs CAP_SYS_ADMIN

modern probe:

(Kernels < 5.8) -> needs CAP_SYS_ADMIN
(Kernels >= 5.8) -> can use CAP_BPF and CAP_PERFMON (the modern probe use the BPF ring-buffer so no need of kernel.perf_event_paranoid

jemag commented 1 year ago

Thanks for the clarification. I did just try modern-bpf with CAP_BPF and CAP_PERFMON and it works great too. It just is a bit misleading right now for end-users of the helm chart since by default it will set privileged to true for modern-bpf: https://github.com/falcosecurity/charts/blob/6fff4e1e75c43af742596f478bf86e7723aa08da/falco/templates/pod-template.tpl#L392-L402

Andreagit97 commented 1 year ago

It just is a bit misleading right now for end-users of the helm chart since by default it will set privileged to true for modern-bpf:

yep, that was intentional because the modern probe was an experimental feature in Falco 0.34 and we were not sure about the least privileged support, but it seems to work well so we will update the helm chart for Falco 0.35 since the modern probe will be an officially supported driver :)!

jemag commented 1 year ago

btw in regards to the drop then add or add then drop, I don't believe that makes a difference. For example, testing both cases here in 1.25.6:

Best way to eliminate that possibility would be to check directly in your specific running container though

Andreagit97 commented 1 year ago

@jemag got it to thank you for testing! Do you mind checking your paranoid value on your host, just to see if my guess is right https://github.com/falcosecurity/falco/issues/2487#issuecomment-1507098697?

sysctl kernel.perf_event_paranoid

or

cat /proc/sys/kernel/perf_event_paranoid

should be enough to check the paranoid value

jemag commented 1 year ago

Sounds about right:

Andreagit97 commented 1 year ago

thank you!

leogr commented 1 year ago

Sorry, I've not gone into details of the whole discussion, but this reminds me of one thing I want to report since it may be helpful.

CAP_SYS_PTRACE is required by Falco because it generally allows access to/proc/<pid>/environ. This is likely related to the container runtime, not the kernel itself. See here a very old discussion: https://github.com/moby/moby/issues/6607.

Note that running Falco without CAP_SYS_PTRACE could work and apparently present no issue. However, since it won't be able to access some data, it may malfunction later or don't able to provide some metadata in output.

fcrespofastly commented 1 year ago

It just is a bit misleading right now for end-users of the helm chart since by default it will set privileged to true for modern-bpf:

yep, that was intentional because the modern probe was an experimental feature in Falco 0.34 and we were not sure about the least privileged support, but it seems to work well so we will update the helm chart for Falco 0.35 since the modern probe will be an officially supported driver :)!

From what I can tell the new chart still forces being privileged when choosing modern-bpf

Andreagit97 commented 1 year ago

yep that's true sorry about that we will work on it these days!

fcrespofastly commented 1 year ago

@Andreagit97 thanks!! let me know the changes you make :)

Btw, modern-bpf worked on GKE (COS) but we also have some kubernetes AWS clusters were we're running on Ubuntu 20.04 with kernel 5.15.0-1028-aws and I found this:

libbpf: failed to iterate BTF objects: -1
libbpf: prog 't1_execve_x': relo #736: target candidate search failed for [1454] struct audit_task_info: -1
libbpf: prog 't1_execve_x': relo #736: failed to relocate: -1
libbpf: failed to perform CO-RE relocations: -1
libbpf: failed to load object 'bpf_probe'
libbpf: failed to load BPF skeleton 'bpf_probe': -1
libpman: failed to load BPF object (errno: 1 | message: Operation not permitted)
2023-06-16T15:17:42+0000: An error occurred in an event source, forcing termination

The only way I could make it work was adding CAP_SYS_ADMIN . Using ebpf with leastPrivileged I can reproduce the issue described in this issue (hence why we were switching to modern-bpf)

Not sure if you have any hint, but the capabilities added are:

 - CAP_BPF
 - CAP_SYS_RESOURCE
 - CAP_PERFMON
 - CAP_SYS_PTRACE
 - CAP_SYS_ADMIN --> I want to get rid of this one.

And for what is worth I also found that in GKE clusters: kernel.perf_event_paranoid=2 vs in AWS it's kernel.perf_event_paranoid=4 though I tried changing it to 2 and didn't make a difference.

Andreagit97 commented 1 year ago

Ei thank you for reporting this while updating the helm chart with the patch I faced the same issue with Falco 0.35.0 on Ubuntu! I understood where the issue is but I still need to find a way to solve it! In the next weeks, we will release a Falco patch release (0.35.1) hopefully these issues will be solved, I will keep you updated!

fcrespofastly commented 1 year ago

Thanks @Andreagit97 ! Please let me know if I can test things on my side too in the meantime !

Andreagit97 commented 1 year ago

Yes just one question, you said "Btw, modern-bpf worked on GKE (COS) but we also have some ...". Do you use Falco 0.35.0 on GKE? And if yes, does it correctly run with this set of capabilities? {CAP_BPF, CAP_SYS_RESOURCE, CAP_PERFMON, CAP_SYS_PTRACE}

fcrespofastly commented 1 year ago

Yes just one question, you said "Btw, modern-bpf worked on GKE (COS) but we also have some ...". Do you use Falco 0.35.0 on GKE? And if yes, does it correctly run with this set of capabilities? {CAP_BPF, CAP_SYS_RESOURCE, CAP_PERFMON, CAP_SYS_PTRACE}

Yes, Falco 0.35.0 with modern bpf plus the capabilities you mentioned

Andreagit97 commented 1 year ago

thanks!

fcrespofastly commented 1 year ago

@Andreagit97 more inputs. We also have a GKE clusters that has a mix of COS and Ubuntu servers as workers (mainly because our CI system runs there) and while deploying Falco there the instances running on COS worked perfectly fine while the ones running on Ubuntu failed with the error I mentioned above.

Btw I'm not sure if all of this apply to this thread/github issue anymore, feel free to forward me to somewhere else.

Have a great day!

Andreagit97 commented 1 year ago

Thank you for the update I'm working on that!

fcrespofastly commented 1 year ago

@Andreagit97 I see falco 0.35.1 is out there! should I give this a go?

Andreagit97 commented 1 year ago

Uhm yes, Falco 0.35.1 should be out in hours, if you use the helm chart to deploy Falco we still need to update them (probably tomorrow) so unfortunately you have to wait while if you use other deployment strategies you are good to go!

BTW i will update this issue when the helm chart is ready :)

jasondellaluce commented 1 year ago

Both v0.35.1 and the new charts should now be available!

Andreagit97 commented 1 year ago

@fcrespofastly Falco 0.35.1 is out! Please note the new field driver.modern_bpf.leastPrivileged=true that allows you to enable the leastPrivileged mode

fcrespofastly commented 1 year ago

Great thanks!!

fcrespofastly commented 1 year ago

@Andreagit97 I just tested it and it works like a charm! Thank you for the hard and good work!!

Andreagit97 commented 1 year ago

hey folks can we close this one :)?

jemag commented 1 year ago

tested with the latest chart version and driver.modern_bpf.leastPrivileged=true and works great so far on the AzureLinux AKS cluster. Will likely test on the Ubuntu cluster as well very soon.

tspearconquest commented 1 year ago

Thanks @jemag! Would you mind confirming the Azure Linux working over on https://github.com/falcosecurity/falco/issues/2673 as well - that should allow the team to close out that issue. :)

jemag commented 1 year ago

Works great on the Ubuntu AKS cluster as well. Replied to the other issue about AzureLinux @tspearconquest

Feel free to close this one

Andreagit97 commented 1 year ago

thank you for the feedback!

falcosecurity / falco

Cannot run falco "Least Privileged", pmu_fd: operation not permitted #2487