falcosecurity / plugins

Falco plugins registry
Apache License 2.0

k8saudit-eks - memory leak #525

Closed maxemontio closed 3 weeks ago

maxemontio commented 1 month ago

I have Falco with the k8saudit-eks plugin deployed via the Helm chart. Soon after the pod starts, it gets killed by the OOM killer. All I need is to collect events from the cluster for later parsing, so there are no complicated rules; I just print almost all available fields.

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x48 pc=0x7f1de81376b5]

Environment

Chart version: 4.7.0
App version: 0.38.1
Plugins: k8saudit-eks, json

Helm values

fullnameOverride: "falco"

falco:
  rules_file:
    - /etc/falco/rules.d

  plugins:
    - name: k8saudit-eks
      library_path: libk8saudit-eks.so
      init_config:
        region: "us-east-1"
        profile: "default"
        shift: 10
        polling_interval: 10
        use_async: false
        buffer_size: 500
      open_params: "<redacted>"
    - name: json
      library_path: libjson.so
      init_config: ""

  load_plugins: [k8saudit-eks, json]

  json_output: true
  json_include_output_property: true
  json_include_tags_property: true

  stdout_output:
    enabled: false
  http_output:
    enabled: true
    url: "http://<redacted>:8080"
    insecure: true

mounts:
  volumes:
    - name: rules
      configMap:
        name: rules
        items:
          - key: k8s_audit_rules.yaml
            path: k8s_audit_rules.yaml
  volumeMounts:
    - name: rules
      mountPath: /etc/falco/rules.d

falcoctl:
  indexes:
  - name: falcosecurity
    url: https://falcosecurity.github.io/falcoctl/index.yaml
  artifact:
    install:
      enabled: true
    follow:
      enabled: true
  config:
    artifact:
      allowedTypes:
        - plugin
        - rulesfile
      install:
        resolveDeps: false
        refs: [k8saudit-rules:0, k8saudit-eks:0, json:0]
      follow:
        refs: [k8saudit-rules:0]

resources:
  requests:
    cpu: null
    memory: 64Mi
  limits:
    cpu: null
    memory: 1024Mi

controller:
  kind: deployment
  deployment:
    replicas: 1

collectors:
  enabled: false

driver:
  enabled: false

serviceAccount:
  create: true
  annotations:
    "eks.amazonaws.com/role-arn": <redacted>

Rules:

- required_engine_version: 15
- required_plugin_versions:
  - name: k8saudit-eks
    version: 0.5.1

- rule: Dummy rule
  desc: >
    Dummy rule
  condition: >
    ka.verb exists
  output: >
    auditid=%ka.auditid stage=%ka.stage decision=%ka.auth.decision reason=%ka.auth.reason decision=%ka.auth.openshift.decision username=%ka.auth.openshift.username name=%ka.user.name groups=%ka.user.groups name=%ka.impuser.name verb=%ka.verb uri=%ka.uri name=%ka.target.name namespace=%ka.target.namespace resource=%ka.target.resource subresource=%ka.target.subresource name=%ka.target.pod.name subjects=%ka.req.binding.subjects role=%ka.req.binding.role name=%ka.req.configmap.name obj=%ka.req.configmap.obj image=%ka.req.pod.containers.image image=%ka.req.container.image repository=%ka.req.pod.containers.image.repository repository=%ka.req.container.image.repository host_ipc=%ka.req.pod.host_ipc host_network=%ka.req.pod.host_network host_network=%ka.req.container.host_network host_pid=%ka.req.pod.host_pid host_port=%ka.req.pod.containers.host_port privileged=%ka.req.pod.containers.privileged privileged=%ka.req.container.privileged allow_privilege_escalation=%ka.req.pod.containers.allow_privilege_escalation read_only_fs=%ka.req.pod.containers.read_only_fs run_as_user=%ka.req.pod.run_as_user run_as_user=%ka.req.pod.containers.run_as_user eff_run_as_user=%ka.req.pod.containers.eff_run_as_user run_as_group=%ka.req.pod.run_as_group run_as_group=%ka.req.pod.containers.run_as_group eff_run_as_group=%ka.req.pod.containers.eff_run_as_group proc_mount=%ka.req.pod.containers.proc_mount rules=%ka.req.role.rules apiGroups=%ka.req.role.rules.apiGroups nonResourceURLs=%ka.req.role.rules.nonResourceURLs verbs=%ka.req.role.rules.verbs resources=%ka.req.role.rules.resources fs_group=%ka.req.pod.fs_group supplemental_groups=%ka.req.pod.supplemental_groups add_capabilities=%ka.req.pod.containers.add_capabilities type=%ka.req.service.type ports=%ka.req.service.ports hostpath=%ka.req.pod.volumes.hostpath flexvolume_driver=%ka.req.pod.volumes.flexvolume_driver volume_type=%ka.req.pod.volumes.volume_type name=%ka.resp.name code=%ka.response.code reason=%ka.response.reason 
useragent=%ka.useragent sourceips=%ka.sourceips name=%ka.cluster.name
  priority: WARNING
  source: k8s_audit
  tags: [k8s]
Issif commented 3 weeks ago

Your rule is pretty invasive: it basically creates an alert for every k8s audit event. If you disable it, do you see the same memory profile? As it stands, it's hard to tell which part is the cause; it could be the plugin or Falco's http output.
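One way to compare memory profiles without disabling the rule entirely is to narrow its condition so it fires far less often. A hypothetical narrowed version (the condition and trimmed output are just illustrative; adjust to whatever subset you care about):

```yaml
# Hypothetical narrowed rule, for profiling only:
# fires on write verbs instead of on every audit event.
- rule: Dummy rule (narrowed)
  desc: >
    Subset of audit events, for memory profiling
  condition: >
    ka.verb in (create, update, delete)
  output: >
    auditid=%ka.auditid stage=%ka.stage verb=%ka.verb uri=%ka.uri
  priority: WARNING
  source: k8s_audit
  tags: [k8s]
```

If memory stays flat with this version, the problem scales with alert volume rather than with the plugin's event ingestion.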

maxemontio commented 3 weeks ago

@Issif No, without the rule everything is fine. My goal is just to output all possible events. Maybe using Falco only for that is complete overkill?

Issif commented 3 weeks ago

Falco itself is a rule engine; the capture of syscalls and their enrichment are done by the libs, and in your situation the capture of the k8s audit logs from EKS is done by a plugin.

I don't see the added value of Falco if you want to collect all the k8s audit logs from CloudWatch Logs: the rule engine becomes useless, and the output rate becomes an issue (the tool is designed to trigger security alerts, which are not supposed to fire dozens of times per second unless your infra is highly compromised).

You can find the logic our plugin uses to pull the logs from CloudWatch Logs here. I think it's pretty easy to take inspiration from it and develop a little app in charge of getting these logs, formatting them in the Falco payload format, and pushing them to your http endpoint.

maxemontio commented 3 weeks ago

Thank you for your help, @Issif!