DataDog / helm-charts

Helm charts for Datadog products
Apache License 2.0

apparmor denied ptrace for both agent and process-agent #389

Open damonmaria opened 2 years ago

damonmaria commented 2 years ago

Describe what happened: I am getting a lot of the following logs:

audit: type=1400 audit(1631836990.273:48016): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=3026069 comm="process-agent" requested_mask="read" denied_mask="read" peer="unconfined"
audit: type=1400 audit(1631836899.434:47974): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=3025917 comm="agent" requested_mask="read" denied_mask="read" peer="unconfined"

This appears to be similar to https://github.com/DataDog/datadog-agent/issues/6915 which has this suggested workaround: https://github.com/DataDog/datadog-agent/issues/6915#issuecomment-748077547

I tried setting agents.podSecurity.apparmor.enabled: false but that did not change anything.

Describe what you expected: I expected that either the defaults for the helm chart would not trigger an AppArmor denial, or that I could configure the unconfined AppArmor profile for the agent and process-agent. But looking through the charts, neither seems to be possible. The system-probe, on the other hand, does have an option for this.

Also, this is the most recent changelog entry referring to apparmor, but it says unconfined is the default for all agents: https://github.com/DataDog/helm-charts/blob/main/charts/datadog/CHANGELOG.md#2611

Steps to reproduce the issue: My values.yaml:

datadog:
  apiKey: ****
  appKey: ****
  clusterName: ****
  tags:
  - customer:****
  - site:****
  - env:commissioning
  - proj:****
  - dd_agent_platform:container
  kubelet:
    tlsVerify: false
  logs:
    enabled: true
    containerCollectAll: true
    containerCollectUsingFiles: true
  kubeStateMetricsCore:
    enabled: true
  dogstatsd:
    originDetection: true
    tagCardinality: orchestrator
    hostSocketPath: "/run/datadog"
    useHostPID: true
  processAgent:
    processCollection: true
  systemProbe:
    bpfDebug: true
    enableOOMKill: true
    enableTCPQueueLength: true
agents:
  volumes:
  - hostPath:
      path: "/etc/machine-id"
    name: machine-id
  volumeMounts:
  - name: machine-id
    mountPath: "/etc/machine-id"
    readOnly: true
clusterAgent:
  token: ******

Additional environment details (Operating System, Cloud provider, etc): OS: Ubuntu 20.04, bare metal. Datadog chart version: 2.22.2

L3n41c commented 2 years ago

Hello @damonmaria, thanks for reaching out!

As you already noticed, this AppArmor message has already been investigated. The setting agents.podSecurity.apparmor.enabled controls whether AppArmor directives are set in the PSP or not. The goal is to support both distributions that ship AppArmor and those that don’t. Setting it to false will not “disable” AppArmor; it only makes the manifests omit all AppArmor-specific statements. On an AppArmor-enabled distribution, this means the default setting is kept, which is the docker-default AppArmor profile.

As you noticed, whereas there is a way to customise the AppArmor profile used by the system-probe container, there is no way to customise it for the agent or the process-agent. This is on purpose: only system-probe needs extra privileges to work properly.

Indeed, last time I investigated this issue, I couldn’t find any evidence that some features of the agent or process-agent were broken by this AppArmor denial.

For those who are really worried about this log, https://github.com/DataDog/datadog-agent/issues/6915#issuecomment-748077547 contains a custom AppArmor profile that can be used for the agent and process-agent. Could you please try it and let us know if it solves the AppArmor log in your case as well? Note that, instead of setting this AppArmor profile in the DaemonSet as described in https://github.com/DataDog/datadog-agent/issues/6915#issuecomment-748077547, it may be wiser to edit the PSP and set this new AppArmor profile in the apparmor.security.beta.kubernetes.io/defaultProfileName annotation.
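
For illustration, here is a minimal sketch of that PSP-based approach, showing only the relevant annotations. The profile name datadog-agent is purely a placeholder, and referencing a node-local profile through the localhost/ prefix is an assumption based on the standard Kubernetes AppArmor annotation syntax; the chart does not load the profile on the nodes for you.

apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: datadog
  annotations:
    # "localhost/<name>" refers to a profile already loaded on each node
    apparmor.security.beta.kubernetes.io/allowedProfileNames: "runtime/default,localhost/datadog-agent"
    apparmor.security.beta.kubernetes.io/defaultProfileName: "localhost/datadog-agent"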

It is unfortunately not technically possible to have this custom AppArmor profile installed automatically by the datadog chart itself. If we try to deploy this custom AppArmor profile from an init container of the DaemonSet, the pod cannot be deployed, because Kubernetes checks the existence of the AppArmor profiles of all the containers (including agent and process-agent) before executing the first init container.

Being able to use the unconfined AppArmor profile for the agent and process-agent doesn’t really make sense because it is not needed. As said earlier, there’s no evidence that the agent is suffering from this denial. People who care about AppArmor logs are usually very cautious about security, and needlessly granting the unconfined AppArmor profile to the agent and process-agent violates the “least privilege” principle.

damonmaria commented 2 years ago

Thanks @L3n41c for the very detailed response.

I presume that trying this out would require changing this template line from "runtime/default" to "unconfined"?

That has a bad smell to it and I wouldn't want to go there. Happy to take your word for it that nothing is broken.

It is a bit annoying in that these 2 lines are numbers 4 and 5 in the top list of log patterns across all our hosts, and we haven't even converted our whole fleet to the DD K8s agent yet. So this is adding to our cost of using DD.

damonmaria commented 2 years ago

Just FYI @L3n41c: the log lines produced by this issue account for over half of our "log events" bill from Datadog. So while the denial does not break anything, it is causing us a lot of cost.

L3n41c commented 2 years ago

Hello @damonmaria,

Thanks for the feedback!

I agree that the volume of logs can be problematic even if the agent’s functionality is not affected. To ease the setup of the solution, I’ve added in #432 a new parameter, agents.podSecurity.defaultApparmor, which can be used to customise the default AppArmor profile applied to the agent and process-agent. As explained in my previous message, it’s unfortunately not technically possible to have the AppArmor profile set up by an init container.

The new parameter can be set to unconfined, which should fix the log but also weakens security, or to a custom AppArmor profile set up based on those instructions.
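
For example, a minimal values.yaml sketch using this new parameter. Only unconfined is explicitly mentioned above; the localhost/<profile-name> form for a custom profile is an assumption based on the usual Kubernetes AppArmor annotation values.

agents:
  podSecurity:
    # Default AppArmor profile applied to the agent and process-agent containers.
    # "unconfined" silences the denials but weakens security; a custom profile
    # already loaded on the nodes can be referenced as "localhost/<profile-name>".
    defaultApparmor: unconfined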

damonmaria commented 2 years ago

@L3n41c Unfortunately I am unable to get rid of these apparmor messages even though I'm pretty sure the PSP is set up correctly through the Datadog chart. Happy to open a new issue if you prefer.

Also, I am not sure whether the fact that I am not using Docker is the cause. We use k3s, which is containerd based.

The PSP is definitely there and correct:

# helm get manifest -n monitoring  datadog
---
# Source: datadog/templates/agent-psp.yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: datadog
  labels:
    helm.sh/chart: 'datadog-2.23.4'
    app.kubernetes.io/name: "datadog"
    app.kubernetes.io/instance: "datadog"
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/version: "7"
  annotations:
    apparmor.security.beta.kubernetes.io/allowedProfileNames: "runtime/default,unconfined"
    apparmor.security.beta.kubernetes.io/defaultProfileName: unconfined
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: "runtime/default,localhost/system-probe"
    seccomp.security.alpha.kubernetes.io/defaultProfileName: "runtime/default"
spec:
  privileged: false
  hostNetwork: false
  hostPID: true
  allowedCapabilities:
    - SYS_ADMIN
    - SYS_RESOURCE
    - SYS_PTRACE
    - NET_ADMIN
    - NET_BROADCAST
    - NET_RAW
    - IPC_LOCK
    - AUDIT_CONTROL
    - AUDIT_READ
  volumes:
    - configMap
    - downwardAPI
    - emptyDir
    - hostPath
    - secret
  fsGroup:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  seLinux:
    rule: MustRunAs
    seLinuxOptions:
      level: s0
      role: system_r
      type: spc_t
      user: system_u
  supplementalGroups:
    rule: RunAsAny
---
...
---
# Source: datadog/templates/daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: datadog
  namespace: monitoring
  labels:
    helm.sh/chart: 'datadog-2.23.4'
    app.kubernetes.io/name: "datadog"
    app.kubernetes.io/instance: "datadog"
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/version: "7"
    app.kubernetes.io/component: agent

spec:
  selector:
    matchLabels:
      app: datadog
  template:
    metadata:
      labels:
        app.kubernetes.io/name: "datadog"
        app.kubernetes.io/instance: "datadog"
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/component: agent
        app: datadog

      name: datadog
      annotations:
        checksum/clusteragent_token: 346d598c85a5a913fd3de3bdcf78ad82b1ad94e4b542fcd0fab0b0a29bc5c809
        checksum/api_key: 9646bdb35888f9083c6bcd3e21690f68d819a44e42ea48f3808a7369e8232c9e
        checksum/install_info: 3c95140ad694e8c4120a3f0430b5546adf7a16b694beedb46cde9937f51806d0
        checksum/autoconf-config: 74234e98afe7498fb5daf1f36ac2d78acc339464f950703b8c019892f982b90b
        checksum/confd-config: 44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a
        checksum/checksd-config: 44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a
        container.apparmor.security.beta.kubernetes.io/system-probe: unconfined
        container.seccomp.security.alpha.kubernetes.io/system-probe: localhost/system-probe
    spec:
...

I'm not sure how the PSP is supposed to be applied to the daemonset.

The log messages I get come in 2 patterns (full event attributes from DD):

{
journald {   
_BOOT_ID | d176d050d026487491cd862f8eb23b03
_HOSTNAME | ******
_MACHINE_ID | ******
_SOURCE_MONOTONIC_TIMESTAMP | 1414702036954
_TRANSPORT | kernel
PRIORITY | 5
SYSLOG_FACILITY | 0
SYSLOG_IDENTIFIER | kernel
}
}
{
journald { 
_AUDIT_FIELD_APPARMOR | "DENIED"
_AUDIT_FIELD_DENIED_MASK | "read"
_AUDIT_FIELD_OPERATION | "ptrace"
_AUDIT_FIELD_PEER | "unconfined"
_AUDIT_FIELD_PROFILE | "cri-containerd.apparmor.d"
_AUDIT_FIELD_REQUESTED_MASK | "read"
_AUDIT_ID | 4528208
_AUDIT_TYPE | 1400
_AUDIT_TYPE_NAME | AVC
_BOOT_ID | d176d050d026487491cd862f8eb23b03
_COMM | process-agent
_HOSTNAME | ******
_MACHINE_ID | ******
_PID | 3698950
_SOURCE_REALTIME_TIMESTAMP | 1636062691587000
_TRANSPORT | audit
SYSLOG_FACILITY | 4
SYSLOG_IDENTIFIER | audit
}
}
damonmaria commented 2 years ago

@L3n41c I have tried a few other options but cannot get rid of these log messages, which are costing us a couple hundred US$ each month in DD fees.

Is it possibly because we use k3s/containerd and not docker? (as can be seen in _AUDIT_FIELD_PROFILE above)

You can see below that this option is getting all the way through into the k8s manifest. I am happy to put the time in to test any ideas you have that could fix this.

# kubectl describe -n monitoring PodSecurityPolicy/datadog
Warning: policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
Name:         datadog
Namespace:
Labels:       app.kubernetes.io/instance=datadog
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=datadog
              app.kubernetes.io/version=7
              helm.sh/chart=datadog-2.28.11
Annotations:  apparmor.security.beta.kubernetes.io/allowedProfileNames: runtime/default,unconfined
              apparmor.security.beta.kubernetes.io/defaultProfileName: unconfined
              meta.helm.sh/release-name: datadog
              meta.helm.sh/release-namespace: monitoring
              seccomp.security.alpha.kubernetes.io/allowedProfileNames: runtime/default,localhost/system-probe
              seccomp.security.alpha.kubernetes.io/defaultProfileName: runtime/default
API Version:  policy/v1beta1
Kind:         PodSecurityPolicy
Metadata:
  Creation Timestamp:  2021-11-04T06:15:35Z
  Managed Fields:
    API Version:  policy/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:apparmor.security.beta.kubernetes.io/allowedProfileNames:
          f:apparmor.security.beta.kubernetes.io/defaultProfileName:
          f:meta.helm.sh/release-name:
          f:meta.helm.sh/release-namespace:
          f:seccomp.security.alpha.kubernetes.io/allowedProfileNames:
          f:seccomp.security.alpha.kubernetes.io/defaultProfileName:
        f:labels:
          .:
          f:app.kubernetes.io/instance:
          f:app.kubernetes.io/managed-by:
          f:app.kubernetes.io/name:
          f:app.kubernetes.io/version:
          f:helm.sh/chart:
      f:spec:
        f:allowPrivilegeEscalation:
        f:allowedCapabilities:
        f:fsGroup:
          f:rule:
        f:hostPID:
        f:runAsUser:
          f:rule:
        f:seLinux:
          f:rule:
          f:seLinuxOptions:
            .:
            f:level:
            f:role:
            f:type:
            f:user:
        f:supplementalGroups:
          f:rule:
        f:volumes:
    Manager:         helm
    Operation:       Update
    Time:            2021-11-04T06:15:35Z
  Resource Version:  28657122
  UID:               7fd2e661-65b3-4499-a6d1-fde5465d1f72
Spec:
  Allow Privilege Escalation:  true
  Allowed Capabilities:
    SYS_ADMIN
    SYS_RESOURCE
    SYS_PTRACE
    NET_ADMIN
    NET_BROADCAST
    NET_RAW
    IPC_LOCK
    CHOWN
    AUDIT_CONTROL
    AUDIT_READ
  Fs Group:
    Rule:    RunAsAny
  Host PID:  true
  Run As User:
    Rule:  RunAsAny
  Se Linux:
    Rule:  MustRunAs
    Se Linux Options:
      Level:  s0
      Role:   system_r
      Type:   spc_t
      User:   system_u
  Supplemental Groups:
    Rule:  RunAsAny
  Volumes:
    configMap
    downwardAPI
    emptyDir
    hostPath
    secret
Events:  <none>
L3n41c commented 2 years ago

Hello @damonmaria,

Thanks for your feedback. As this issue is causing billing trouble on your side, I think it is worth reporting as an official Datadog support case. Could you please attach a flare to the support case so that we can get some clues about the specificities of your environment? Detailed instructions for a reproducer would also be very valuable for us. You mentioned k3s, but the underlying OS on top of which k3s is installed may also be crucial for reproducing your issue.

damonmaria commented 2 years ago

@L3n41c Case ID: 627224

damonmaria commented 2 years ago

@L3n41c As part of the support process with DD I have found a patch that makes this work. I think this shows that the apparmor.security.beta.kubernetes.io/defaultProfileName annotation on the PSP is not applying an AppArmor profile to the containers.

Adding the annotation container.apparmor.security.beta.kubernetes.io/process-agent: unconfined to the agent daemonset/pods stops the apparmor warnings. Here's what worked for me (until, of course, the next Helm release overwrites it):

# cat patch-unconfined.yaml
spec:
  template:
    metadata:
      annotations:
        container.apparmor.security.beta.kubernetes.io/process-agent: unconfined
# kubectl patch daemonsets.apps -n monitoring datadog --patch-file patch-unconfined.yaml
daemonset.apps/datadog patched
# kubectl rollout restart -n monitoring daemonset datadog
daemonset.apps/datadog restarted
damonmaria commented 2 years ago

OK, thanks to Datadog support I have a viable workaround to this now. And it was pretty obvious in the end. Setting the following in the values.yaml allows you to apply a different apparmor profile to the process-agent container:

agents:
  podAnnotations:
    container.apparmor.security.beta.kubernetes.io/process-agent: 'unconfined'

As per the original post for this issue, the agent container was also producing these warnings, but that doesn't seem to be happening anymore.

NicklasWallgren commented 2 years ago

We have encountered this issue as well. The apparmor.d profile provided here https://github.com/DataDog/datadog-agent/issues/6915#issuecomment-748077547 didn't solve our issue.

We had to apply the following annotations:

# agents.podAnnotations -- Annotations to add to the DaemonSet's Pods
  podAnnotations:
    container.apparmor.security.beta.kubernetes.io/agent: "unconfined"
    container.apparmor.security.beta.kubernetes.io/process-agent: "unconfined"
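
For reference, the same workaround expressed as a complete values.yaml fragment, combining the annotations above with the agents.podAnnotations path shown in the earlier comment (a sketch; keep only the containers that actually emit the denials in your environment):

agents:
  # Annotations added to the DaemonSet's Pods. "unconfined" disables AppArmor
  # confinement for the named containers only.
  podAnnotations:
    container.apparmor.security.beta.kubernetes.io/agent: "unconfined"
    container.apparmor.security.beta.kubernetes.io/process-agent: "unconfined"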