DataDog / datadog-agent

Main repository for Datadog Agent
https://docs.datadoghq.com/
Apache License 2.0

Admission controller violates Kubernetes baseline PodSecurityStandard #1480 #28274

Closed matt-matt-tmatt closed 3 months ago

matt-matt-tmatt commented 3 months ago

Describe what happened:

When the Datadog admission controller is enabled, pods are created with a hostPath volume, which violates the Kubernetes baseline Pod Security Standard.

Everything goes back to normal when the admission controller is disabled.

❯ kubectl get events --sort-by='.lastTimestamp'
...
40m         Warning   FailedCreate        replicaset/redacted      Error creating: pods "redacted" is forbidden: violates PodSecurity "baseline:latest": hostPath volumes (volume "datadog")
...

Additional environment details (Operating System, Cloud provider, etc):

Server Version: v1.28.11-eks-db838b0

❯ k get po datadog-95n7g -o jsonpath={..image}
eu.gcr.io/datadoghq/agent:7.55.2 eu.gcr.io/datadoghq/agent:7.55.2 eu.gcr.io/datadoghq/agent:7.55.2 eu.gcr.io/datadoghq/agent:7.55.2 eu.gcr.io/datadoghq/agent:7.55.2 eu.gcr.io/datadoghq/agent:7.55.2 eu.gcr.io/datadoghq/agent:7.55.2 eu.gcr.io/datadoghq/agent:7.55.2 eu.gcr.io/datadoghq/agent:7.55.2 eu.gcr.io/datadoghq/agent:7.55.2 eu.gcr.io/datadoghq/agent:7.55.2 eu.gcr.io/datadoghq/agent:7.55.2 eu.gcr.io/datadoghq/agent:7.55.2 eu.gcr.io/datadoghq/agent:7.55.2 eu.gcr.io/datadoghq/agent:7.55.2 eu.gcr.io/datadoghq/agent:7.55.2%

❯ helm list
NAME    NAMESPACE   REVISION    UPDATED                                 STATUS      CHART           APP VERSION
datadog datadog     10          2024-08-06 16:30:53.922109 +0300 EEST   deployed    datadog-3.69.3  7

tbavelier commented 3 months ago

Hello @matt-matt-tmatt, this is documented in https://docs.datadoghq.com/containers/troubleshooting/admission-controller/?tab=helm#application-pods-are-not-created, not specifically for the baseline Pod Security Standard, but for restricted environments such as OpenShift or GKE Autopilot. Socket mode, which is the default (https://github.com/DataDog/helm-charts/blob/b949cce9a3fac04c1af9ca3c1a9dd7bcdfae8a22/charts/datadog/values.yaml#L1105-L1111) because of the default APM settings, uses a hostPath volume for the socket. You can switch the mode to service or hostip to stop your application pods from being mutated with this socket volume, which keeps them compliant with the baseline Pod Security Standard, but you will lose the advantages of using a socket for transport.

matt-matt-tmatt commented 3 months ago

@tbavelier thanks. What would be the benefits of socket vs hostip?

I have it working now. I had to explicitly disable APM socket mode, even though I had TCP port enabled:

datadog:
  apm:
    portEnabled: true
    socketEnabled: false

tbavelier commented 3 months ago

You're welcome!

You can refer to https://docs.datadoghq.com/developers/dogstatsd/unix_socket/?tab=host (which is written for DogStatsD), but the same benefits apply to APM:

  • Bypassing the networking stack brings a significant performance improvement for high traffic.
  • DogStatsD can detect the container from which metrics originated and tag those metrics accordingly.

Unfortunately, the socket transport requires a volume shared on each node between the Agent pod and the application pods, hence the hostPath volume. Feel free to reach out on https://www.datadoghq.com/support/ for more information on that.
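
To make the transport difference concrete, here is a minimal sketch of sending one DogStatsD counter over a Unix domain socket versus over UDP. The helper names (`format_counter`, `send_over_uds`, `send_over_udp`) are hypothetical and not part of any Datadog library, and the socket path and port in the usage note are assumptions that depend on how the Agent is configured.

```python
import socket
from typing import List, Optional


def format_counter(name: str, value: int, tags: Optional[List[str]] = None) -> bytes:
    """Build a DogStatsD counter datagram such as b"page.views:1|c|#env:dev"."""
    payload = f"{name}:{value}|c"
    if tags:
        payload += "|#" + ",".join(tags)
    return payload.encode("utf-8")


def send_over_uds(payload: bytes, socket_path: str) -> None:
    """Socket transport: write to a Unix datagram socket shared with the Agent.

    The socket file must be visible inside the application pod, which is why
    a volume shared with the Agent pod is needed on each node.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, socket_path)


def send_over_udp(payload: bytes, host: str, port: int = 8125) -> None:
    """hostip/service transport: send through the network stack over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

An application would call, for example, `send_over_uds(format_counter("page.views", 1), "/var/run/datadog/dsd.socket")` in socket mode, or `send_over_udp(format_counter("page.views", 1), host_ip)` in hostip mode, where the socket path and `host_ip` are assumed values that must match your own deployment.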

> I have it working now. I had to explicitly disable APM socket mode, even though I had TCP port enabled:

This is explained in the Helm chart I linked in my previous reply:

If clusterAgent.admissionController.configMode is not set:
* and datadog.apm.socketEnabled is true, the Admission Controller uses socket.
* and datadog.apm.portEnabled is true, the Admission Controller uses hostip.
* Otherwise, the Admission Controller defaults to hostip.

Note: "service" mode relies on the internal traffic service to target the agent running on the local node (requires Kubernetes v1.22+).

ref: https://docs.datadoghq.com/agent/cluster_agent/admission_controller/#configure-apm-and-dogstatsd-communication-mode

configMode: # "hostip", "socket" or "service"
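
The rules quoted above can be sketched as a small function. This is an illustration only, not the chart's actual template code: the function name `resolve_config_mode` is hypothetical, and the precedence (socket checked before port) is inferred from the order of the bullets and from the behaviour reported in this thread, where portEnabled: true alone still resulted in socket mode.

```python
from typing import Optional

VALID_MODES = ("hostip", "socket", "service")


def resolve_config_mode(
    config_mode: Optional[str],
    socket_enabled: bool,
    port_enabled: bool,
) -> str:
    """Return the mode the Admission Controller would use under the quoted rules."""
    # An explicit clusterAgent.admissionController.configMode always wins.
    if config_mode:
        if config_mode not in VALID_MODES:
            raise ValueError(f"unknown configMode: {config_mode!r}")
        return config_mode
    # Assumed precedence: datadog.apm.socketEnabled is checked first.
    if socket_enabled:
        return "socket"
    # datadog.apm.portEnabled selects hostip.
    if port_enabled:
        return "hostip"
    # Neither flag is set: the documented default is hostip.
    return "hostip"
```

Under these assumptions, `resolve_config_mode(None, True, True)` yields "socket", which is why socketEnabled had to be set to false explicitly, while `resolve_config_mode(None, False, True)` yields "hostip".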

Closing this issue as the behaviour is expected.