DataDog / helm-charts

Helm charts for Datadog products
Apache License 2.0

Datadog Cluster Agent tries to chmod /etc/datadog-agent/ on OpenShift since PR #1096 - not enough security context #1162

Open erikhjensen opened 1 year ago

erikhjensen commented 1 year ago

Describe what happened:

We are on Azure Red Hat OpenShift (Kubernetes).

Since https://github.com/DataDog/helm-charts/pull/1096, it seems that the initContainer tries to change the permissions of /etc/datadog-agent/ and everything under it, as per this config:

`initContainers:

We then changed the values to let the chart deploy its own SCC, hoping that this would resolve the cluster-agent's permissions, but it doesn't appear to be enough. The user that runs the initContainer can't perform that chmod, and subsequently the agent status reports the errors shown below.

Init Volume Logs

chmod: changing permissions of '/etc/datadog-agent': Operation not permitted
chmod: changing permissions of '/etc/datadog-agent/compliance.d': Operation not permitted
chmod: changing permissions of '/etc/datadog-agent/compliance.d/audit-rule-common.rego': Operation not permitted
... lines omitted ...
chmod: changing permissions of '/etc/datadog-agent/conf.d': Operation not permitted
chmod: changing permissions of '/etc/datadog-agent/conf.d/kubernetes_apiserver.d': Operation not permitted
chmod: changing permissions of '/etc/datadog-agent/conf.d/kubernetes_apiserver.d/conf.yaml.default': Operation not permitted
chmod: changing permissions of '/etc/datadog-agent/conf.d/orchestrator.d': Operation not permitted
chmod: changing permissions of '/etc/datadog-agent/conf.d/orchestrator.d/conf.yaml.default': Operation not permitted
chmod: changing permissions of '/etc/datadog-agent/datadog-cluster.yaml': Operation not permitted
chmod: changing permissions of '/etc/datadog-agent/install_info': Operation not permitted
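A note for anyone hitting the same log lines: on Linux, chmod only succeeds when the calling process owns the file or holds CAP_FOWNER (normally root). If the initContainer runs under a UID that does not own /etc/datadog-agent, which is common on OpenShift where SCCs can assign an arbitrary UID, every chmod in the tree fails with "Operation not permitted". The following is a minimal sketch for checking this from inside a container, not anything taken from the chart. The function name `chmod_blockers` and its `euid` parameter are hypothetical, introduced only for illustration.

```python
import os


def chmod_blockers(root, euid=None):
    """List paths under `root` that the given effective UID could not chmod.

    Hypothetical helper: it applies only the ownership rule (owner or root)
    and ignores capabilities that may have been dropped or added.
    """
    euid = os.geteuid() if euid is None else euid
    if euid == 0:
        # Root normally holds CAP_FOWNER, so ownership does not matter.
        return []
    blocked = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Check the directory itself, then each file directly inside it.
        candidates = [dirpath] + [os.path.join(dirpath, f) for f in filenames]
        for path in candidates:
            if os.lstat(path).st_uid != euid:
                blocked.append(path)
    return sorted(blocked)


if __name__ == "__main__":
    for path in chmod_blockers("/etc/datadog-agent"):
        print("cannot chmod:", path)
```

Running it in the init container (or any container sharing the same image and UID) should print the same set of paths that appear in the init-volume log if ownership is indeed the cause.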

Cluster Agent Status

Config Errors

kubernetes_apiserver
open /etc/datadog-agent/conf.d/kubernetes_apiserver.d/conf.yaml.default: permission denied

orchestrator
open /etc/datadog-agent/conf.d/orchestrator.d/conf.yaml.default: permission denied

So what we see is that the cluster agent pod does run, but in a faulted state. The agent pods from the DaemonSet can connect to it, but it doesn't seem to be able to perform its function of connecting to the cluster and exploring the orchestrator.

In the Infrastructure Kubernetes module of the UI, no cluster and no nodes are shown, although pods do show, so only partial resource discovery appears to be happening.

Describe what you expected:

The initContainer has a sufficient securityContext to perform its required initialization, and the cluster agent is then able to use that configuration.

Steps to reproduce the issue:

Updated from helm chart 3.10.1

Additional environment details (Operating System, Cloud provider, etc):

Azure Red Hat OpenShift. SCCs are deployed for both the agent and the cluster agent, and both SCCs do show the service account in their users list.

erikhjensen commented 1 year ago

Upon upgrading to helm chart release 3.35.0, the cluster agent works again and does connect to the orchestrator. The "operation not permitted" errors remain in the initContainer/init-volume logs, which makes me wonder whether we need those chmod and cp commands anymore. In any case, let's keep this open, as this does seem like an unintended side effect. I saw that @clamoriniere had put in a PR to the agent team to have the container ship with friendlier permissions built in, so now that 7.47 is released, you may want to revisit the initContainer's bash commands and assess whether they are still required.