cluster-agent insufficient permissions

parsley42 commented 4 months ago

Describe what happened: I deployed helm-charts 3.59.1 (most recent) with orchestratorExplorer.enabled: false, as I have been doing for more than a year. Now the cluster-agent pod won't start:

2024-03-21 14:51:03 UTC | CLUSTER | INFO | (pkg/api/healthprobe/healthprobe.go:75 in healthHandler) | Healthcheck failed on: [workloadmeta-kubeapiserver]
2024-03-21 14:51:06 UTC | CLUSTER | INFO | (pkg/api/healthprobe/healthprobe.go:75 in healthHandler) | Healthcheck failed on: [workloadmeta-kubeapiserver]
2024-03-21 14:51:18 UTC | CLUSTER | INFO | (pkg/api/healthprobe/healthprobe.go:75 in healthHandler) | Healthcheck failed on: [workloadmeta-kubeapiserver]
2024-03-21 14:51:20 UTC | CLUSTER | WARN | (client-go@v0.28.6/tools/cache/reflector.go:535 in list) | workloadmeta-kubeapiserver: failed to list *v1.Deployment: deployments.apps is forbidden: User "system:serviceaccount:datadog:datadog-cluster-agent" cannot list resource "deployments" in API group "apps" at the cluster scope
2024-03-21 14:51:20 UTC | CLUSTER | ERROR | (apimachinery@v0.28.6/pkg/util/runtime/runtime.go:109 in HandleError) | workloadmeta-kubeapiserver: Failed to watch *v1.Deployment: failed to list *v1.Deployment: deployments.apps is forbidden: User "system:serviceaccount:datadog:datadog-cluster-agent" cannot list resource "deployments" in API group "apps" at the cluster scope

That just keeps repeating.
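
For reference, the permission the log reports as missing corresponds to an RBAC rule along the following lines. This is only a sketch of a manual workaround, not what the chart itself renders: the ClusterRole and ClusterRoleBinding names are hypothetical, while the service account name and namespace are taken from the error message above.

```yaml
# Hypothetical supplementary RBAC; the "-deployments" names are made up for illustration.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: datadog-cluster-agent-deployments
rules:
  # The reflector in the log fails on "list"; it also needs "watch" afterwards.
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: datadog-cluster-agent-deployments
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: datadog-cluster-agent-deployments
subjects:
  # Service account and namespace as they appear in the "forbidden" error.
  - kind: ServiceAccount
    name: datadog-cluster-agent
    namespace: datadog
```

Applying something like this outside the chart would have to be removed again once the chart grants the permission itself.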

NOTE: I was able to FIX the issue (with a nasty hack) by modifying _helpers.tpl so that "should-enable-k8s-resource-monitoring" returns "true" in both branches, which forces the associated permissions to be granted:

{{- define "should-enable-k8s-resource-monitoring" -}}
{{- if and .Values.datadog.orchestratorExplorer.enabled (or .Values.clusterAgent.enabled (eq (include "existingClusterAgent-configured" .) "true")) -}}
true
{{- else -}}
true
{{- end -}}
{{- end -}}

Probably not the best/correct fix, but it worked.

Describe what you expected: cluster-agent should start as before

Steps to reproduce the issue: Deploy helm chart 3.59.1 with orchestratorExplorer disabled; for reference, here are my values overrides under the "datadog" key:

datadog:
  apiKeyExistingSecret: datadog-api-secret
  apm:
    portEnabled: true
  containerExcludeMetrics: "name:.*"
  containerExcludeLogs: "name:.*"
  kubeStateMetricsCore:
    enabled: false
  tags:
  - "env:kubernetes"
  nodeLabelsAsTags:
    node.kubernetes.io/instance-type: aws-instance-type
    workload: workload
    topology.kubernetes.io/region: aws-region
    topology.kubernetes.io/zone: aws-zone
  logs:
    enabled: true
    containerCollectAll: true
  processAgent:
    enabled: false
  orchestratorExplorer:
    enabled: false
  ## Required for Bottlerocket; https://docs.datadoghq.com/agent/kubernetes/distributions?tab=helm#EKS
  criSocketPath: /run/dockershim.sock
  env:
  ## NOTE: Duplicated above for clusterAgent
  - name: DD_AUTOCONFIG_INCLUDE_FEATURES
    value: "containerd"

Additional environment details (Operating System, Cloud provider, etc): AWS EKS

celenechang commented 4 months ago

@parsley42 Thank you for reporting, and apologies for the issue. We will share an update here when a fixed chart is released.

celenechang commented 4 months ago

@parsley42 Chart 3.59.2 is now available, which resolves the issue you experienced. Thanks again for reporting it.
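
For anyone who consumes the chart as a dependency, one way to make sure the fixed release is picked up is to pin the version explicitly. The fragment below is a minimal sketch of a wrapper chart's Chart.yaml: the wrapper name and its own version are hypothetical, and the repository URL is assumed to be the standard Datadog Helm repository.

```yaml
# Hypothetical wrapper chart; only the dependency version comes from this thread.
apiVersion: v2
name: my-datadog-wrapper   # made-up name for illustration
version: 0.1.0             # made-up wrapper version
dependencies:
  - name: datadog
    version: 3.59.2        # the release reported above as containing the fix
    repository: https://helm.datadoghq.com   # assumed repository URL
```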

parsley42 commented 4 months ago

Wow, this does in fact fix it, thanks!

DataDog / helm-charts

cluster-agent insufficient permissions #1352