DataDog / datadog-agent

Main repository for Datadog Agent
https://docs.datadoghq.com/
Apache License 2.0

[BUG] AKS MutatingWebhookConfiguration reconciliation error #20503

Open Sh4kE opened 1 year ago

Sh4kE commented 1 year ago

Agent Environment

AKS (v1.27.3), Datadog Agent installed via the Helm chart from https://helm.datadoghq.com (chart version 3.42.1), which installs cluster-agent image 7.48.1.

Describe what happened:

I installed the chart with the following config:

  providers:
    aks:
      enabled: true

And I'm still seeing the error message:

    Couldn't reconcile Webhook datadog-webhook: Operation cannot be fulfilled on mutatingwebhookconfigurations.admissionregistration.k8s.io "datadog-webhook": the object has been modified; please apply your changes to the latest version and try again

This should have been fixed by #10413, but it does not work here.
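For context, that error is Kubernetes' standard optimistic-concurrency conflict: something else is updating the MutatingWebhookConfiguration between the Cluster Agent's read and its write, and on AKS the usual suspect is the admissions enforcer. One way to watch for that churn (a sketch, assuming kubectl access to the cluster; the object name comes from the error above):

    # Re-prints a row whenever the webhook config changes; a steadily climbing
    # resourceVersion while you deploy nothing yourself means another
    # controller is writing to the object.
    kubectl get mutatingwebhookconfiguration datadog-webhook -w \
      -o custom-columns=NAME:.metadata.name,RV:.metadata.resourceVersion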

I found the following Azure docs:

When I manually edit the MutatingWebhookConfiguration to include the annotation admissions.enforcer/disabled: "true", the logs no longer get spammed.
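For reference, the same workaround can be applied without hand-editing the object; this is just one way to set the annotation quoted above (kubectl access to the affected cluster is assumed):

    # Add the AKS admissions-enforcer opt-out annotation to the Datadog webhook config.
    kubectl annotate mutatingwebhookconfiguration datadog-webhook \
      admissions.enforcer/disabled=true --overwrite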

Describe what you expected:

No logs concerning webhook reconciliation.

Steps to reproduce the issue: Install the latest chart on the latest AKS version.
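A minimal install along those lines might look like this (release name, namespace, and the API key placeholder are illustrative, not from the report):

    # Install the chart with the AKS provider flag enabled.
    helm repo add datadog https://helm.datadoghq.com
    helm repo update
    helm install datadog datadog/datadog \
      --namespace datadog --create-namespace \
      --set datadog.apiKey=<YOUR_API_KEY> \
      --set providers.aks.enabled=true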

Additional environment details (Operating System, Cloud provider, etc.): AKS in Azure

fei819 commented 12 months ago

I have the same issue, but it looks like the fix from https://github.com/DataDog/datadog-agent/pull/19965 will be included in 7.49.0 (I confirmed the fix with the 7.49.0-rc.7 image).
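To verify which Cluster Agent build is actually running before and after an upgrade, something like this should work (deployment and namespace names assume the chart defaults with a release called datadog):

    # Print the image tag of the running Cluster Agent.
    kubectl -n datadog get deployment datadog-cluster-agent \
      -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'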

oscarlgz commented 10 months ago

I'm still having this issue with chart version 3.49.6

fei819 commented 10 months ago

Did you enable the AKS flag as well? I.e.:

providers:
  aks:
    enabled: true
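On an existing release, that can be flipped on with something like the following (release name and namespace are assumptions):

    # Enable the AKS provider flag while keeping all other installed values.
    helm upgrade datadog datadog/datadog -n datadog \
      --reuse-values --set providers.aks.enabled=true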

oscarlgz commented 10 months ago

I do indeed, these are my values:

targetSystem: "linux"
datadog:
  tags:
    - "env:{{ .Values.projectName }}"
  kubelet:
    tlsVerify: false
  site: "datadoghq.eu"
  apiKeyExistingSecret: "datadog-secret"
  appKeyExistingSecret: "datadog-secret"
  logs:
    enabled: true
    containerCollectAll: true
  apm:
    enabled: true
  secretBackend:
    command: "/readsecret_multiple_providers.sh"
providers:
  aks:
    enabled: true
clusterAgent:
  enabled: true
  confd:
    postgres.yaml: |
      cluster_check: true
      init_config:
      instances:
        - dbm: true
          host: {{ .Values.hume.datadog.postgres.fqdn }}
          port: 5432
          username: "datadog@{{ .Values.hume.datadog.postgres.fqdn }}"
          password: 'ENC[k8s_secret@hume/1pass-secrets/POSTGRES_DATADOG_PASSWORD]'
          ssl: true
          azure:
            deployment_type: single_server
            fully_qualified_domain_name: {{ .Values.hume.datadog.postgres.fqdn }}
clusterChecksRunner:
  enabled: true

I don't believe I had this issue until recently. Logging still works, but it's quite annoying to have these constantly restarting pods.
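In case it helps with debugging: one way to check whether the admissions.enforcer/disabled annotation (from the manual workaround earlier in the thread) is actually present on the webhook:

    # Dump the webhook config's annotations and look for admissions.enforcer/disabled.
    kubectl get mutatingwebhookconfiguration datadog-webhook \
      -o jsonpath='{.metadata.annotations}{"\n"}'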

azk1el commented 10 months ago

Hi, I've also had the problem of constant restarts. I'm using Datadog 7.49.1, deployed via Helm through ArgoCD with auto-sync enabled. Every commit to the branch I deployed Datadog from triggered an auto-sync, which restarted the pods even when the manifests hadn't changed. Moving the deployment to its own branch got rid of the constant-restart problem (it was caused by every commit).

The webhook error disappears after roughly an hour of pod lifetime.

Don't know if that helps you, but sharing anyway :(