Admission controller not starting when Custom Metrics Server is enabled

Output of the info page (if this is a bug)

# k exec -it -n datadog staging-datadog-cluster-agent-659cbd8445-ntptx -- datadog-cluster-agent status > datadog-cluster-agent.txt

Getting the status from the agent.
2022-02-03 00:48:49 UTC | CLUSTER | WARN | (pkg/util/log/log.go:640 in func1) | Agent configuration relax permissions constraint on the secret backend cmd, Group can read and exec
2022-02-03 00:48:49 UTC | CLUSTER | INFO | (pkg/util/log/log.go:620 in func1) | Features detected from environment: kubernetes,orchestratorexplorer
===============================
Datadog Cluster Agent (v1.17.0)
===============================

  Status date: 2022-02-03 00:48:49.564 UTC (1643849329564)
  Agent start: 2022-02-03 00:44:28.242 UTC (1643849068242)
  Pid: 1
  Go Version: go1.16.7
  Build arch: amd64
  Agent flavor: cluster_agent
  Check Runners: 4
  Log Level: DEBUG

  Paths
  =====
    Config File: /etc/datadog-agent/datadog-cluster.yaml
    conf.d: /etc/datadog-agent/conf.d

  Clocks
  ======
    System time: 2022-02-03 00:48:49.564 UTC (1643849329564)

  Hostnames
  =========
    ec2-hostname: ip-10-0-151-155.ec2.internal
    hostname: i-00e5b28cd0a622650
    instance-id: i-00e5b28cd0a622650
    socket-fqdn: 10-0-151-155.staging-datadog-cluster-agent.datadog.svc.cluster.local.
    socket-hostname: ip-10-0-151-155.ec2.internal
    hostname provider: aws
    unused hostname providers:
      azure: azure_hostname_style is set to 'os'
      configuration/environment: hostname is empty
      container: Unable to get hostname from container API
      gce: unable to retrieve hostname from GCE: GCE metadata API error: status code 404 trying to GET http://169.254.169.254/computeMetadata/v1/instance/hostname

  Metadata
  ========

Leader Election
===============
  Leader Election Status:  Running
  Leader Name is: ip-10-0-151-155.ec2.internal
  Last Acquisition of the lease: Thu, 03 Feb 2022 00:04:58 UTC
  Renewed leadership: Thu, 03 Feb 2022 00:48:45 UTC
  Number of leader transitions: 17 transitions

Custom Metrics Server
=====================
  ConfigMap name: datadog/datadog-custom-metrics
  External Metrics
  ----------------
    Total: 0
    Valid: 0

Cluster Checks Dispatching
==========================
  Status: Leader, serving requests
  Active agents: 3
  Check Configurations: 0
    - Dispatched: 0
    - Unassigned: 0

Admission Controller
====================

    Webhooks info
    -------------
      MutatingWebhookConfigurations name: datadog-webhook
      Created at: 2022-02-03T00:03:32Z
      ---------
        Name: datadog.webhook.config
        CA bundle digest: 76d3b6955aea8c93
        Object selector: &LabelSelector{MatchLabels:map[string]string{},MatchExpressions:[]LabelSelectorRequirement{LabelSelectorRequirement{Key:admission.datadoghq.com/enabled,Operator:NotIn,Values:[false],},},}
        Rule 1: Operations: [CREATE] - APIGroups: [] - APIVersions: [v1] - Resources: [pods]
        Service: datadog/staging-datadog-cluster-agent-admission-controller - Port: 443 - Path: /injectconfig
      ---------
        Name: datadog.webhook.tags
        CA bundle digest: 76d3b6955aea8c93
        Object selector: &LabelSelector{MatchLabels:map[string]string{},MatchExpressions:[]LabelSelectorRequirement{LabelSelectorRequirement{Key:admission.datadoghq.com/enabled,Operator:NotIn,Values:[false],},},}
        Rule 1: Operations: [CREATE] - APIGroups: [] - APIVersions: [v1] - Resources: [pods]
        Service: datadog/staging-datadog-cluster-agent-admission-controller - Port: 443 - Path: /injecttags

    Secret info
    -----------
    Secret name: webhook-certificate
    Secret namespace: datadog
    Created at: 2022-02-02T18:02:47Z
    CA bundle digest: 76d3b6955aea8c93
    Duration before certificate expiration: 8753h13m57.411973539s

=========
Collector
=========

  Running Checks
  ==============

    kubernetes_apiserver
    --------------------
      Instance ID: kubernetes_apiserver [[32mOK[0m]
      Configuration Source: file:/etc/datadog-agent/conf.d/kubernetes_apiserver.d/conf.yaml.default
      Total Runs: 18
      Metric Samples: Last Run: 0, Total: 0
      Events: Last Run: 1, Total: 23
      Service Checks: Last Run: 5, Total: 90
      Average Execution Time : 1.983s
      Last Execution Date : 2022-02-03 00:48:48 UTC (1643849328000)
      Last Successful Execution Date : 2022-02-03 00:48:48 UTC (1643849328000)

    orchestrator
    ------------
      Instance ID: orchestrator:d884b5186b651429 [[32mOK[0m]
      Configuration Source: file:/etc/datadog-agent/conf.d/orchestrator.d/conf.yaml.default
      Total Runs: 26
      Metric Samples: Last Run: 0, Total: 0
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 43ms
      Last Execution Date : 2022-02-03 00:48:41 UTC (1643849321000)
      Last Successful Execution Date : 2022-02-03 00:48:41 UTC (1643849321000)

=========
Forwarder
=========

  Transactions
  ============
    Cluster: 2
    ClusterRole: 0
    ClusterRoleBinding: 0
    CronJob: 2
    DaemonSet: 5
    Deployment: 6
    Dropped: 0
    HighPriorityQueueFull: 0
    Job: 0
    Node: 5
    PersistentVolume: 0
    PersistentVolumeClaim: 0
    Pod: 0
    ReplicaSet: 5
    Requeued: 0
    Retried: 0
    RetryQueueSize: 0
    Role: 0
    RoleBinding: 0
    Service: 2
    ServiceAccount: 0
    StatefulSet: 0

  Transaction Successes
  =====================
    Total number: 65
    Successes By Endpoint:
      check_run_v1: 17
      intake: 4
      orchestrator: 27
      series_v1: 17

  On-disk storage
  ===============
    On-disk storage is disabled. Configure `forwarder_storage_max_size_in_bytes` to enable it.

==========
Endpoints
==========
  https://app.datadoghq.com - API Key ending with:
      - 38515

=====================
Orchestrator Explorer
=====================
  Collection Status: The collection is at least partially running since the cache has been populated.
  Cluster Name: staging
  Cluster ID: 958dc9c9-153f-4e69-9943-6149973b8745
  Container scrubbing: enabled

  ======================
  Orchestrator Endpoints
  ======================
    https://orchestrator.datadoghq.com - API Key ending with: 38515

  ===========
  Cache Stats
  ===========
    Elements in the cache: 71

    Cluster
      Last Run: (Hits: 1 Miss: 0) | Total: (Hits: 24 Miss: 2)

    CronJob
      Last Run: (Hits: 1 Miss: 0) | Total: (Hits: 24 Miss: 2)

    DaemonSet
      Last Run: (Hits: 4 Miss: 0) | Total: (Hits: 94 Miss: 10)

    Deployment
      Last Run: (Hits: 15 Miss: 0) | Total: (Hits: 358 Miss: 32)

    Job
      Last Run: (Hits: 0 Miss: 0) | Total: (Hits: 0 Miss: 0)

    Node
      Last Run: (Hits: 3 Miss: 0) | Total: (Hits: 70 Miss: 8)

    Pod
      Last Run: (Hits: 0 Miss: 0) | Total: (Hits: 0 Miss: 0)

    ReplicaSet
      Last Run: (Hits: 15 Miss: 0) | Total: (Hits: 358 Miss: 32)

    Service
      Last Run: (Hits: 21 Miss: 0) | Total: (Hits: 504 Miss: 42)

    StatefulSet
      Last Run: (Hits: 0 Miss: 0) | Total: (Hits: 0 Miss: 0)

Describe what happened: When deploying Datadog with the latest helm chart and having both admission controller and metrics provider enabled, there was an error in the log and the admission controller was not created. After disabling the metrics provider, and trying again, the error is gone and the admission controller is created. This error seems similar to https://github.com/DataDog/datadog-agent/pull/10171 - we are using Datadog Cluster Agent (v1.17.0) which seems to have this fix - If not please discard this issue, we will wait for next release.

Describe what you expected: admission controller should be configured even if there was an error with metrics provider.

We could actually see the metrics provider, so it's unclear if this error was blocking

kubectl get apiservice v1beta1.external.metrics.k8s.io
NAME                              SERVICE                                             AVAILABLE   AGE
v1beta1.external.metrics.k8s.io   datadog/staging-datadog-cluster-agent-metrics-api   True        5m46s

Steps to reproduce the issue:

# values.yaml
datadog:
  logLevel: DEBUG
clusterAgent:
  enabled: true
  image:
    name: cluster-agent
    tag: 1.17.0
  admissionController:
    enabled: true
    mutateUnlabelled: true
  metricsProvider:
    enabled: true

helm install staging-datadog -f values.yaml version 2.30.4  -n datadog  datadog/datadog

# this was not created
kubectl get mutatingwebhookconfiguration datadog-webhook
Error from server (NotFound): mutatingwebhookconfigurations.admissionregistration.k8s.io "datadog-webhook" not found

try again after disabling the metricsProvider, the error is gone and the mutatingwebhookconfiguration is created

  metricsProvider:
    enabled: false #change this

Additional environment details (Operating System, Cloud provider, etc):

kubectl version --short=true
Server Version: v1.18.20-eks-8c49e2

cluster agent logs

2022-02-02 22:02:09 UTC | CLUSTER | WARN | (pkg/util/log/log.go:640 in func1) | Agent configuration relax permissions constraint on the secret backend cmd, Group can read and exec
2022-02-02 22:02:09 UTC | CLUSTER | INFO | (pkg/util/log/log.go:620 in func1) | Features detected from environment: kubernetes,orchestratorexplorer
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (app/app.go:186 in start) | Health check listening on port 5556
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (api/v1/install.go:31 in InstallMetadataEndpoints) | Registering metadata endpoints
2022-02-02 22:02:09 UTC | CLUSTER | INFO | (pkg/api/security/security.go:145 in fetchAuthToken) | Saved a new authentication token to /etc/datadog-agent/auth_token
2022-02-02 22:02:09 UTC | CLUSTER | INFO | (pkg/api/security/security.go:190 in getClusterAgentAuthToken) | Using configured cluster_agent.auth_token
2022-02-02 22:02:09 UTC | CLUSTER | INFO | (app/app.go:196 in start) | Waiting to obtain APIClient connection
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/kubernetes/apiserver/apiserver.go:374 in connect) | Connected to kubernetes apiserver, version v1
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/kubernetes/apiserver/apiserver.go:380 in connect) | Could successfully collect Pods, Nodes, Services and Events
2022-02-02 22:02:09 UTC | CLUSTER | INFO | (app/app.go:201 in start) | Got APIClient connection
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/hostname.go:176 in GetHostnameData) | Unable to get the hostname from the config file: hostname is empty
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/hostname.go:199 in GetHostnameData) | Trying to determine a reliable host name automatically...
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/hostname.go:208 in GetHostnameData) | GetHostname trying GCE metadata...
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/config/config.go:1393 in IsCloudProviderEnabled) | cloud_provider_metadata is set to [aws gcp azure alibaba] in agent configuration, trying endpoints for GCP Cloud Provider
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/hostname.go:218 in GetHostnameData) | Unable to get hostname from GCE:  unable to retrieve hostname from GCE: GCE metadata API error: status code 404 trying to GET http://169.254.169.254/computeMetadata/v1/instance/hostname
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/hostname.go:225 in GetHostnameData) | GetHostname trying FQDN/`hostname -f`...
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/hostname.go:236 in GetHostnameData) | Unable to get FQDN from system:  <nil>
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/hostname/providers.go:38 in GetHostname) | GetHostname trying provider 'kube_apiserver' ...
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/hostname_docker.go:27 in getContainerHostname) | could not fetch the host nodename from the apiserver: pods "ip-10-0-151-155.ec2.internal" not found
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/hostname_docker.go:44 in getContainerHostname) | hostname provider kubelet not found
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/hostname.go:254 in GetHostnameData) | GetHostname trying os...
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/hostname.go:273 in GetHostnameData) | GetHostname trying EC2 metadata...
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/ec2/ec2.go:348 in HostnameProvider) | GetHostname trying EC2 metadata...
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/config/config.go:1393 in IsCloudProviderEnabled) | cloud_provider_metadata is set to [aws gcp azure alibaba] in agent configuration, trying endpoints for AWS Cloud Provider
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/hostname.go:311 in GetHostnameData) | GetHostname trying Azure metadata...
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/hostname.go:321 in GetHostnameData) | unable to get hostname from Azure: azure_hostname_style is set to 'os'
2022-02-02 22:02:09 UTC | CLUSTER | INFO | (app/app.go:208 in start) | Hostname is: i-00e5b28cd0a622650
2022-02-02 22:02:09 UTC | CLUSTER | INFO | (pkg/forwarder/forwarder.go:226 in NewDefaultForwarder) | Retry queue storage on disk is disabled
2022-02-02 22:02:09 UTC | CLUSTER | INFO | (pkg/forwarder/forwarder.go:329 in Start) | Forwarder started, sending to 1 endpoint(s) with 1 worker(s) each: "https://1-17-0-app.agent.datadoghq.com" (1 api key(s))
2022-02-02 22:02:09 UTC | CLUSTER | INFO | (pkg/util/kubernetes/clustername/clustername.go:75 in getClusterName) | Got cluster name staging from config
2022-02-02 22:02:09 UTC | CLUSTER | INFO | (pkg/forwarder/forwarder.go:226 in NewDefaultForwarder) | Retry queue storage on disk is disabled
2022-02-02 22:02:09 UTC | CLUSTER | INFO | (pkg/forwarder/forwarder.go:329 in Start) | Forwarder started, sending to 1 endpoint(s) with 1 worker(s) each: "https://orchestrator.datadoghq.com" (1 api key(s))
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/kubernetes/apiserver/leaderelection/leaderelection.go:124 in init) | Init LeaderEngine with HolderIdentity: "ip-10-0-151-155.ec2.internal"
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/kubernetes/apiserver/leaderelection/leaderelection.go:132 in init) | LeaderLeaseDuration: 1m0s
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/kubernetes/apiserver/leaderelection/leaderelection_engine.go:74 in newElection) | Current registered leader is "ip-10-0-143-132.ec2.internal", building leader elector "ip-10-0-151-155.ec2.internal" as candidate
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/kubernetes/apiserver/leaderelection/leaderelection.go:155 in init) | Leader Engine for "ip-10-0-151-155.ec2.internal" successfully initialized
2022-02-02 22:02:09 UTC | CLUSTER | INFO | (pkg/util/kubernetes/apiserver/metadata_controller.go:80 in Run) | Starting metadata controller
2022-02-02 22:02:09 UTC | CLUSTER | INFO | (pkg/util/kubernetes/autoscalers/datadogexternal.go:213 in NewDatadogClient) | Initialized the Datadog Client for HPA with endpoint "https://api.datadoghq.com"
2022-02-02 22:02:09 UTC | CLUSTER | INFO | (pkg/clusteragent/custommetrics/store_configmap.go:56 in NewConfigMapStore) | Retrieved the configmap datadog-custom-metrics
2022-02-02 22:02:09 UTC | CLUSTER | INFO | (pkg/util/kubernetes/apiserver/hpa_controller.go:81 in RunHPA) | Starting HPA Controller ... 
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/kubernetes/apiserver/metadata_controller.go:120 in addNode) | Detected node ip-10-0-132-147.ec2.internal
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/kubernetes/apiserver/metadata_controller.go:120 in addNode) | Detected node ip-10-0-143-132.ec2.internal
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/kubernetes/apiserver/metadata_controller.go:120 in addNode) | Detected node ip-10-0-151-155.ec2.internal
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/kubernetes/apiserver/metadata_controller.go:148 in addEndpoints) | Adding endpoints default/kubernetes
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/kubernetes/apiserver/metadata_controller.go:148 in addEndpoints) | Adding endpoints kube-system/kube-scheduler
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/kubernetes/apiserver/metadata_controller.go:148 in addEndpoints) | Adding endpoints linkerd/linkerd-proxy-injector
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/kubernetes/apiserver/metadata_controller.go:148 in addEndpoints) | Adding endpoints ingress-nginx/ingress-nginx-controller
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/kubernetes/apiserver/metadata_controller.go:148 in addEndpoints) | Adding endpoints datadog/staging-datadog-cluster-agent
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/kubernetes/apiserver/metadata_controller.go:148 in addEndpoints) | Adding endpoints datadog/staging-datadog-cluster-agent-metrics-api
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/kubernetes/apiserver/metadata_controller.go:148 in addEndpoints) | Adding endpoints kube-system/kube-dns
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/kubernetes/apiserver/metadata_controller.go:148 in addEndpoints) | Adding endpoints autoscaler/cluster-autoscaler-aws-cluster-autoscaler
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/kubernetes/apiserver/metadata_controller.go:148 in addEndpoints) | Adding endpoints cert-manager/cert-manager
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/kubernetes/apiserver/metadata_controller.go:148 in addEndpoints) | Adding endpoints linkerd/linkerd-dst-headless
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/kubernetes/apiserver/metadata_controller.go:148 in addEndpoints) | Adding endpoints linkerd/linkerd-controller-api
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/kubernetes/apiserver/metadata_controller.go:148 in addEndpoints) | Adding endpoints linkerd/linkerd-dst
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/kubernetes/apiserver/metadata_controller.go:148 in addEndpoints) | Adding endpoints linkerd/linkerd-sp-validator
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/kubernetes/apiserver/metadata_controller.go:148 in addEndpoints) | Adding endpoints ingress-nginx/opta-ingress-healthcheck
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/kubernetes/apiserver/metadata_controller.go:148 in addEndpoints) | Adding endpoints kube-system/kube-controller-manager
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/kubernetes/apiserver/metadata_controller.go:148 in addEndpoints) | Adding endpoints linkerd/linkerd-identity-headless
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/kubernetes/apiserver/metadata_controller.go:148 in addEndpoints) | Adding endpoints linkerd/linkerd-identity
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/kubernetes/apiserver/metadata_controller.go:148 in addEndpoints) | Adding endpoints ingress-nginx/ingress-nginx-controller-admission
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/kubernetes/apiserver/metadata_controller.go:148 in addEndpoints) | Adding endpoints hellodd/hellodd
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/kubernetes/apiserver/metadata_controller.go:148 in addEndpoints) | Adding endpoints datadog/staging-datadog-kube-state-metrics
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/kubernetes/apiserver/metadata_controller.go:148 in addEndpoints) | Adding endpoints kube-system/metrics-server
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/kubernetes/apiserver/metadata_controller.go:148 in addEndpoints) | Adding endpoints kube-system/aws-load-balancer-webhook-service
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/kubernetes/apiserver/metadata_controller.go:148 in addEndpoints) | Adding endpoints cert-manager/cert-manager-webhook
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/kubernetes/apiserver/metadata_controller.go:148 in addEndpoints) | Adding endpoints datadog/staging-datadog-cluster-agent-admission-controller
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/kubernetes/apiserver/util.go:41 in func1) | Sync done for informer services in 100.496688ms, last resource version: 284537
2022-02-02 22:02:09 UTC | CLUSTER | DEBUG | (pkg/util/kubernetes/apiserver/util.go:41 in func1) | Sync done for informer endpoints in 100.595664ms, last resource version: 284637
2022-02-02 22:02:10 UTC | CLUSTER | ERROR | (app/app.go:292 in start) | Could not start admission controller: unable to retrieve the complete list of server APIs: external.metrics.k8s.io/v1beta1: the server is currently unable to handle the request

DataDog / datadog-agent

Admission controller not starting when Custom Metrics Server is enabled #10764