DataDog / integrations-core

Core integrations of the Datadog Agent
BSD 3-Clause "New" or "Revised" License

Autodiscovery of Kubernetes control-plane components not working on GKE Enterprise (Anthos) #18816

Open sebastien-prudhomme opened 1 week ago

sebastien-prudhomme commented 1 week ago

I'm installing the Datadog Agent on GKE Enterprise (Anthos), where the control-plane nodes are accessible.

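Google suffixes all container images used for the control plane with "-amd64". This suffix is not present in the "ad_identifiers" of the corresponding integrations (etcd, kube_apiserver_metrics, kube_controller_manager and kube_scheduler), so Autodiscovery does not pick up these components.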

For now, my workaround is to override the default configuration of these integrations in the Helm chart values:

  confd:
    etcd.yaml: |
      ad_identifiers:
        - etcd
        - etcd-amd64
      instances:
        - prometheus_url: http://localhost:2379/metrics
          possible_prometheus_urls:
            - https://%%host%%:2379/metrics
            - http://%%host%%:2379/metrics
          ssl_verify: false
    kube_apiserver_metrics.yaml: |
      ad_identifiers:
        - kube-apiserver
        - kube-apiserver-amd64
      instances:
        - possible_prometheus_urls:
            - https://%%host%%:6443/metrics
            - https://%%host%%:8443/metrics
          bearer_token_auth: tls_only
          tags:
            - apiserver:%%host%%
    kube_controller_manager.yaml: |
      ad_identifiers:
        - kube-controller-manager
        - kube-controller-manager-amd64
      instances:
        - possible_prometheus_urls:
            - https://%%host%%:10257/metrics
            - https://localhost:10257/metrics
            - http://%%host%%:10252/metrics
            - http://localhost:10252/metrics
          bearer_token_auth: tls_only
          ssl_verify: false
    kube_scheduler.yaml: |
      ad_identifiers:
        - kube-scheduler
        - kube-scheduler-amd64
      instances:
        - possible_prometheus_urls:
            - https://%%host%%:10259/metrics
            - https://localhost:10259/metrics
            - http://%%host%%:10251/metrics
            - http://localhost:10251/metrics
          bearer_token_auth: tls_only
          ssl_verify: false
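The following minimal Python sketch shows why adding the "-amd64" entries makes the workaround effective. It assumes that Autodiscovery compares each "ad_identifiers" entry with the container's short image name, meaning the image reference without its registry path, tag or digest. The helpers short_image() and matches(), as well as the image reference, are hypothetical. The sketch approximates the matching and is not the Agent's implementation.

```python
# Hypothetical helpers approximating Autodiscovery's short-image matching.
# This is not Datadog Agent code.


def short_image(image: str) -> str:
    """Strip the registry/repository path, digest and tag from an image reference."""
    name = image.split("@", 1)[0]   # drop a digest such as @sha256:...
    last = name.rsplit("/", 1)[-1]  # keep only the last path component
    return last.split(":", 1)[0]    # drop the tag


def matches(image: str, ad_identifiers: list[str]) -> bool:
    """True if the image's short name appears in the check's ad_identifiers."""
    return short_image(image) in ad_identifiers


if __name__ == "__main__":
    image = "registry.example.com/kube-scheduler-amd64:v1.0"  # hypothetical image
    print(matches(image, ["kube-scheduler"]))                          # False
    print(matches(image, ["kube-scheduler", "kube-scheduler-amd64"]))  # True
```

With only the default identifier, the suffixed image never matches. Listing both names keeps the check working on clusters that use either naming scheme.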
sebastien-prudhomme commented 5 days ago

There is also a problem with the corresponding overview dashboards: some widgets filter on the "short_image" tag in their queries, for instance "query": "sum:kubernetes.memory.usage{$cluster,$scope,short_image:kube-scheduler} by {pod_name}". This filter does not match an image named "kube-scheduler-amd64".
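One possible dashboard-side fix is to relax the exact "short_image" filter to a trailing wildcard, so that the "-amd64" variant also matches. The Python sketch below is hypothetical. widen_short_image_filter() is not a Datadog tool or API. It assumes that the dashboard query language accepts trailing wildcards in tag values, such as short_image:kube-scheduler*. Verify that syntax against the Datadog query documentation before editing real widgets.

```python
import re

# Hypothetical helper, not a Datadog API. It captures the tag value and an
# optional trailing "*" so that filters that already use a wildcard are left as-is.
_SHORT_IMAGE = re.compile(r"short_image:([A-Za-z0-9._-]+)(\*?)")


def widen_short_image_filter(query: str) -> str:
    """Rewrite short_image:<name> filters as short_image:<name>* (assumed syntax)."""

    def repl(m: re.Match) -> str:
        if m.group(2):  # already a wildcard filter: leave it unchanged
            return m.group(0)
        return f"short_image:{m.group(1)}*"

    return _SHORT_IMAGE.sub(repl, query)


if __name__ == "__main__":
    q = "sum:kubernetes.memory.usage{$cluster,$scope,short_image:kube-scheduler} by {pod_name}"
    print(widen_short_image_filter(q))
```

A wildcard is broader than listing the two exact names. It would also match any other image whose short name starts with "kube-scheduler", which is usually acceptable for these control-plane dashboards.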