cilium (1.8.0) - Connection refused

sebastienfi commented 2 years ago

I honestly don't know if this is a bug or a documentation issue.

Output of the info page (if this is a bug)

➜ kubectl exec -it datadog-agent-66m4f agent status                         
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
Defaulted container "agent" out of: agent, trace-agent, process-agent, init-volume (init), init-config (init)
2021-11-17 10:23:13 UTC | CORE | WARN | (pkg/util/log/log.go:630 in func1) | Deactivating Autoconfig will disable most components. It's recommended to use autoconfig_exclude_features and autoconfig_include_features to activate/deactivate features selectively
2021-11-17 10:23:13 UTC | CORE | INFO | (cmd/system-probe/config/config.go:119 in Merge) | no config exists at /etc/datadog-agent/system-probe.yaml, ignoring...
Getting the status from the agent.

===============
Agent (v7.32.0)
===============

  Status date: 2021-11-17 10:23:13.112 UTC (1637144593112)
  Agent start: 2021-11-17 10:06:59.757 UTC (1637143619757)
  Pid: 1
  Go Version: go1.16.7
  Python Version: 3.8.11
  Build arch: amd64
  Agent flavor: agent
  Check Runners: 4
  Log Level: INFO

  Paths
  =====
    Config File: /etc/datadog-agent/datadog.yaml
    conf.d: /etc/datadog-agent/conf.d
    checks.d: /etc/datadog-agent/checks.d

  Clocks
  ======
    NTP offset: -14µs
    System time: 2021-11-17 10:23:13.112 UTC (1637144593112)

  Host Info
  =========
    bootTime: 2021-09-14 14:04:29 UTC (1631628269000)
    kernelArch: x86_64
    kernelVersion: 5.8.0-45-generic
    os: linux
    platform: ubuntu
    platformFamily: debian
    platformVersion: 21.04
    procs: 191
    uptime: 1532h2m43s
    virtualizationRole: guest
    virtualizationSystem: kvm

  Hostnames
  =========
    host_aliases: [scw-k8s-acme-preprod-pool-influxdb-6e916d37-k8s-acme-preprod]
    hostname: scw-k8s-acme-preprod-pool-influxdb-6e916d37
    socket-fqdn: datadog-agent-66m4f
    socket-hostname: datadog-agent-66m4f
    host tags:
      cluster_name:k8s-acme-preprod
      kube_cluster_name:k8s-acme-preprod
    hostname provider: container
    unused hostname providers:
      aws: not retrieving hostname from AWS: the host is not an ECS instance and other providers already retrieve non-default hostnames
      azure: azure_hostname_style is set to 'os'
      configuration/environment: hostname is empty
      gce: unable to retrieve hostname from GCE: Get "http://169.254.169.254/computeMetadata/v1/instance/hostname": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

  Metadata
  ========
    hostname_source: container

=========
Collector
=========

  Running Checks
  ==============

    cilium (1.8.0)
    --------------
      Instance ID: cilium:d708a04a4231d526 [ERROR]
      Configuration Source: file:/etc/datadog-agent/conf.d/cilium.d/auto_conf.yaml
      Total Runs: 65
      Metric Samples: Last Run: 0, Total: 0
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 65
      Average Execution Time : 20ms
      Last Execution Date : 2021-11-17 10:23:09 UTC (1637144589000)
      Last Successful Execution Date : Never
      Error: HTTPConnectionPool(host='10.70.46.127', port=9090): Max retries exceeded with url: /metrics (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fa27a3259d0>: Failed to establish a new connection: [Errno 111] Connection refused'))
      Traceback (most recent call last):
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connection.py", line 174, in _new_conn
          conn = connection.create_connection(
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/util/connection.py", line 96, in create_connection
          raise err
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/util/connection.py", line 86, in create_connection
          sock.connect(sa)
      ConnectionRefusedError: [Errno 111] Connection refused

      During handling of the above exception, another exception occurred:

      Traceback (most recent call last):
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
          httplib_response = self._make_request(
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connectionpool.py", line 394, in _make_request
          conn.request(method, url, **httplib_request_kw)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connection.py", line 239, in request
          super(HTTPConnection, self).request(method, url, body=body, headers=headers)
        File "/opt/datadog-agent/embedded/lib/python3.8/http/client.py", line 1256, in request
          self._send_request(method, url, body, headers, encode_chunked)
        File "/opt/datadog-agent/embedded/lib/python3.8/http/client.py", line 1302, in _send_request
          self.endheaders(body, encode_chunked=encode_chunked)
        File "/opt/datadog-agent/embedded/lib/python3.8/http/client.py", line 1251, in endheaders
          self._send_output(message_body, encode_chunked=encode_chunked)
        File "/opt/datadog-agent/embedded/lib/python3.8/http/client.py", line 1011, in _send_output
          self.send(msg)
        File "/opt/datadog-agent/embedded/lib/python3.8/http/client.py", line 951, in send
          self.connect()
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connection.py", line 205, in connect
          conn = self._new_conn()
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connection.py", line 186, in _new_conn
          raise NewConnectionError(
      urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fa27a3259d0>: Failed to establish a new connection: [Errno 111] Connection refused

      During handling of the above exception, another exception occurred:

      Traceback (most recent call last):
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
          resp = conn.urlopen(
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen
          retries = retries.increment(
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/util/retry.py", line 574, in increment
          raise MaxRetryError(_pool, url, error or ResponseError(cause))
      urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='10.70.46.127', port=9090): Max retries exceeded with url: /metrics (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fa27a3259d0>: Failed to establish a new connection: [Errno 111] Connection refused'))

      During handling of the above exception, another exception occurred:

      Traceback (most recent call last):
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/base.py", line 1017, in run
          self.check(instance)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/openmetrics/base_check.py", line 136, in check
          self.process(scraper_config)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/openmetrics/mixins.py", line 532, in process
          for metric in self.scrape_metrics(scraper_config):
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/openmetrics/mixins.py", line 469, in scrape_metrics
          response = self.poll(scraper_config)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/openmetrics/mixins.py", line 779, in poll
          response = self.send_request(endpoint, scraper_config, headers)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/openmetrics/mixins.py", line 805, in send_request
          return http_handler.get(endpoint, stream=True, **kwargs)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/utils/http.py", line 341, in get
          return self._request('get', url, options)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/utils/http.py", line 405, in _request
          response = self.make_request_aia_chasing(request_method, method, url, new_options, persist)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/utils/http.py", line 411, in make_request_aia_chasing
          response = request_method(url, **new_options)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/api.py", line 76, in get
          return request('get', url, params=params, **kwargs)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/api.py", line 61, in request
          return session.request(method=method, url=url, **kwargs)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
          resp = self.send(prep, **send_kwargs)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
          r = adapter.send(request, **kwargs)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/adapters.py", line 516, in send
          raise ConnectionError(e, request=request)
      requests.exceptions.ConnectionError: HTTPConnectionPool(host='10.70.46.127', port=9090): Max retries exceeded with url: /metrics (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fa27a3259d0>: Failed to establish a new connection: [Errno 111] Connection refused'))

    cpu
    ---
      Instance ID: cpu [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/cpu.d/conf.yaml.default
      Total Runs: 64
      Metric Samples: Last Run: 9, Total: 569
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2021-11-17 10:23:01 UTC (1637144581000)
      Last Successful Execution Date : 2021-11-17 10:23:01 UTC (1637144581000)

    disk (4.4.0)
    ------------
      Instance ID: disk:e5dffb8bef24336f [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/disk.d/conf.yaml.default
      Total Runs: 64
      Metric Samples: Last Run: 384, Total: 24,576
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 98ms
      Last Execution Date : 2021-11-17 10:23:08 UTC (1637144588000)
      Last Successful Execution Date : 2021-11-17 10:23:08 UTC (1637144588000)

    docker
    ------
      Instance ID: docker [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/docker.d/conf.yaml.default
      Total Runs: 64
      Metric Samples: Last Run: 519, Total: 33,216
      Events: Last Run: 2, Total: 2
      Service Checks: Last Run: 1, Total: 64
      Average Execution Time : 123ms
      Last Execution Date : 2021-11-17 10:23:00 UTC (1637144580000)
      Last Successful Execution Date : 2021-11-17 10:23:00 UTC (1637144580000)

    file_handle
    -----------
      Instance ID: file_handle [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/file_handle.d/conf.yaml.default
      Total Runs: 64
      Metric Samples: Last Run: 5, Total: 320
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2021-11-17 10:23:07 UTC (1637144587000)
      Last Successful Execution Date : 2021-11-17 10:23:07 UTC (1637144587000)

    io
    --
      Instance ID: io [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/io.d/conf.yaml.default
      Total Runs: 64
      Metric Samples: Last Run: 91, Total: 5,761
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2021-11-17 10:22:59 UTC (1637144579000)
      Last Successful Execution Date : 2021-11-17 10:22:59 UTC (1637144579000)

    kubelet (7.1.0)
    ---------------
      Instance ID: kubelet:5bbc63f3938c02f4 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/kubelet.d/conf.yaml.default
      Total Runs: 49
      Metric Samples: Last Run: 855, Total: 41,340
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 4, Total: 196
      Average Execution Time : 677ms
      Last Execution Date : 2021-11-17 10:23:10 UTC (1637144590000)
      Last Successful Execution Date : 2021-11-17 10:23:10 UTC (1637144590000)

    load
    ----
      Instance ID: load [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/load.d/conf.yaml.default
      Total Runs: 64
      Metric Samples: Last Run: 6, Total: 384
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2021-11-17 10:23:06 UTC (1637144586000)
      Last Successful Execution Date : 2021-11-17 10:23:06 UTC (1637144586000)

    memory
    ------
      Instance ID: memory [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/memory.d/conf.yaml.default
      Total Runs: 64
      Metric Samples: Last Run: 18, Total: 1,152
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2021-11-17 10:22:58 UTC (1637144578000)
      Last Successful Execution Date : 2021-11-17 10:22:58 UTC (1637144578000)

    network (2.4.0)
    ---------------
      Instance ID: network:d884b5186b651429 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/network.d/conf.yaml.default
      Total Runs: 64
      Metric Samples: Last Run: 115, Total: 7,360
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 11ms
      Last Execution Date : 2021-11-17 10:23:05 UTC (1637144585000)
      Last Successful Execution Date : 2021-11-17 10:23:05 UTC (1637144585000)

    ntp
    ---
      Instance ID: ntp:d884b5186b651429 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/ntp.d/conf.yaml.default
      Total Runs: 2
      Metric Samples: Last Run: 1, Total: 2
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 2
      Average Execution Time : 5.039s
      Last Execution Date : 2021-11-17 10:22:14 UTC (1637144534000)
      Last Successful Execution Date : 2021-11-17 10:22:14 UTC (1637144534000)

    uptime
    ------
      Instance ID: uptime [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/uptime.d/conf.yaml.default
      Total Runs: 65
      Metric Samples: Last Run: 1, Total: 65
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2021-11-17 10:23:12 UTC (1637144592000)
      Last Successful Execution Date : 2021-11-17 10:23:12 UTC (1637144592000)

========
JMXFetch
========

  Information
  ==================
  Initialized checks
  ==================
    no checks

  Failed checks
  =============
    no checks

=========
Forwarder
=========

  Transactions
  ============
    Cluster: 0
    ClusterRole: 0
    ClusterRoleBinding: 0
    CronJob: 0
    DaemonSet: 0
    Deployment: 0
    Dropped: 0
    HighPriorityQueueFull: 0
    Job: 0
    Node: 0
    PersistentVolume: 0
    PersistentVolumeClaim: 0
    Pod: 0
    ReplicaSet: 0
    Requeued: 0
    Retried: 0
    RetryQueueSize: 0
    Role: 0
    RoleBinding: 0
    Service: 0
    ServiceAccount: 0
    StatefulSet: 0

  Transaction Successes
  =====================
    Total number: 136
    Successes By Endpoint:
      check_run_v1: 64
      intake: 8
      series_v1: 64

  API Keys status
  ===============
    API key ending with 8999f: API Key valid

==========
Endpoints
==========
  https://app.datadoghq.eu - API Key ending with:
      - 8999f

==========
Logs Agent
==========

    Sending compressed logs in HTTPS to agent-http-intake.logs.datadoghq.eu on port 443
    BytesSent: 2.427446e+06
    EncodedBytesSent: 346112
    LogsProcessed: 1331
    LogsSent: 1330

  nuclio/mongodb-business-1/mongodb
  ---------------------------------
    - Type: file
      Identifier: 48f0c1e2a0d0147ecce8b4ae5a919098baca399c3deb60092106bc66c5c6445d
      Path: /var/log/pods/nuclio_mongodb-business-1_59dbec96-816b-4225-a666-4089974c1bea/mongodb/*.log
      Status: OK
        2 files tailed out of 2 files matching
      Inputs:
        /var/log/pods/nuclio_mongodb-business-1_59dbec96-816b-4225-a666-4089974c1bea/mongodb/1.log
        /var/log/pods/nuclio_mongodb-business-1_59dbec96-816b-4225-a666-4089974c1bea/mongodb/0.log
      BytesRead: 237489
      Average Latency (ms): 0
      24h Average Latency (ms): 0
      Peak Latency (ms): 1
      24h Peak Latency (ms): 1

  nuclio/mongo-express-deployment-779878c9ff-fx4zc/mongo-express
  --------------------------------------------------------------
    - Type: file
      Identifier: 2e738d2047eeac8e76158a71e37e6f5eb70ce7b9d235e00f3e11551f596ebf78
      Path: /var/log/pods/nuclio_mongo-express-deployment-779878c9ff-fx4zc_f6a27399-1acb-487d-97ae-05645a6fe7c3/mongo-express/*.log
      Status: OK
        2 files tailed out of 2 files matching
      Inputs:
        /var/log/pods/nuclio_mongo-express-deployment-779878c9ff-fx4zc_f6a27399-1acb-487d-97ae-05645a6fe7c3/mongo-express/0.log
        /var/log/pods/nuclio_mongo-express-deployment-779878c9ff-fx4zc_f6a27399-1acb-487d-97ae-05645a6fe7c3/mongo-express/1.log
      BytesRead: 0
      Average Latency (ms): 0
      24h Average Latency (ms): 0
      Peak Latency (ms): 0
      24h Peak Latency (ms): 0

  container_collect_all
  ---------------------
    - Type: docker
      Status: Pending
      BytesRead: 0
      Average Latency (ms): 0
      24h Average Latency (ms): 0
      Peak Latency (ms): 0
      24h Peak Latency (ms): 0

  default/datadog-agent-66m4f/init-config
  ---------------------------------------
    - Type: file
      Identifier: f98593016f0d4caf2c2e9375d903378db0a6cfeedc4aa6f9071776242b048009
      Path: /var/log/pods/default_datadog-agent-66m4f_99505fd9-d7f3-4ff7-b91e-fd3a0007bdcc/init-config/*.log
      Status: Pending
        1 files tailed out of 1 files matching
      BytesRead: 0
      Average Latency (ms): 0
      24h Average Latency (ms): 0
      Peak Latency (ms): 0
      24h Peak Latency (ms): 0

  default/influxdb-0/influxdb
  ---------------------------
    - Type: file
      Identifier: f3418db5b23a33995458944813fb6eb025ac3687f2f4eef6d40f9266e5ffdc0b
      Path: /var/log/pods/default_influxdb-0_56be3653-c89f-4e8c-832c-50475a99b063/influxdb/*.log
      Status: OK
        2 files tailed out of 2 files matching
      Inputs:
        /var/log/pods/default_influxdb-0_56be3653-c89f-4e8c-832c-50475a99b063/influxdb/1.log
        /var/log/pods/default_influxdb-0_56be3653-c89f-4e8c-832c-50475a99b063/influxdb/0.log
      BytesRead: 58277
      Average Latency (ms): 0
      24h Average Latency (ms): 0
      Peak Latency (ms): 7
      24h Peak Latency (ms): 7

  default/influxdb-relay-7474689b4d-6nnc7/relay
  ---------------------------------------------
    - Type: file
      Identifier: d02d4abeb3e0fbcd31acf1cf581f9c232b6cc90f991885629594e11a53364a90
      Path: /var/log/pods/default_influxdb-relay-7474689b4d-6nnc7_3dfe7892-dd34-49e7-8c02-c4a0dcde7e2f/relay/*.log
      Status: OK
        2 files tailed out of 2 files matching
      Inputs:
        /var/log/pods/default_influxdb-relay-7474689b4d-6nnc7_3dfe7892-dd34-49e7-8c02-c4a0dcde7e2f/relay/1.log
        /var/log/pods/default_influxdb-relay-7474689b4d-6nnc7_3dfe7892-dd34-49e7-8c02-c4a0dcde7e2f/relay/0.log
      BytesRead: 0
      Average Latency (ms): 0
      24h Average Latency (ms): 0
      Peak Latency (ms): 0
      24h Peak Latency (ms): 0

  lens-metrics/node-exporter-fdr5b/node-exporter
  ----------------------------------------------
    - Type: file
      Identifier: 8a52c4297d19979a0620ffa16bc13ff9cf4fdb29e332aec644dae7a4798f57d4
      Path: /var/log/pods/lens-metrics_node-exporter-fdr5b_b22ca249-b96e-40dc-890a-b7f510b38712/node-exporter/*.log
      Status: OK
        2 files tailed out of 2 files matching
      Inputs:
        /var/log/pods/lens-metrics_node-exporter-fdr5b_b22ca249-b96e-40dc-890a-b7f510b38712/node-exporter/1.log
        /var/log/pods/lens-metrics_node-exporter-fdr5b_b22ca249-b96e-40dc-890a-b7f510b38712/node-exporter/0.log
      BytesRead: 0
      Average Latency (ms): 0
      24h Average Latency (ms): 0
      Peak Latency (ms): 0
      24h Peak Latency (ms): 0

  kube-system/kube-proxy-2h2h8/kube-proxy
  ---------------------------------------
    - Type: file
      Identifier: 92ba3a89360f2dcc5eac0a3d7c92da305420ceef1a68f19c6adc3a4b384d2dc2
      Path: /var/log/pods/kube-system_kube-proxy-2h2h8_73e51c50-d32c-401c-be5f-5ed8c5346f40/kube-proxy/*.log
      Status: OK
        2 files tailed out of 2 files matching
      Inputs:
        /var/log/pods/kube-system_kube-proxy-2h2h8_73e51c50-d32c-401c-be5f-5ed8c5346f40/kube-proxy/1.log
        /var/log/pods/kube-system_kube-proxy-2h2h8_73e51c50-d32c-401c-be5f-5ed8c5346f40/kube-proxy/0.log
      BytesRead: 0
      Average Latency (ms): 0
      24h Average Latency (ms): 0
      Peak Latency (ms): 0
      24h Peak Latency (ms): 0

  default/datadog-agent-66m4f/agent
  ---------------------------------
    - Type: file
      Identifier: a636b67e32ff8fc86b7300de9f54fe88beba14c4554d316eb1ee68e9382bfc7e
      Path: /var/log/pods/default_datadog-agent-66m4f_99505fd9-d7f3-4ff7-b91e-fd3a0007bdcc/agent/*.log
      Status: Pending
        1 files tailed out of 1 files matching
      BytesRead: 0
      Average Latency (ms): 0
      24h Average Latency (ms): 0
      Peak Latency (ms): 0
      24h Peak Latency (ms): 0

  kube-system/csi-node-lgv2r/csi-plugin
  -------------------------------------
    - Type: file
      Identifier: 1284d19aacea45bdc316e3b56ab4043b05ad50fff5294738fcdbd33b16ab7c1e
      Path: /var/log/pods/kube-system_csi-node-lgv2r_70821448-1e46-4d2f-983d-793d5e5c1052/csi-plugin/*.log
      Status: OK
        2 files tailed out of 2 files matching
      Inputs:
        /var/log/pods/kube-system_csi-node-lgv2r_70821448-1e46-4d2f-983d-793d5e5c1052/csi-plugin/1.log
        /var/log/pods/kube-system_csi-node-lgv2r_70821448-1e46-4d2f-983d-793d5e5c1052/csi-plugin/0.log
      BytesRead: 8883
      Average Latency (ms): 0
      24h Average Latency (ms): 0
      Peak Latency (ms): 0
      24h Peak Latency (ms): 0

  kube-system/node-problem-detector-rkshf/node-problem-detector
  -------------------------------------------------------------
    - Type: file
      Identifier: d24f1d19cf80a8f23afe0ec1dc8a283f5b261594e33be1d761dcb190889c972a
      Path: /var/log/pods/kube-system_node-problem-detector-rkshf_a5c652c4-2c60-4c6f-a600-96fc528435ba/node-problem-detector/*.log
      Status: OK
        2 files tailed out of 2 files matching
      Inputs:
        /var/log/pods/kube-system_node-problem-detector-rkshf_a5c652c4-2c60-4c6f-a600-96fc528435ba/node-problem-detector/1.log
        /var/log/pods/kube-system_node-problem-detector-rkshf_a5c652c4-2c60-4c6f-a600-96fc528435ba/node-problem-detector/0.log
      BytesRead: 0
      Average Latency (ms): 0
      24h Average Latency (ms): 0
      Peak Latency (ms): 0
      24h Peak Latency (ms): 0

  default/datadog-agent-66m4f/process-agent
  -----------------------------------------
    - Type: file
      Identifier: ccf0137ff6d267e8336e21bbae2e16f7a72631ccd6fc9eea21bca12c54741a05
      Path: /var/log/pods/default_datadog-agent-66m4f_99505fd9-d7f3-4ff7-b91e-fd3a0007bdcc/process-agent/*.log
      Status: Pending
        1 files tailed out of 1 files matching
      BytesRead: 0
      Average Latency (ms): 0
      24h Average Latency (ms): 0
      Peak Latency (ms): 0
      24h Peak Latency (ms): 0

  kube-system/nginx-ingress-v6t62/nginx-ingress-controller
  --------------------------------------------------------
    - Type: file
      Identifier: 7315eefeb0a9bda4e86033447a43fd675b0b9d81226c2c6115f7c5ce228e5a46
      Path: /var/log/pods/kube-system_nginx-ingress-v6t62_a60fa13e-7565-43cc-9e2c-946a49be9fe7/nginx-ingress-controller/*.log
      Status: OK
        2 files tailed out of 2 files matching
      Inputs:
        /var/log/pods/kube-system_nginx-ingress-v6t62_a60fa13e-7565-43cc-9e2c-946a49be9fe7/nginx-ingress-controller/1.log
        /var/log/pods/kube-system_nginx-ingress-v6t62_a60fa13e-7565-43cc-9e2c-946a49be9fe7/nginx-ingress-controller/0.log
      BytesRead: 9701
      Average Latency (ms): 2
      24h Average Latency (ms): 2
      Peak Latency (ms): 73
      24h Peak Latency (ms): 73

  default/datadog-agent-66m4f/init-volume
  ---------------------------------------
    - Type: file
      Identifier: cdd955946645654331ba2c2c4ea34fda796667fefdcf394f1cd49f74e2bc3ee4
      Path: /var/log/pods/default_datadog-agent-66m4f_99505fd9-d7f3-4ff7-b91e-fd3a0007bdcc/init-volume/*.log
      Status: Pending
        1 files tailed out of 1 files matching
      BytesRead: 0
      Average Latency (ms): 0
      24h Average Latency (ms): 0
      Peak Latency (ms): 0
      24h Peak Latency (ms): 0

  default/datadog-agent-66m4f/trace-agent
  ---------------------------------------
    - Type: file
      Identifier: 612bc83b8c07a2fbf2586c99b644d92587f7e91a3548f4e4723701864ca97f4a
      Path: /var/log/pods/default_datadog-agent-66m4f_99505fd9-d7f3-4ff7-b91e-fd3a0007bdcc/trace-agent/*.log
      Status: Pending
        1 files tailed out of 1 files matching
      BytesRead: 0
      Average Latency (ms): 0
      24h Average Latency (ms): 0
      Peak Latency (ms): 0
      24h Peak Latency (ms): 0

  nuclio/nuclio-dashboard-54997897df-4x6dz/nuclio-dashboard
  ---------------------------------------------------------
    - Type: file
      Identifier: 775ca5605353d89ad2e55966b91440d45a9f925ad6ae18aa4195aa280f1ca5ce
      Path: /var/log/pods/nuclio_nuclio-dashboard-54997897df-4x6dz_7cc13eab-04e7-4701-9a35-c5ecc8fb5532/nuclio-dashboard/*.log
      Status: OK
        2 files tailed out of 2 files matching
      Inputs:
        /var/log/pods/nuclio_nuclio-dashboard-54997897df-4x6dz_7cc13eab-04e7-4701-9a35-c5ecc8fb5532/nuclio-dashboard/1.log
        /var/log/pods/nuclio_nuclio-dashboard-54997897df-4x6dz_7cc13eab-04e7-4701-9a35-c5ecc8fb5532/nuclio-dashboard/0.log
      BytesRead: 0
      Average Latency (ms): 0
      24h Average Latency (ms): 0
      Peak Latency (ms): 0
      24h Peak Latency (ms): 0

  kube-system/csi-node-lgv2r/csi-node-driver-registrar
  ----------------------------------------------------
    - Type: file
      Identifier: 56a93159d391b7a91ea78469c514798195f07fbc24172633c7d92788edf5e056
      Path: /var/log/pods/kube-system_csi-node-lgv2r_70821448-1e46-4d2f-983d-793d5e5c1052/csi-node-driver-registrar/*.log
      Status: OK
        2 files tailed out of 2 files matching
      Inputs:
        /var/log/pods/kube-system_csi-node-lgv2r_70821448-1e46-4d2f-983d-793d5e5c1052/csi-node-driver-registrar/1.log
        /var/log/pods/kube-system_csi-node-lgv2r_70821448-1e46-4d2f-983d-793d5e5c1052/csi-node-driver-registrar/0.log
      BytesRead: 0
      Average Latency (ms): 0
      24h Average Latency (ms): 0
      Peak Latency (ms): 0
      24h Peak Latency (ms): 0

  kube-system/cilium-m7bcm/cilium-agent
  -------------------------------------
    - Type: file
      Identifier: c376b254560b9e04f72a30c8e777b4d7fd07cc0477cdc1c3b57e1f27a8d84140
      Path: /var/log/pods/kube-system_cilium-m7bcm_a09646ab-ac53-44e2-aeae-4c535ca2734b/cilium-agent/*.log
      Status: OK
        2 files tailed out of 2 files matching
      Inputs:
        /var/log/pods/kube-system_cilium-m7bcm_a09646ab-ac53-44e2-aeae-4c535ca2734b/cilium-agent/1.log
        /var/log/pods/kube-system_cilium-m7bcm_a09646ab-ac53-44e2-aeae-4c535ca2734b/cilium-agent/0.log
      BytesRead: 5303
      Average Latency (ms): 0
      24h Average Latency (ms): 0
      Peak Latency (ms): 1
      24h Peak Latency (ms): 1

=========
APM Agent
=========
  Status: Running
  Pid: 1
  Uptime: 972 seconds
  Mem alloc: 9,910,976 bytes
  Hostname: scw-k8s-acme-preprod-pool-influxdb-6e916d37
  Receiver: 0.0.0.0:8126
  Endpoints:
    https://trace.agent.datadoghq.eu

  Receiver (previous minute)
  ==========================
    No traces received in the previous minute.
    Default priority sampling rate: 100.0%

  Writer (previous minute)
  ========================
    Traces: 0 payloads, 0 traces, 0 events, 0 bytes
    Stats: 0 payloads, 0 stats buckets, 0 bytes

=========
Aggregator
=========
  Checks Metric Sample: 116,131
  Dogstatsd Metric Sample: 7,838
  Event: 3
  Events Flushed: 3
  Number Of Flushes: 64
  Series Flushed: 98,272
  Service Check: 1,020
  Service Checks Flushed: 1,072
=========
DogStatsD
=========
  Event Packets: 0
  Event Parse Errors: 0
  Metric Packets: 7,837
  Metric Parse Errors: 0
  Service Check Packets: 0
  Service Check Parse Errors: 0
  Udp Bytes: 597,795
  Udp Packet Reading Errors: 0
  Udp Packets: 5,985
  Uds Bytes: 0
  Uds Origin Detection Errors: 0
  Uds Packet Reading Errors: 0
  Uds Packets: 1
  Unterminated Metric Errors: 0

=====================
Datadog Cluster Agent
=====================

  - Datadog Cluster Agent endpoint detected: https://10.32.124.26:5005
  Successfully connected to the Datadog Cluster Agent.
  - Running: 1.16.0+commit.9961689

=============
Autodiscovery
=============
  Enabled Features
  ================
    docker
    kubernetes

Describe what happened:

After installing the agent on a Kubernetes host, the cilium integration reports the connection error shown above.

Describe what you expected:

I expected either no error, or an error message with enough detail to correct what looks like a misconfiguration.

Steps to reproduce the issue:

Unknown.

Additional environment details (Operating System, Cloud provider, etc):

Cloud provider: Scaleway

PLATFORM: GNU/Linux
Hostname: datadog-agent-wx99b
OS: GNU/Linux
Kernel Name: Linux
Processor: x86_64
Kernel Release: 5.4.0-80-generic
Kernel Version: #90-Ubuntu SMP Fri Jul 9 22:49:44 UTC 2021
Machine: x86_64
Hardware Platform: x86_64
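
In a status dump as long as the one above, the failing checks are easy to miss. A minimal Python sketch for pulling them out, assuming the output of `agent status` has been saved as text; `failing_instances` is a hypothetical helper name, and the pattern only relies on the `Instance ID: <id> [<STATUS>]` lines visible in the output:

```python
import re

# Matches lines such as "Instance ID: cilium:d708a04a4231d526 [ERROR]".
INSTANCE_LINE = re.compile(r"Instance ID:\s+(\S+)\s+\[(\w+)\]")


def failing_instances(status_text: str) -> list[str]:
    """Return the instance IDs whose status tag is anything other than OK."""
    return [
        instance_id
        for instance_id, status in INSTANCE_LINE.findall(status_text)
        if status != "OK"
    ]
```

Applied to the report above, this would single out the `cilium` instance, since every other check in the Collector section is tagged `[OK]`.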

yzhan289 commented 2 years ago

Hi 👋, we will need some more information to pinpoint the issue you are facing. Please submit a support ticket instead.

dllegru commented 2 years ago

I'm having the same exact issue.

Running datadog helm chart v2.26.2 in a GKE Dataplane V2.

Pinpointing the error on our side: the Datadog agent appears to be trying to scrape metrics from the cilium operator on port 9090, whereas the cilium operator's default port is 6942. I haven't seen a way to change the scraping target for the cilium operator in the chart. Can you please indicate the best way to proceed?
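
One way to narrow this down is to check which of the candidate ports actually accepts TCP connections on the pod IP. A minimal Python sketch, assuming it is run from somewhere with network access to the pod (for example, inside the agent container); `port_is_open` and `find_listening_ports` are hypothetical helper names, and the host and ports in the usage comment are the ones from the error message and this comment:

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused" (errno 111) as well as timeouts.
        return False


def find_listening_ports(host: str, candidates: list[int], timeout: float = 1.0) -> list[int]:
    """Return the subset of candidate ports that accept a connection."""
    return [port for port in candidates if port_is_open(host, port, timeout)]


# Example (values taken from the thread, adjust to your cluster):
# find_listening_ports("10.70.46.127", [9090, 6942])
```

An open port only shows that something is listening there. Whether it serves Prometheus metrics at `/metrics` still has to be confirmed separately.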

NaxAlpha commented 2 years ago

I am also facing the same issue. I am using Google Cloud Run for Anthos on GKE with Dataplane V2 enabled.

kkirpichnikov commented 2 years ago

@sebastienfi were you able to resolve this issue? Is there any workaround? cc: @yzhan289. We are facing the same issue. Chart: datadog-2.31.0, Agent: v7.34.0, cilium: 1.10.2, with Dataplane V2 enabled.

yzhan289 commented 2 years ago

Hi @dllegru @NaxAlpha @kkirpichnikov, please file a support case for this so we can collect more information on why this is happening for you.

kkirpichnikov commented 2 years ago

Thanks, will do.

NaurisSadovskis commented 2 years ago

hey folks, any update?

kkirpichnikov commented 2 years ago

@NaurisSadovskis in my case they told me to add the following to my values.yaml file, and it helped:

datadog:
  env:
    - name: DD_IGNORE_AUTOCONF
      value: "cilium"

NaurisSadovskis commented 2 years ago

Thanks @kkirpichnikov. For those using the Helm chart, the value is:

datadog:
  ignoreAutoConfig:
  - cilium

argais commented 1 year ago

Did anybody reach a solution that is not ignoring cilium? Support tickets are nice and all, but having it documented here would save us from opening yet another support ticket.

argais commented 1 year ago

The solution is actually pretty simple; adding it here for the next poor soul struggling with this.

You can customize the check. https://app.datadoghq.com/integrations/cilium?search=cil has the details, but for me, using the K8s operator deployed with helm, the solution was simply to add annotations to the pods by adding the following to my helm values:

        podAnnotations:
          ad.datadoghq.com/cilium-agent.checks: |
            {
              "cilium": {
                "init_config": {},
                "instances": [{"agent_endpoint": "http://%%host%%:9962/metrics","use_openmetrics": "true"}]
              }
            }
          ad.datadoghq.com/cilium-operator.checks: |
            {
              "cilium": {
                "init_config": {},
                "instances": [{"operator_endpoint": "http://%%host%%:9963/metrics","use_openmetrics": "true"}]
              }
            }
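
Since each annotation value has to be valid JSON, it can help to generate it rather than edit it by hand. A minimal sketch, assuming the same structure as the annotations above; the helper name `cilium_check_annotation` is hypothetical and not part of any Datadog tooling:

```python
import json


def cilium_check_annotation(endpoint_key, port):
    """Build the JSON body for an ad.datadoghq.com/<container>.checks annotation."""
    check = {
        "cilium": {
            "init_config": {},
            "instances": [
                {
                    # %%host%% is the Autodiscovery template variable used above;
                    # endpoint_key is "agent_endpoint" or "operator_endpoint".
                    endpoint_key: "http://%%host%%:{}/metrics".format(port),
                    # Kept as the string "true" to mirror the annotations above.
                    "use_openmetrics": "true",
                }
            ],
        }
    }
    return json.dumps(check, indent=2)
```

For example, `print(cilium_check_annotation("operator_endpoint", 9963))` emits a body equivalent to the second annotation above, which can then be pasted under the matching `podAnnotations` key.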

sp-albert-jubany commented 12 months ago

Worked perfectly, thank you @argais

brettcurtis commented 11 months ago

This fails on GKE as well. Working through a ticket now (1319416) and we were able to get metrics by adding the following to the helm values file:

datadog:
  confd:
    cilium.yaml: |-
      ad_identifiers:
        - cilium
      init_config:
      instances:
        - agent_endpoint: http://%%host%%:9990/metrics
          use_openmetrics: true

However the auto config is still puking from the agent side:

    cilium (2.4.0)
    --------------
      Instance ID: cilium:42432dabab946ec3 [ERROR]
      Configuration Source: file:/etc/datadog-agent/conf.d/cilium.d/auto_conf.yaml
      Total Runs: 23
      Metric Samples: Last Run: 0, Total: 0
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 23
      Average Execution Time : 87ms
      Last Execution Date : 2023-08-25 11:41:24 UTC (1692963684000)
      Last Successful Execution Date : Never
      Error: HTTPConnectionPool(host='10.60.80.22', port=9962): Max retries exceeded with url: /metrics (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7cf4aae82f10>: Failed to establish a new connection: [Errno 111] Connection refused'))
      Traceback (most recent call last):
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connection.py", line 174, in _new_conn
          conn = connection.create_connection(
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/util/connection.py", line 95, in create_co

For some reason, the configuration source is not using the new /etc/datadog-agent/conf.d/cilium.yaml. It still feels like something is off, but again, we are seeing metrics now. That said, to clear up the above error we also added:

datadog:
  ignoreAutoConfig:
  - cilium

That fixed the agent status:

    cilium (2.4.0)
    --------------
      Instance ID: cilium:a9c4ff2d2d8bb9ae [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/cilium.yaml
      Total Runs: 20
      Metric Samples: Last Run: 1,857, Total: 37,140
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 20
      Average Execution Time : 923ms
      Last Execution Date : 2023-08-25 12:02:47 UTC (1692964967000)
      Last Successful Execution Date : 2023-08-25 12:02:47 UTC (1692964967000)

Also, I'm not 100% sure, but I think you need to enable GKE Dataplane V2 metrics on your cluster.

sourcec0de commented 2 months ago

For those who land here in the future: be careful. If the integration isn't working correctly, Datadog sees these as "custom" metrics, and you're in for a nasty surprise bill.

I hope someone at Datadog will consider reopening this issue and ensuring the integration is applied correctly when targeting a GKE Autopilot cluster, as it causes a significant billing discrepancy. The Helm chart should handle this more gracefully.

DataDog / integrations-core

cilium (1.8.0) - Connection refused #10669