DataDog / helm-charts

Helm charts for Datadog products
Apache License 2.0

Agent getting 403 error when trying to connect to kubelet #543

Open andrewlchiang opened 2 years ago

andrewlchiang commented 2 years ago

Hi, thanks for taking a look! I believe there's an RBAC issue here and would appreciate any advice.

Output of the info page (if this is a bug)

===============
Agent (v7.33.0)
===============

  Status date: 2022-02-10 18:23:16.534 UTC (1644517396534)
  Agent start: 2022-02-10 00:11:36.865 UTC (1644451896865)
  Pid: 1
  Go Version: go1.16.7
  Python Version: 3.8.11
  Build arch: amd64
  Agent flavor: agent
  Check Runners: 4
  Log Level: INFO

  Paths
  =====
    Config File: /etc/datadog-agent/datadog.yaml
    conf.d: /etc/datadog-agent/conf.d
    checks.d: /etc/datadog-agent/checks.d

  Clocks
  ======
    System time: 2022-02-10 18:23:16.534 UTC (1644517396534)

  Host Info
  =========
    bootTime: 2022-02-09 19:00:12 UTC (1644433212000)
    kernelArch: x86_64
    kernelVersion: 3.10.0-1160.15.2.el7.mcp.x86_64
    os: linux
    platform: ubuntu
    platformFamily: debian
    platformVersion: 21.10
    procs: 701
    uptime: 5h11m38s

  Hostnames
  =========
    host_aliases: REDACTED
    hostname: REDACTED
    socket-fqdn: REDACTED
    socket-hostname: REDACTED
    host tags:
      kube_node_role:control-plane
      kube_node_role:master
    hostname provider: container

  Metadata
  ========
    agent_version: 7.33.0
    config_apm_dd_url:
    config_dd_url:
    config_logs_dd_url:
    config_logs_socks5_proxy_address:
    config_no_proxy: []
    config_process_dd_url:
    config_proxy_http: http://10.109.142.145:9881
    config_proxy_https: http://10.109.142.145:9881
    config_site:
    feature_apm_enabled: false
    feature_cspm_enabled: false
    feature_cws_enabled: false
    feature_logs_enabled: true
    feature_networks_enabled: false
    feature_process_enabled: false
    flavor: agent
    hostname_source: container
    install_method_installer_version: datadog-2.30.5
    install_method_tool: helm
    install_method_tool_version: Helm
    logs_transport: HTTP

=========
Collector
=========

  Running Checks
  ==============

    kubelet (7.1.0)
    ---------------
      Instance ID: kubelet:5bbc63f3938c02f4 [ERROR]
      Configuration Source: file:/etc/datadog-agent/conf.d/kubelet.d/conf.yaml.default
      Total Runs: 3,275
      Metric Samples: Last Run: 0, Total: 0
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 3,275
      Average Execution Time : 743ms
      Last Execution Date : 2022-02-10 18:23:09 UTC (1644517389000)
      Last Successful Execution Date : Never
      Error: HTTPSConnectionPool(host='172.16.128.1', port=10250): Max retries exceeded with url: /spec?verbose=True (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 403 Forbidden')))
      Traceback (most recent call last):
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connectionpool.py", line 700, in urlopen
          self._prepare_proxy(conn)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connectionpool.py", line 994, in _prepare_proxy
          conn.connect()
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connection.py", line 371, in connect
          self._tunnel()
        File "/opt/datadog-agent/embedded/lib/python3.8/http/client.py", line 905, in _tunnel
          raise OSError("Tunnel connection failed: %d %s" % (code,
      OSError: Tunnel connection failed: 403 Forbidden

Describe what happened: I installed Datadog on a single-node k3s cluster via Helm and am trying to get it fully operational, but the agent gets a 403 error when it tries to connect to the kubelet. The three warnings/errors below repeat in the agent container's logs. This looks similar to https://github.com/DataDog/datadog-agent/issues/6621.

2022-02-10 19:25:08 UTC | CORE | WARN | (pkg/collector/python/datadog_agent.go:124 in LogMessage) | - | (kubelet.py:425) | kubelet check https://172.16.128.1:10250/healthz failed: HTTPSConnectionPool(host='172.16.128.1', port=10250): Max retries exceeded with url: /healthz?verbose=True (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 403 Forbidden')))
2022-02-10 19:25:08 UTC | CORE | WARN | (pkg/collector/python/datadog_agent.go:124 in LogMessage) | - | (base.py:59) | failed to retrieve pod list from the kubelet at https://172.16.128.1:10250/pods : HTTPSConnectionPool(host='172.16.128.1', port=10250): Max retries exceeded with url: /pods?verbose=True (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 403 Forbidden')))
2022-02-10 19:25:08 UTC | CORE | ERROR | (pkg/collector/worker/check_logger.go:68 in Error) | check:kubelet | Error running check: [{"message": "HTTPSConnectionPool(host='172.16.128.1', port=10250): Max retries exceeded with url: /spec?verbose=True (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 403 Forbidden')))", "traceback": "Traceback (most recent call last):\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connectionpool.py\", line 700, in urlopen\n    self._prepare_proxy(conn)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connectionpool.py\", line 994, in _prepare_proxy\n    conn.connect()\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connection.py\", line 371, in connect\n    self._tunnel()\n  File \"/opt/datadog-agent/embedded/lib/python3.8/http/client.py\", line 905, in _tunnel\n    raise OSError(\"Tunnel connection failed: %d %s\" % (code,\nOSError: Tunnel connection failed: 403 Forbidden\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/adapters.py\", line 439, in send\n    resp = conn.urlopen(\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connectionpool.py\", line 785, in urlopen\n    retries = retries.increment(\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/util/retry.py\", line 592, in increment\n    raise MaxRetryError(_pool, url, error or ResponseError(cause))\nurllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='172.16.128.1', port=10250): Max retries exceeded with url: /spec?verbose=True (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 403 Forbidden')))\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File 
\"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/base.py\", line 1017, in run\n    self.check(instance)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/kubelet/kubelet.py\", line 336, in check\n    self._report_node_metrics(self.instance_tags)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/kubelet/kubelet.py\", line 382, in _report_node_metrics\n    node_resp = self._retrieve_node_spec()\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/kubelet/kubelet.py\", line 365, in _retrieve_node_spec\n    node_resp = self.perform_kubelet_query(self.node_spec_url)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/kubelet_base/base.py\", line 31, in perform_kubelet_query\n    return self.http.get(\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/utils/http.py\", line 341, in get\n    return self._request('get', url, options)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/utils/http.py\", line 405, in _request\n    response = self.make_request_aia_chasing(request_method, method, url, new_options, persist)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/utils/http.py\", line 411, in make_request_aia_chasing\n    response = request_method(url, **new_options)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/api.py\", line 76, in get\n    return request('get', url, params=params, **kwargs)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/api.py\", line 61, in request\n    return session.request(method=method, url=url, **kwargs)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/sessions.py\", line 542, in request\n    resp = self.send(prep, **send_kwargs)\n  File 
\"/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/sessions.py\", line 655, in send\n    r = adapter.send(request, **kwargs)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/adapters.py\", line 510, in send\n    raise ProxyError(e, request=request)\nrequests.exceptions.ProxyError: HTTPSConnectionPool(host='172.16.128.1', port=10250): Max retries exceeded with url: /spec?verbose=True (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 403 Forbidden')))\n"}]

Exec'ing into the agent container and running either of the following commands returns no errors, so I think this is an RBAC issue:

TOKEN=$(</var/run/secrets/kubernetes.io/serviceaccount/token) && curl https://$DD_KUBERNETES_KUBELET_HOST:10250/pods -v -k -H "Authorization: Bearer $TOKEN"

TOKEN=$(</var/run/secrets/kubernetes.io/serviceaccount/token) && curl https://$DD_KUBERNETES_KUBELET_HOST:10250/pods -v --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt -H "Authorization: Bearer $TOKEN"
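
As a rough aid for reading the log lines above: the "Tunnel connection failed: 403 Forbidden" text is raised by Python's http.client when a proxy answers the CONNECT request with a non-200 status, so that 403 would come from the proxy rather than from the kubelet. The sketch below separates the two cases by inspecting the message. The helper name `classify_kubelet_error` is hypothetical and is not part of the Agent; it is only an illustration of the distinction.

```python
import re

# Matches the tunnel failure that http.client raises when a proxy
# rejects the CONNECT request (text taken from the traceback above).
TUNNEL_FAILURE = re.compile(r"Tunnel connection failed: (\d{3})")


def classify_kubelet_error(message: str) -> str:
    """Roughly classify a kubelet check error message (hypothetical helper)."""
    match = TUNNEL_FAILURE.search(message)
    if "ProxyError" in message and match:
        # The status code came from the proxy, before any request reached the kubelet.
        return f"proxy rejected CONNECT with HTTP {match.group(1)}"
    if "403" in message or "Forbidden" in message:
        return "kubelet or API server returned 403 (possible RBAC issue)"
    return "unclassified"


error = (
    "HTTPSConnectionPool(host='172.16.128.1', port=10250): Max retries exceeded "
    "with url: /spec?verbose=True (Caused by ProxyError('Cannot connect to proxy.', "
    "OSError('Tunnel connection failed: 403 Forbidden')))"
)
print(classify_kubelet_error(error))
```

If the message falls into the first branch, the proxy settings are a more likely place to look than the ClusterRole.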

Describe what you expected: I expected the pod to run without errors and be able to reach the kubelet.

Steps to reproduce the issue: Attached is the ClusterRole and ClusterRoleBinding.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    objectset.rio.cattle.io/id: REDACTED
  labels:
    app.kubernetes.io/instance: REDACTED
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: REDACTED
    app.kubernetes.io/version: "7"
    helm.sh/chart: datadog-2.30.5
    objectset.rio.cattle.io/hash: 918a69d017b4b9f22d7a60323fd2257513cb1a05
  name: REDACTED
rules:
- apiGroups:
  - ""
  resources:
  - services
  - events
  - endpoints
  - pods
  - nodes
  - namespaces
  - componentstatuses
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - quota.openshift.io
  resources:
  - clusterresourcequotas
  verbs:
  - get
  - list
- apiGroups:
  - ""
  resourceNames:
  - datadogtoken
  resources:
  - configmaps
  verbs:
  - get
  - update
- apiGroups:
  - ""
  resourceNames:
  - datadog-leader-election
  resources:
  - configmaps
  verbs:
  - get
  - update
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - create
- nonResourceURLs:
  - /version
  - /healthz
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  - nodes/spec
  - nodes/proxy
  - nodes/stats
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - endpoints
  verbs:
  - get
- apiGroups:
  - policy
  resourceNames:
  - REDACTED
  resources:
  - podsecuritypolicies
  verbs:
  - use
- apiGroups:
  - security.openshift.io
  resourceNames:
  - REDACTED
  - hostaccess
  - privileged
  resources:
  - securitycontextconstraints
  verbs:
  - use
- apiGroups:
  - coordination.k8s.io
  resources:
  - leases
  verbs:
  - get

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  annotations:
    objectset.rio.cattle.io/id: REDACTED
  labels:
    app.kubernetes.io/instance: REDACTED
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: REDACTED
    app.kubernetes.io/version: "7"
    helm.sh/chart: datadog-2.30.5
    objectset.rio.cattle.io/hash: 918a69d017b4b9f22d7a60323fd2257513cb1a05
  name: REDACTED
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: REDACTED
subjects:
- kind: ServiceAccount
  name: REDACTED
  namespace: REDACTED
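
A quick way to sanity-check the ClusterRole above against the endpoints in the logs is to map each kubelet path to the node subresource the kubelet authorizes it under. The sketch below assumes the mapping described in the Kubernetes kubelet authorization documentation for this era of Kubernetes (/stats, /metrics, /logs and /spec have dedicated subresources, everything else falls under nodes/proxy); the function and variable names are made up for illustration.

```python
# Rule copied from the ClusterRole above: "get" on these node subresources.
GRANTED = {"nodes/metrics", "nodes/spec", "nodes/proxy", "nodes/stats"}

# Path prefixes the kubelet maps to dedicated subresources; anything else
# falls back to nodes/proxy (assumption: mapping from the Kubernetes docs).
PREFIX_TO_SUBRESOURCE = {
    "/stats": "nodes/stats",
    "/metrics": "nodes/metrics",
    "/logs": "nodes/log",
    "/spec": "nodes/spec",
}


def kubelet_subresource(path: str) -> str:
    """Return the node subresource a kubelet request path is authorized under."""
    for prefix, resource in PREFIX_TO_SUBRESOURCE.items():
        if path == prefix or path.startswith(prefix + "/"):
            return resource
    return "nodes/proxy"


# The three endpoints that appear in the agent logs.
for path in ("/healthz", "/pods", "/spec"):
    resource = kubelet_subresource(path)
    print(path, resource, resource in GRANTED)
```

Under that assumption, all three endpoints from the logs map to subresources the ClusterRole already grants `get` on.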

Additional environment details (Operating System, Cloud provider, etc): k3s version v1.21.6+k3s1 (df033fa2)

clamoriniere commented 2 years ago

Hi @andrewlchiang

IMO this is due to the HTTP/HTTPS proxy that you have configured. Please check this doc: https://docs.datadoghq.com/agent/proxy/?tab=agentv6v7#web-proxy
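
To illustrate the idea: the info page above shows `config_no_proxy: []`, so requests to the kubelet at 172.16.128.1 are sent through the proxy at 10.109.142.145:9881 like everything else. The sketch below is a simplified, hypothetical matcher (exact hosts, exact IPs, and CIDR blocks only) showing how a no-proxy list changes that decision. It is not the Agent's implementation; the linked doc describes the matching rules the Agent actually applies.

```python
import ipaddress
from urllib.parse import urlsplit


def bypasses_proxy(url: str, no_proxy: list) -> bool:
    """Return True if the URL's host matches an entry in no_proxy.

    Simplified, hypothetical matcher: exact hostnames, exact IPs, and CIDR blocks.
    """
    host = urlsplit(url).hostname or ""
    for entry in no_proxy:
        if host == entry:
            return True
        try:
            if ipaddress.ip_address(host) in ipaddress.ip_network(entry, strict=False):
                return True
        except ValueError:
            # host is not an IP, or entry is not an IP/CIDR: no match on this entry
            continue
    return False


kubelet_url = "https://172.16.128.1:10250/pods"
print(bypasses_proxy(kubelet_url, []))                 # empty list, as in the info page
print(bypasses_proxy(kubelet_url, ["172.16.128.1"]))   # kubelet IP excluded
print(bypasses_proxy(kubelet_url, ["172.16.0.0/16"]))  # whole range excluded
```

With an empty list the kubelet URL is proxied; once the kubelet IP or its range is excluded, the request would go to the kubelet directly.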