DataDog / datadog-agent

Main repository for Datadog Agent
https://docs.datadoghq.com/
Apache License 2.0

SSL Certificate Verify failed while fetching kubernetes stats #2299

Open se0wtf opened 5 years ago

se0wtf commented 5 years ago

Hello, I'm testing Datadog on a Kubernetes cluster. Fetching ES/Kafka metrics works without any problem.

The kubelet API responds at https://192.168.110.50:6443, but its certificate is not valid.

Example with curl (from inside the Datadog pod):

$ curl -v https://192.168.110.50:6443
* Rebuilt URL to: https://192.168.110.50:6443/
*   Trying 192.168.110.50...
* TCP_NODELAY set
* Connected to 192.168.110.50 (192.168.110.50) port 6443 (#0)
curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above

I need the -k option to get past the SSL check:

# curl -k https://192.168.110.50:6443
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {

  },
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
  "reason": "Forbidden",
  "details": {

  },
  "code": 403
}
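
For readers who want to reproduce the difference between the two curl calls above without curl, here is a minimal Python sketch using only the standard library. The helper names `make_context` and `fetch` are made up for this illustration and are not part of the Datadog agent; `verify=False` is meant to mirror `curl -k`.

```python
import ssl
import urllib.error
import urllib.request


def make_context(verify: bool) -> ssl.SSLContext:
    """Return a client TLS context; verify=False mirrors `curl -k`."""
    ctx = ssl.create_default_context()
    if not verify:
        # Order matters: hostname checking must be off before CERT_NONE is set.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


def fetch(url: str, verify: bool = True, timeout: float = 5.0):
    """GET url and return (status, body).

    HTTP errors such as the 403 shown above are returned, not raised.
    """
    try:
        with urllib.request.urlopen(
            url, context=make_context(verify), timeout=timeout
        ) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", "replace")
```

Called as `fetch("https://192.168.110.50:6443", verify=False)`, this should get as far as the HTTP response, as `curl -k` does. With `verify=True` the certificate failure should surface as a `urllib.error.URLError` instead. Disabling verification is only reasonable for debugging; trusting the cluster CA is the safer long-term option.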

This is my configuration for kubelet.d/conf.yaml:

conf.yaml: |-
    init_config:
    instances:
    - kubelet_tls_verify: False
      kubelet_port: 6443

And this is the result from agent status:

kubelet (1.4.0)
    ---------------
      Total Runs: 68
      Metric Samples: 0, Total: 0
      Events: 0, Total: 0
      Service Checks: 0, Total: 0
      Average Execution Time : 13ms
      Error: Unable to detect the kubelet URL automatically.
      Traceback (most recent call last):
        File "/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/checks/base.py", line 303, in run
          self.check(copy.deepcopy(self.instances[0]))
        File "/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/kubelet/kubelet.py", line 86, in check
          raise CheckException("Unable to detect the kubelet URL automatically.")
      CheckException: Unable to detect the kubelet URL automatically.

This is my configuration for kubernetes_state.d/conf.yaml:

conf.yaml: |-
    init_config:
    instances:
    - kube_state_url: https://192.168.110.50:6443/metrics

And this is the result from agent status:

kubernetes_state (2.7.0)
    ------------------------
      Total Runs: 68
      Metric Samples: 0, Total: 0
      Events: 0, Total: 0
      Service Checks: 0, Total: 0
      Average Execution Time : 362ms
      Error: HTTPSConnectionPool(host='192.168.110.50', port=6443): Max retries exceeded with url: /metrics (Caused by SSLError(SSLError(1, u'[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:726)'),))
      Traceback (most recent call last):
        File "/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/checks/base.py", line 303, in run
          self.check(copy.deepcopy(self.instances[0]))
        File "/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/kubernetes_state/kubernetes_state.py", line 202, in check
          self.process(endpoint, send_histograms_buckets=send_buckets, instance=instance)
        File "/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/checks/prometheus/mixins.py", line 388, in process
          for metric in self.scrape_metrics(endpoint):
        File "/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/checks/prometheus/mixins.py", line 352, in scrape_metrics
          response = self.poll(endpoint)
        File "/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/checks/prometheus/mixins.py", line 515, in poll
          response = requests.get(endpoint, headers=headers, stream=stream, timeout=self.prometheus_timeout, cert=cert, verify=verify)
        File "/opt/datadog-agent/embedded/lib/python2.7/site-packages/requests/api.py", line 72, in get
          return request('get', url, params=params, **kwargs)
        File "/opt/datadog-agent/embedded/lib/python2.7/site-packages/requests/api.py", line 58, in request
          return session.request(method=method, url=url, **kwargs)
        File "/opt/datadog-agent/embedded/lib/python2.7/site-packages/requests/sessions.py", line 512, in request
          resp = self.send(prep, **send_kwargs)
        File "/opt/datadog-agent/embedded/lib/python2.7/site-packages/requests/sessions.py", line 622, in send
          r = adapter.send(request, **kwargs)
        File "/opt/datadog-agent/embedded/lib/python2.7/site-packages/requests/adapters.py", line 511, in send
          raise SSLError(e, request=request)
      SSLError: HTTPSConnectionPool(host='192.168.110.50', port=6443): Max retries exceeded with url: /metrics (Caused by SSLError(SSLError(1, u'[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:726)'),))

So what do I need to do to skip SSL verification for the metrics? Thanks guys ;)

hextrim commented 4 years ago

Hi,

I have exactly the same problem across all of the k8s integrations.

Is there a solution to this?

Thanks, Woj

huy-hoang-mox commented 3 years ago

I get the same issue when integrating it with Prometheus:

                ad.datadoghq.com/openam.init_configs: [{}]
                ad.datadoghq.com/openam.instances:
                  [
                    {
                      "prometheus_url": "https://user:pass@%%host%%:8443/json/metrics/prometheus",
                      "namespace": "openam",
                      "metrics": ["*"],
                      "tls_verify": false,
                      "tls_ignore_warning": true
                    }
                  ]

When I tried to connect from the Datadog agent pod:

curl https://user:pass@10.8.117.251:8443/json/metrics/prometheus  -vvv
*   Trying 10.8.117.251:8443...
* TCP_NODELAY set
* Connected to 10.8.117.251 (10.8.117.251) port 8443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /opt/datadog-agent/embedded/ssl/certs/cacert.pem
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Request CERT (13):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (OUT), TLS alert, unknown CA (560):
* SSL certificate problem: self signed certificate in certificate chain
* Closing connection 0
curl: (60) SSL certificate problem: self signed certificate in certificate chain
More details here: https://curl.haxx.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.

konstantin-lysenko-netapp commented 1 year ago

I spent a few days debugging this issue and built the following configurations, which work:

Helm configuration:

confd:
    openmetrics.yaml: |
      init_config:
      instances:
        - prometheus_url: "https://host:port/metrics"
          metrics:
            - "*"
          bearer_token_auth: true
          tls_verify: false

Pod annotations:

    ad.datadoghq.com/CONTAINER_NAME.check_names: |
      ["openmetrics"]
    ad.datadoghq.com/CONTAINER_NAME.init_configs: |
        [{}]
    ad.datadoghq.com/CONTAINER_NAME.instances: |
        [
            {
                "prometheus_url": "https://%%host%%:%%port%%/metrics",
                "namespace": "YOUR_NAMESPACE",
                "metrics": [
                    "*"
                ],
                "bearer_token_auth": true,
                "tls_verify": false
            }
        ]
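
If you generate manifests from a script, the annotation values above can be built with `json.dumps` so that `tls_verify` ends up as a JSON boolean and not as the string "false". This is only a sketch: the function name `autodiscovery_annotations` and the container name `web` are hypothetical, while the keys and values are copied from the annotations above.

```python
import json


def autodiscovery_annotations(container: str, namespace: str) -> dict:
    """Build the three Autodiscovery annotations for one container."""
    prefix = f"ad.datadoghq.com/{container}"
    instance = {
        # %%host%% and %%port%% are left as-is for the agent to resolve.
        "prometheus_url": "https://%%host%%:%%port%%/metrics",
        "namespace": namespace,
        "metrics": ["*"],
        "bearer_token_auth": True,
        "tls_verify": False,  # serialized as JSON false, not "false"
    }
    return {
        f"{prefix}.check_names": json.dumps(["openmetrics"]),
        f"{prefix}.init_configs": json.dumps([{}]),
        f"{prefix}.instances": json.dumps([instance], indent=2),
    }


if __name__ == "__main__":
    # "web" and "YOUR_NAMESPACE" are placeholders for this example.
    for key, value in autodiscovery_annotations("web", "YOUR_NAMESPACE").items():
        print(f"{key}: |\n{value}\n")
```

The returned dictionary can be merged into the pod template's `metadata.annotations` by whatever tool renders your manifests.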