DataDog / integrations-core

Core integrations of the Datadog Agent
BSD 3-Clause "New" or "Revised" License

Prometheus autodiscovery not using HTTPS #16751

Open Mstrodl opened 10 months ago

Mstrodl commented 10 months ago

Note: If you have a feature request, you should contact support so the request can be properly tracked.

Output of the info page

agent-status.txt

Additional environment details (Operating System, Cloud provider, etc): Using helm chart on OKD4:

Client Version: 4.10.0-0.okd-2022-05-07-021833
Server Version: 4.13.0-0.okd-2023-10-28-065448
Kubernetes Version: v1.26.4-3014+636f2be6157d45-dirty

relevant values.yaml snippet:

  ## Configure prometheus scraping autodiscovery

  ## ref: https://docs.datadoghq.com/agent/kubernetes/prometheus/
  prometheusScrape:
    # datadog.prometheusScrape.enabled -- Enable autodiscovering pods and services exposing prometheus metrics.
    enabled: true
    # datadog.prometheusScrape.serviceEndpoints -- Enable generating dedicated checks for service endpoints.
    serviceEndpoints: true
    # datadog.prometheusScrape.additionalConfigs -- Allows adding advanced openmetrics check configurations with custom discovery rules. (Requires Agent version 7.27+)
    additionalConfigs: []
      # -
      #   autodiscovery:
      #     kubernetes_annotations:
      #       include:
      #         custom_include_label: 'true'
      #       exclude:
      #         custom_exclude_label: 'true'
      #     kubernetes_container_names:
      #     - my-app
      #   configurations:
      #   - send_distribution_buckets: true
      #     timeout: 5
    # datadog.prometheusScrape.version -- Version of the openmetrics check to schedule by default.

helm list output:

NAME            NAMESPACE       REVISION  UPDATED                                  STATUS    CHART           APP VERSION
datadog-agent   datadog-agent   37        2024-01-30 12:56:48.144866698 -0500 EST  deployed  datadog-3.53.0  7

Steps to reproduce the issue:

  1. Enable prometheus autodiscovery on the agent helm chart
  2. Have https-only endpoints (note the 'scheme' annotation):
    labels:
      app: openshift-oauth-apiserver
    annotations:
      operator.openshift.io/spec-hash: 9c74227d7f96d723d980c50373a5e91f08c5893365bfd5a5040449b1b6585a23
      prometheus.io/scheme: https
      prometheus.io/scrape: 'true'
      service.alpha.openshift.io/serving-cert-secret-name: serving-cert
      service.alpha.openshift.io/serving-cert-signed-by: openshift-service-serving-signer@1607974872
      service.beta.openshift.io/serving-cert-signed-by: openshift-service-serving-signer@1607974872

Describe the results you received: 400 errors from the server ("An http request was sent to an https port"):

      Error: There was an error scraping endpoint http://10.128.0.60:8443/metrics: 400 Client Error: Bad Request for url: http://10.128.0.60:8443/metrics
      Traceback (most recent call last):
        File "/opt/datadog-agent/embedded/lib/python3.9/site-packages/datadog_checks/base/checks/base.py", line 1235, in run
          self.check(instance)
        File "/opt/datadog-agent/embedded/lib/python3.9/site-packages/datadog_checks/base/checks/openmetrics/v2/base.py", line 78, in check
          raise_from(type(e)("There was an error scraping endpoint {}: {}".format(endpoint, e)), None)
        File "<string>", line 3, in raise_from
      requests.exceptions.HTTPError: There was an error scraping endpoint http://10.128.0.60:8443/metrics: 400 Client Error: Bad Request for url: http://10.128.0.60:8443/metrics

Describe the results you expected: The agent should obey the scheme annotation
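
To make the expected behavior concrete, here is a minimal sketch of how a scrape URL could be derived from the pod annotations. It is not the agent's actual code: `build_endpoint` is a hypothetical helper, the `prometheus.io/path` and `prometheus.io/port` keys are the conventional companions of `prometheus.io/scheme` rather than anything shown in the pod spec above, and the `http` and `/metrics` defaults are assumptions.

```python
from typing import Mapping

# Annotation keys following the common prometheus.io/* convention.
# Only prometheus.io/scheme and prometheus.io/scrape appear in this issue.
SCHEME_KEY = "prometheus.io/scheme"
PATH_KEY = "prometheus.io/path"
PORT_KEY = "prometheus.io/port"


def build_endpoint(annotations: Mapping[str, str], host: str, default_port: int) -> str:
    """Hypothetical helper: build the scrape URL, honoring the scheme annotation."""
    # Fall back to plain HTTP only when no scheme annotation is present (assumed default).
    scheme = annotations.get(SCHEME_KEY, "http").strip().lower()
    if scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {scheme!r}")
    port = int(annotations.get(PORT_KEY, default_port))
    path = annotations.get(PATH_KEY, "/metrics")
    if not path.startswith("/"):
        path = "/" + path
    return f"{scheme}://{host}:{port}{path}"


annotations = {"prometheus.io/scheme": "https", "prometheus.io/scrape": "true"}
print(build_endpoint(annotations, "10.128.0.60", 8443))
```

With the annotations from the reproduction steps, this produces an `https://` URL for the same host and port that the traceback shows being requested over `http://`.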

Additional information you deem important (e.g. issue happens only occasionally): We only started seeing this recently. I suspect the upgrade to OKD 4.13.0 is to blame.

FlorentClarret commented 10 months ago

Hello @Mstrodl 👋 Thanks for opening this issue.

Could you please open a support ticket for this? That will make it easier for us to track the issue and to gather further information 🙁 Thank you!