DataDog / integrations-core

Core integrations of the Datadog Agent
BSD 3-Clause "New" or "Revised" License

Broken cert expiration check for http_check with tls_verify disabled #11595

Closed theduderog closed 2 years ago

theduderog commented 2 years ago
===============
Agent (v6.33.0)
===============

  Status date: 2022-03-01 19:02:19.055 UTC (1646161339055)
  Agent start: 2022-03-01 18:17:47.42 UTC (1646158667420)
  Pid: 411
  Go Version: go1.16.7
  Python Version: 2.7.18
  Build arch: amd64
  Agent flavor: agent
  Check Runners: 5
  Log File: /mnt/log/agent.log
  Log Level: INFO

  Paths
  =====
    Config File: /etc/datadog-agent/datadog.yaml
    conf.d: /etc/datadog-agent/conf.d
    checks.d: /etc/datadog-agent/checks.d

  Clocks
  ======
    NTP offset: -116µs
    System time: 2022-03-01 19:02:19.055 UTC (1646161339055)

  Host Info
  =========
    bootTime: 2021-05-24 13:16:00 UTC (1621862160000)
    kernelArch: x86_64
    kernelVersion: 4.14.231-173.361.amzn2.x86_64
    os: linux
    platform: ubuntu
    platformFamily: debian
    platformVersion: 21.10
    procs: 167
    uptime: 6749h1m55s

  Hostnames
  =========
    cluster-name: k8s-test
    ec2-hostname: ip-10-3-38-250.us-west-2.compute.internal
    host_aliases: [ip-10-3-38-250.us-west-2.compute.internal-k8s-test]
    hostname: i-04c046d812156653x
    instance-id: i-04c046d812156653x
    socket-fqdn: ip-10-3-38-250.us-west-2.compute.internal.
    socket-hostname: ip-10-3-38-250.us-west-2.compute.internal
    host tags:
      cloud:AWS
      <redacted>
    hostname provider: aws
    unused hostname providers:
      azure: azure_hostname_style is set to 'os'
      configuration/environment: hostname is empty
      gce: unable to retrieve hostname from GCE: GCE metadata API error: status code 404 trying to GET http://169.254.169.254/computeMetadata/v1/instance/hostname

  Metadata
  ========
    agent_version: 6.33.0
    cloud_provider: AWS
    config_apm_dd_url: 
    config_dd_url: 
    config_logs_dd_url: 
    config_logs_socks5_proxy_address: 
    config_no_proxy: []
    config_process_dd_url: 
    config_proxy_http: 
    config_proxy_https: 
    config_site: 
    feature_apm_enabled: true
    feature_cspm_enabled: false
    feature_cws_enabled: false
    feature_logs_enabled: false
    feature_networks_enabled: false
    feature_process_enabled: true
    flavor: agent
    hostname_source: aws
    install_method_installer_version: docker
    install_method_tool: docker
    install_method_tool_version: docker

=========
Collector
=========

  Running Checks
  ==============

    container
    ---------
      Instance ID: container [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/container.d/conf.yaml.default
      Total Runs: 178
      Metric Samples: Last Run: 231, Total: 41,118
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 6ms
      Last Execution Date : 2022-03-01 19:02:11 UTC (1646161331000)
      Last Successful Execution Date : 2022-03-01 19:02:11 UTC (1646161331000)

    cpu
    ---
      Instance ID: cpu [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/cpu.d/conf.yaml.default
      Total Runs: 178
      Metric Samples: Last Run: 9, Total: 1,595
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2022-03-01 19:02:18 UTC (1646161338000)
      Last Successful Execution Date : 2022-03-01 19:02:18 UTC (1646161338000)

    disk (4.5.1)
    ------------
      Instance ID: disk:b2cf39b497091ec9 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/disk.d/disk.yaml
      Total Runs: 178
      Metric Samples: Last Run: 84, Total: 14,952
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 67ms
      Last Execution Date : 2022-03-01 19:02:10 UTC (1646161330000)
      Last Successful Execution Date : 2022-03-01 19:02:10 UTC (1646161330000)

    dns_check (2.1.0)
    -----------------
      Instance ID: dns_check:google-check-via-kube-dns:478dd776df194c16 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/dns_check.d/conf.yaml
      Total Runs: 178
      Metric Samples: Last Run: 1, Total: 178
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 178
      Average Execution Time : 1ms
      Last Execution Date : 2022-03-01 19:02:17 UTC (1646161337000)
      Last Successful Execution Date : 2022-03-01 19:02:17 UTC (1646161337000)

      Instance ID: dns_check:google-check-via-public-cloudflare-dns:7018628659fe2e44 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/dns_check.d/conf.yaml
      Total Runs: 178
      Metric Samples: Last Run: 1, Total: 178
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 178
      Average Execution Time : 9ms
      Last Execution Date : 2022-03-01 19:02:16 UTC (1646161336000)
      Last Successful Execution Date : 2022-03-01 19:02:16 UTC (1646161336000)

      Instance ID: dns_check:google-check-via-public-google-dns:f6b991be82834544 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/dns_check.d/conf.yaml
      Total Runs: 177
      Metric Samples: Last Run: 1, Total: 177
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 177
      Average Execution Time : 8ms
      Last Execution Date : 2022-03-01 19:02:08 UTC (1646161328000)
      Last Successful Execution Date : 2022-03-01 19:02:08 UTC (1646161328000)

      Instance ID: dns_check:k8s-default-via-kube-dns:51ca8513f09e2997 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/dns_check.d/conf.yaml
      Total Runs: 177
      Metric Samples: Last Run: 1, Total: 177
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 177
      Average Execution Time : 3ms
      Last Execution Date : 2022-03-01 19:02:09 UTC (1646161329000)
      Last Successful Execution Date : 2022-03-01 19:02:09 UTC (1646161329000)

    file_handle
    -----------
      Instance ID: file_handle [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/file_handle.d/conf.yaml.default
      Total Runs: 178
      Metric Samples: Last Run: 5, Total: 890
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2022-03-01 19:02:10 UTC (1646161330000)
      Last Successful Execution Date : 2022-03-01 19:02:10 UTC (1646161330000)

    filebeat (unversioned)
    ----------------------
      Instance ID: filebeat:594c29c48fdd43de [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/filebeat.d/conf.yaml
      Total Runs: 178
      Metric Samples: Last Run: 4, Total: 708
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 178
      Average Execution Time : 6ms
      Last Execution Date : 2022-03-01 19:02:15 UTC (1646161335000)
      Last Successful Execution Date : 2022-03-01 19:02:15 UTC (1646161335000)

    http_check (6.1.2-rc.1)
    -----------------------
      Instance ID: http_check:Kafka Api liveness:db74e3a9e13b30f0 [ERROR]
      Configuration Source: file:/etc/datadog-agent/conf.d/http_check.d/httpi.yaml
      Total Runs: 177
      Metric Samples: Last Run: 3, Total: 531
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 27ms
      Last Execution Date : 2022-03-01 19:02:07 UTC (1646161327000)
      Last Successful Execution Date : Never
      Error: u'notAfter'
      Traceback (most recent call last):
        File "/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/base/checks/base.py", line 1017, in run
          self.check(instance)
        File "/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/http_check/http_check.py", line 240, in check
          status, days_left, seconds_left, msg = self.check_cert_expiration(instance, timeout, instance_ca_certs)
        File "/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/http_check/http_check.py", line 326, in check_cert_expiration
          exp_date = datetime.strptime(cert['notAfter'], "%b %d %H:%M:%S %Y %Z")
      KeyError: u'notAfter'
      Instance ID: http_check:kubeapi_server_health_check:a9578cd2db2bee0c [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/http_check.d/kubeapi-server.yaml
      Total Runs: 178
      Metric Samples: Last Run: 5, Total: 890
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 2, Total: 356
      Average Execution Time : 48ms
      Last Execution Date : 2022-03-01 19:02:14 UTC (1646161334000)
      Last Successful Execution Date : 2022-03-01 19:02:14 UTC (1646161334000)

    io
    --
      Instance ID: io [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/io.d/conf.yaml.default
      Total Runs: 178
      Metric Samples: Last Run: 39, Total: 6,915
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2022-03-01 19:02:17 UTC (1646161337000)
      Last Successful Execution Date : 2022-03-01 19:02:17 UTC (1646161337000)

    jmxfetch (unversioned)
    ----------------------
      Instance ID: jmxfetch:d884b5186b651429 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/jmxfetch.d/conf.yaml
      Total Runs: 177
      Metric Samples: Last Run: 2, Total: 354
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 177
      Average Execution Time : 265ms
      Last Execution Date : 2022-03-01 19:02:06 UTC (1646161326000)
      Last Successful Execution Date : 2022-03-01 19:02:06 UTC (1646161326000)

    kubelet (7.1.0)
    ---------------
      Instance ID: kubelet:5bbc63f3938c02f4 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/kubelet.d/conf.yaml.default
      Total Runs: 134
      Metric Samples: Last Run: 832, Total: 112,824
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 4, Total: 536
      Average Execution Time : 295ms
      Last Execution Date : 2022-03-01 19:02:15 UTC (1646161335000)
      Last Successful Execution Date : 2022-03-01 19:02:15 UTC (1646161335000)

    kubernetes_apiserver
    --------------------
      Instance ID: kubernetes_apiserver [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/kubernetes_apiserver.d/conf.yaml.default
      Total Runs: 177
      Metric Samples: Last Run: 0, Total: 0
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2022-03-01 19:02:09 UTC (1646161329000)
      Last Successful Execution Date : 2022-03-01 19:02:09 UTC (1646161329000)

    load
    ----
      Instance ID: load [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/load.d/conf.yaml.default
      Total Runs: 178
      Metric Samples: Last Run: 6, Total: 1,068
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2022-03-01 19:02:16 UTC (1646161336000)
      Last Successful Execution Date : 2022-03-01 19:02:16 UTC (1646161336000)

    memory
    ------
      Instance ID: memory [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/memory.d/conf.yaml.default
      Total Runs: 177
      Metric Samples: Last Run: 18, Total: 3,186
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2022-03-01 19:02:08 UTC (1646161328000)
      Last Successful Execution Date : 2022-03-01 19:02:08 UTC (1646161328000)

    network (2.4.0)
    ---------------
      Instance ID: network:d884b5186b651429 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/network.d/conf.yaml.default
      Total Runs: 178
      Metric Samples: Last Run: 73, Total: 12,994
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 3ms
      Last Execution Date : 2022-03-01 19:02:15 UTC (1646161335000)
      Last Successful Execution Date : 2022-03-01 19:02:15 UTC (1646161335000)

    ntp
    ---
      Instance ID: ntp:d884b5186b651429 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/ntp.d/conf.yaml.default
      Total Runs: 3
      Metric Samples: Last Run: 1, Total: 3
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 3
      Average Execution Time : 1ms
      Last Execution Date : 2022-03-01 18:47:55 UTC (1646160475000)
      Last Successful Execution Date : 2022-03-01 18:47:55 UTC (1646160475000)

    openmetrics (1.16.0)
    --------------------
      Instance ID: openmetrics:cc-cert-exporter:206dfb034ec9d8a1 [OK]
      Configuration Source: kubelet:docker://c3f28337d7a24bff036bc5a782790f1ff51d2f7227d6d82b12f1044171f60a44
      Total Runs: 177
      Metric Samples: Last Run: 3, Total: 531
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 177
      Average Execution Time : 26ms
      Last Execution Date : 2022-03-01 19:02:06 UTC (1646161326000)
      Last Successful Execution Date : 2022-03-01 19:02:06 UTC (1646161326000)

      Instance ID: openmetrics:cc-goldpinger:b8f5b76d5edd920d [OK]
      Configuration Source: kubelet:docker://bc51640f268d5fed014644b99c4c9acbe8e4cb54c335bc6e4b0b3acacc4a9b7b
      Total Runs: 177
      Metric Samples: Last Run: 7, Total: 1,239
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 177
      Average Execution Time : 63ms
      Last Execution Date : 2022-03-01 19:02:14 UTC (1646161334000)
      Last Successful Execution Date : 2022-03-01 19:02:14 UTC (1646161334000)

    process (2.1.1)
    ---------------
      Instance ID: process:kube-proxy:da726e50de09c3a5 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/process.d/conf.yaml
      Total Runs: 177
      Metric Samples: Last Run: 18, Total: 3,184
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 177
      Average Execution Time : 1ms
      Last Execution Date : 2022-03-01 19:02:05 UTC (1646161325000)
      Last Successful Execution Date : 2022-03-01 19:02:05 UTC (1646161325000)

      Instance ID: process:kubelet:f36c694c86c7b245 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/process.d/conf.yaml
      Total Runs: 178
      Metric Samples: Last Run: 18, Total: 3,202
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 178
      Average Execution Time : 1ms
      Last Execution Date : 2022-03-01 19:02:13 UTC (1646161333000)
      Last Successful Execution Date : 2022-03-01 19:02:13 UTC (1646161333000)

      Instance ID: process:systemd:f976e8edf3df59c7 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/process.d/conf.yaml
      Total Runs: 178
      Metric Samples: Last Run: 18, Total: 3,202
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 178
      Average Execution Time : 3ms
      Last Execution Date : 2022-03-01 19:02:12 UTC (1646161332000)
      Last Successful Execution Date : 2022-03-01 19:02:12 UTC (1646161332000)

    tls (2.6.0)
    -----------
      Instance ID: tls:public-cert-expiration-check:198d9e47b2956736 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/tls.d/standard.yaml
      Total Runs: 177
      Metric Samples: Last Run: 2, Total: 354
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 4, Total: 708
      Average Execution Time : 7ms
      Last Execution Date : 2022-03-01 19:02:04 UTC (1646161324000)
      Last Successful Execution Date : 2022-03-01 19:02:04 UTC (1646161324000)

    uptime
    ------
      Instance ID: uptime [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/uptime.d/conf.yaml.default
      Total Runs: 177
      Metric Samples: Last Run: 1, Total: 177
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2022-03-01 19:02:07 UTC (1646161327000)
      Last Successful Execution Date : 2022-03-01 19:02:07 UTC (1646161327000)

========
JMXFetch
========

  Information
  ==================
    runtime_version : 11.0.13
    version : 0.44.6
  Initialized checks
  ==================
    jmx
      instance_name : http-10.3.34.32-7203
      message : <no value>
      metric_count : 72
      service_check_count : 0
      status : OK
  Failed checks
  =============
    no checks

=========
Forwarder
=========

  Transactions
  ============
    Cluster: 0
    ClusterRole: 0
    ClusterRoleBinding: 0
    CronJob: 0
    DaemonSet: 0
    Deployment: 0
    Dropped: 0
    HighPriorityQueueFull: 0
    Job: 0
    Node: 0
    PersistentVolume: 0
    PersistentVolumeClaim: 0
    Pod: 0
    ReplicaSet: 0
    Requeued: 0
    Retried: 0
    RetryQueueSize: 0
    Role: 0
    RoleBinding: 0
    Service: 0
    ServiceAccount: 0
    StatefulSet: 0

  Transaction Successes
  =====================
    Total number: 373
    Successes By Endpoint:
      check_run_v1: 177
      intake: 15
      metadata_v1: 4
      series_v1: 177

  On-disk storage
  ===============
    On-disk storage is disabled. Configure `forwarder_storage_max_size_in_bytes` to enable it.

  API Keys status
  ===============
    API key ending with <redacted>: API Key valid

==========
Endpoints
==========
  https://app.datadoghq.com - API Key ending with:
      - 62281

==========
Logs Agent
==========

  Logs Agent is not running

=========
APM Agent
=========
  Status: Running
  Pid: 414
  Uptime: 2671 seconds
  Mem alloc: 8,889,824 bytes
  Hostname: i-04c046d812156653x
  Receiver: 0.0.0.0:8126
  Endpoints:
    https://trace.agent.datadoghq.com

  Receiver (previous minute)
  ==========================
    No traces received in the previous minute.
    Default priority sampling rate: 100.0%

  Writer (previous minute)
  ========================
    Traces: 0 payloads, 0 traces, 0 events, 0 bytes
    Stats: 0 payloads, 0 stats buckets, 0 bytes

=========
Aggregator
=========
  Checks Metric Sample: 219,423
  Dogstatsd Metric Sample: 34,379
  Event: 1
  Events Flushed: 1
  Number Of Flushes: 177
  Series Flushed: 216,047
  Service Check: 8,131
  Service Checks Flushed: 8,268
=========
DogStatsD
=========
  Event Packets: 0
  Event Parse Errors: 0
  Metric Packets: 34,378
  Metric Parse Errors: 0
  Service Check Packets: 178
  Service Check Parse Errors: 0
  Udp Bytes: 12,109,080
  Udp Packet Reading Errors: 0
  Udp Packets: 28,271
  Uds Bytes: 0
  Uds Origin Detection Errors: 0
  Uds Packet Reading Errors: 0
  Uds Packets: 0
  Unterminated Metric Errors: 0

=============
Autodiscovery
=============
  Enabled Features
  ================
    kubernetes

  Configuration Errors
  ====================
    kube-system/node-local-dns-kshnf
    --------------------------------
        annotation ad.datadoghq.com/kube2iam.check_names is invalid: kube2iam doesn't match a container identifier [node-cache]
        annotation ad.datadoghq.com/kube2iam.init_configs is invalid: kube2iam doesn't match a container identifier [node-cache]
        annotation ad.datadoghq.com/kube2iam.instances is invalid: kube2iam doesn't match a container identifier [node-cache]

Additional environment details (Operating System, Cloud provider, etc): Using Docker image datadog/agent:6.33.0-jmx on EKS v1.18.20

Steps to reproduce the issue:

  1. Run an https server in a k8s pod
  2. Configure dd-agent to do http_check similar to this.
    
    ad_identifiers:
    - http-server

    init_config:

    instances:

Describe the results you received:

=========
Collector
=========

  Running Checks
  ==============

    http_check (6.1.2-rc.1)
    -----------------------
      Instance ID: http_check:Kafka Api liveness:f8fb566fee6d98c8 [ERROR]
      Configuration Source: file:/etc/datadog-agent/conf.d/http_check.d/http.yaml
      Total Runs: 1
      Metric Samples: Last Run: 3, Total: 3
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 37ms
      Last Execution Date : 2022-03-01 16:55:21 UTC (1646153721000)
      Last Successful Execution Date : Never
      Error: u'notAfter'
      Traceback (most recent call last):
        File "/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/base/checks/base.py", line 1017, in run
          self.check(instance)
        File "/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/http_check/http_check.py", line 240, in check
          status, days_left, seconds_left, msg = self.check_cert_expiration(instance, timeout, instance_ca_certs)
        File "/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/http_check/http_check.py", line 326, in check_cert_expiration
          exp_date = datetime.strptime(cert['notAfter'], "%b %d %H:%M:%S %Y %Z")
      KeyError: u'notAfter'

Describe the results you expected:

I expected dd-agent to be able to check the certificate expiration and populate the http.ssl.days_left metric

Additional information you deem important (e.g. issue happens only occasionally):

This PR, which went into version 6.26/7.26, broke backward compatibility for this check in dd-agent. Previously, the socket mode was hard-coded to ssl.CERT_REQUIRED. That PR switched to the TlsContextWrapper, which only sets that mode if full verification is enabled. In ssl.CERT_NONE mode, the SSL socket does not return a parsed peer certificate, which leads to this error.
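
The failure mode described above can be sketched in a few lines of Python 3. This is an illustration only, not the check's actual code: `fetch_peer_cert` and `seconds_until_expiry` are hypothetical helper names. It relies on the documented standard-library behavior that `getpeercert()` returns an empty dict when the peer certificate was not validated, which is consistent with the `KeyError: u'notAfter'` in the traceback.

```python
import socket
import ssl


def fetch_peer_cert(host, port, verify, timeout=5.0):
    """Connect to host:port and return whatever getpeercert() reports.

    Hypothetical helper for illustration. With verify=False the context
    uses ssl.CERT_NONE, so the returned dict is expected to be empty.
    """
    context = ssl.create_default_context()
    if not verify:
        # check_hostname must be turned off before CERT_NONE can be set.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls_sock:
            return tls_sock.getpeercert()


def seconds_until_expiry(cert, now):
    """Return seconds until the cert's notAfter, or None if it is absent.

    `cert` is a dict as returned by getpeercert(); `now` is a Unix
    timestamp. Using .get() avoids the KeyError on an empty dict.
    """
    not_after = cert.get("notAfter")
    if not_after is None:
        return None
    # cert_time_to_seconds parses the "Jan  5 09:34:43 2018 GMT" format.
    return ssl.cert_time_to_seconds(not_after) - now
```

Guarding the lookup only avoids the crash; it does not restore the metric. Since a server still presents its certificate when validation is off, `getpeercert(binary_form=True)` returns the DER bytes, but decoding the expiry date from them needs a separate certificate parser, which is not shown here.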

sarah-witt commented 2 years ago

Hi @theduderog, thank you for the detailed write-up. We will look into this change in behavior. Do you mind creating a support ticket for this issue as well so we can get more information and investigate further?

theduderog commented 2 years ago

@sarah-witt Done. Request #703748.

theduderog commented 2 years ago

@hithwen why did you close the issue with no explanation?

hithwen commented 2 years ago

Hi @theduderog, I'm closing this issue because it is now tracked in the support ticket.

theduderog commented 2 years ago

@hithwen @sarah-witt There's been no meaningful action on my support case. Should we reopen this? Unless I'm missing something, dd-agent should not break backward compatibility, especially in a minor version release, and this needs to be fixed ASAP for all DD customers.