DataDog / datadog-agent

Main repository for Datadog Agent
https://docs.datadoghq.com/
Apache License 2.0

OTLP ingest fails for traces on version 7.35.0 #11737

Open xdu-opendoor opened 2 years ago

xdu-opendoor commented 2 years ago

Output of the info page (if this is a bug)

output from agent status:

===============
Agent (v7.35.0)
===============

  Status date: 2022-04-21 09:24:50.9 UTC (1650533090900)
  Agent start: 2022-04-21 09:24:16.956 UTC (1650533056956)
  Pid: 377
  Go Version: go1.17.6
  Python Version: 3.8.11
  Build arch: amd64
  Agent flavor: agent
  Check Runners: 4
  Log Level: info

  Paths
  =====
    Config File: /etc/datadog-agent/datadog.yaml
    conf.d: /etc/datadog-agent/conf.d
    checks.d: /etc/datadog-agent/checks.d

  Clocks
  ======
    NTP offset: 84.036ms
    System time: 2022-04-21 09:24:50.9 UTC (1650533090900)

  Host Info
  =========
    bootTime: 2022-04-20 16:58:48 UTC (1650473928000)
    kernelArch: x86_64
    kernelVersion: 5.10.47-linuxkit
    os: linux
    platform: ubuntu
    platformFamily: debian
    platformVersion: 21.10
    procs: 13
    uptime: 16h25m34s
    virtualizationRole: guest
    virtualizationSystem: docker

  Hostnames
  =========
    hostname: ff42f84c2181
    socket-fqdn: ff42f84c2181
    socket-hostname: ff42f84c2181
    hostname provider: os
    unused hostname providers:
      aws: not retrieving hostname from AWS: the host is not an ECS instance and other providers already retrieve non-default hostnames
      azure: azure_hostname_style is set to 'os'
      configuration/environment: hostname is empty
      container: Unable to get hostname from container API
      gce: unable to retrieve hostname from GCE: GCE metadata API error: Get "http://169.254.169.254/computeMetadata/v1/instance/hostname": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

  Metadata
  ========
    agent_version: 7.35.0
    config_apm_dd_url: 
    config_dd_url: 
    config_logs_dd_url: 
    config_logs_socks5_proxy_address: 
    config_no_proxy: []
    config_process_dd_url: 
    config_proxy_http: 
    config_proxy_https: 
    config_site: 
    feature_apm_enabled: true
    feature_cspm_enabled: false
    feature_cws_enabled: false
    feature_logs_enabled: false
    feature_networks_enabled: false
    feature_networks_http_enabled: false
    feature_networks_https_enabled: false
    feature_otlp_enabled: true
    feature_process_enabled: false
    feature_processes_container_enabled: true
    flavor: agent
    hostname_source: os
    install_method_installer_version: docker
    install_method_tool: docker
    install_method_tool_version: docker

=========
Collector
=========

  Running Checks
  ==============

    cpu
    ---
      Instance ID: cpu [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/cpu.d/conf.yaml.default
      Total Runs: 2
      Metric Samples: Last Run: 9, Total: 11
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2022-04-21 09:24:36 UTC (1650533076000)
      Last Successful Execution Date : 2022-04-21 09:24:36 UTC (1650533076000)

    disk (4.6.0)
    ------------
      Instance ID: disk:e5dffb8bef24336f [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/disk.d/conf.yaml.default
      Total Runs: 2
      Metric Samples: Last Run: 288, Total: 576
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 36ms
      Last Execution Date : 2022-04-21 09:24:43 UTC (1650533083000)
      Last Successful Execution Date : 2022-04-21 09:24:43 UTC (1650533083000)

    file_handle
    -----------
      Instance ID: file_handle [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/file_handle.d/conf.yaml.default
      Total Runs: 2
      Metric Samples: Last Run: 5, Total: 10
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2022-04-21 09:24:50 UTC (1650533090000)
      Last Successful Execution Date : 2022-04-21 09:24:50 UTC (1650533090000)

    io
    --
      Instance ID: io [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/io.d/conf.yaml.default
      Total Runs: 2
      Metric Samples: Last Run: 28, Total: 38
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2022-04-21 09:24:42 UTC (1650533082000)
      Last Successful Execution Date : 2022-04-21 09:24:42 UTC (1650533082000)

    load
    ----
      Instance ID: load [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/load.d/conf.yaml.default
      Total Runs: 2
      Metric Samples: Last Run: 6, Total: 12
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2022-04-21 09:24:49 UTC (1650533089000)
      Last Successful Execution Date : 2022-04-21 09:24:49 UTC (1650533089000)

    memory
    ------
      Instance ID: memory [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/memory.d/conf.yaml.default
      Total Runs: 2
      Metric Samples: Last Run: 20, Total: 40
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2022-04-21 09:24:41 UTC (1650533081000)
      Last Successful Execution Date : 2022-04-21 09:24:41 UTC (1650533081000)

    ntp
    ---
      Instance ID: ntp:d884b5186b651429 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/ntp.d/conf.yaml.default
      Total Runs: 1
      Metric Samples: Last Run: 1, Total: 1
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 1
      Average Execution Time : 518ms
      Last Execution Date : 2022-04-21 09:24:22 UTC (1650533062000)
      Last Successful Execution Date : 2022-04-21 09:24:22 UTC (1650533062000)

    uptime
    ------
      Instance ID: uptime [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/uptime.d/conf.yaml.default
      Total Runs: 2
      Metric Samples: Last Run: 1, Total: 2
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2022-04-21 09:24:48 UTC (1650533088000)
      Last Successful Execution Date : 2022-04-21 09:24:48 UTC (1650533088000)

========
JMXFetch
========

  Information
  ==================
  Initialized checks
  ==================
    no checks

  Failed checks
  =============
    no checks

=========
Forwarder
=========

  Transactions
  ============
    Cluster: 0
    ClusterRole: 0
    ClusterRoleBinding: 0
    CronJob: 0
    DaemonSet: 0
    Deployment: 0
    Dropped: 0
    HighPriorityQueueFull: 0
    Ingress: 0
    Job: 0
    Node: 0
    PersistentVolume: 0
    PersistentVolumeClaim: 0
    Pod: 0
    ReplicaSet: 0
    Requeued: 0
    Retried: 0
    RetryQueueSize: 0
    Role: 0
    RoleBinding: 0
    Service: 0
    ServiceAccount: 0
    StatefulSet: 0

  Transaction Successes
  =====================
    Total number: 6
    Successes By Endpoint:
      check_run_v1: 2
      intake: 2
      series_v1: 2

  On-disk storage
  ===============
    On-disk storage is disabled. Configure `forwarder_storage_max_size_in_bytes` to enable it.

  API Keys status
  ===============
    API key ending with 68249: API Key valid

==========
Endpoints
==========
  https://app.datadoghq.com - API Key ending with:
      - 68249

==========
Logs Agent
==========

  Logs Agent is not running

=============
Process Agent
=============

  Version: 7.35.0
  Status date: 2022-04-21 09:24:50.903 UTC (1650533090903)
  Process Agent Start: 2022-04-21 09:24:17.055 UTC (1650533057055)
  Pid: 375
  Go Version: go1.17.6
  Build arch: amd64
  Log Level: info
  Enabled Checks: [process_discovery]
  Allocated Memory: 14,241,808 bytes
  Hostname: ff42f84c2181

  =================
  Process Endpoints
  =================
    https://process.datadoghq.com - API Key ending with:
        - 68249

  =========
  Collector
  =========
    Last collection time: 2022-04-21 09:24:19
    Docker socket: 
    Number of processes: 0
    Number of containers: 0
    Process Queue length: 0
    RTProcess Queue length: 0
    Pod Queue length: 0
    Process Bytes enqueued: 0
    RTProcess Bytes enqueued: 0
    Pod Bytes enqueued: 0

=========
APM Agent
=========
  Status: Running
  Pid: 376
  Uptime: 33 seconds
  Mem alloc: 9,270,112 bytes
  Hostname: ff42f84c2181
  Receiver: 0.0.0.0:8126
  Endpoints:
    https://trace.agent.datadoghq.com

  Receiver (previous minute)
  ==========================
    No traces received in the previous minute.

  Writer (previous minute)
  ========================
    Traces: 0 payloads, 0 traces, 0 events, 0 bytes
    Stats: 0 payloads, 0 stats buckets, 0 bytes

=========
Aggregator
=========
  Checks Metric Sample: 720
  Dogstatsd Metric Sample: 198
  Event: 1
  Events Flushed: 1
  Number Of Flushes: 2
  Series Flushed: 447
  Service Check: 16
  Service Checks Flushed: 16
=========
DogStatsD
=========
  Event Packets: 0
  Event Parse Errors: 0
  Metric Packets: 197
  Metric Parse Errors: 0
  Service Check Packets: 0
  Service Check Parse Errors: 0
  Udp Bytes: 15,034
  Udp Packet Reading Errors: 0
  Udp Packets: 138
  Uds Bytes: 0
  Uds Origin Detection Errors: 0
  Uds Packet Reading Errors: 0
  Uds Packets: 0
  Unterminated Metric Errors: 0

output from trace-agent -info:

======================
Trace Agent (v 7.35.0)
======================

  Pid: 376
  Uptime: 71 seconds
  Mem alloc: 9361584 bytes

  Hostname: ff42f84c2181
  Receiver: 0.0.0.0:8126
  Endpoints:
    https://trace.agent.datadoghq.com

  --- Receiver stats (1 min) ---

  --- Writer stats (1 min) ---

  Traces: 0 payloads, 0 traces, 0 bytes
  Stats: 0 payloads, 0 stats buckets, 0 bytes

Describe what happened:

Knowing that opentelemetry-exporter-datadog is being deprecated, I configured my application to use opentelemetry-exporter-otlp, hoping that the datadog-agent OTLP ingestion feature would make this work.

After I made the change, I no longer saw traces reaching the APM backend.

Describe what you expected:

Using opentelemetry-exporter-otlp and datadog-agent@v7.35.0, traces from my application should show up in the Datadog backend correctly, as they did before (when integrated with opentelemetry-exporter-datadog).

Steps to reproduce the issue:

I've built a project with a simplified app and my configuration: https://github.com/duxing/datadog-otlp. All the detailed steps are in the README.

Additional environment details (Operating System, Cloud provider, etc):

mx-psi commented 2 years ago

👋 Thanks for the detailed repro and instructions; they made this really easy to work with! I think duxing/datadog-otlp/pull/1 will solve your issue (or at the very least it will make OTLP traces reach the Datadog Agent).

duxing commented 2 years ago

@mx-psi I don't think this PR will fix the issue. The issue is inside the datadog-agent, and this change opens a port between the datadog-agent and the host machine. I've also verified that the change doesn't fix the problem: the Datadog backend still has no data, and tcpdump on port 5003 (inside the datadog-agent container) still sees no traffic.
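
One way to tell a networking problem apart from an Agent-side problem is to check whether the OTLP port accepts TCP connections from wherever the application runs. The following is a minimal sketch, not part of any Datadog tooling: `can_connect` is a hypothetical helper, `localhost` is an assumed host, and 5003 is the port mentioned in this comment.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Assumed values: "localhost" is hypothetical; 5003 is the port from this thread.
    print(can_connect("localhost", 5003))
```

If this prints `False`, the application cannot reach the receiver at all, which would be consistent with tcpdump showing no traffic. A result of `True` only shows that the port is open, not that the Agent accepts the traces.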

duxing commented 2 years ago

Note: this issue persists with the latest version, 7.35.1.

mx-psi commented 2 years ago

I replied on the PR with detailed steps of what I did, could you take a look?

duxing commented 2 years ago

> I replied on the PR with detailed steps of what I did, could you take a look?

Checked. Sorry about duplicating our conversation. Let's try to resolve it there and post the result here.