DataDog / helm-charts

Helm charts for Datadog products
Apache License 2.0
346 stars 1.01k forks

APM Traces are not collecting from Java Clients #793

Open parasappafluke opened 2 years ago

parasappafluke commented 2 years ago

Hi, we are using the Datadog Helm chart version 3.1.3 with the latest 7.39.0-jmx Agent Docker image and the latest tracer agent. We have started seeing that APM traces are not being sent to Datadog. Below are the errors from the Java clients.

Java client output:

` [dd.trace 2022-10-24 11:48:07:924 +0000] [StatsD-Sender-1] WARN datadog.communication.monitor.DDAgentStatsDConnection - IOException in StatsD client - /var/run/datadog/dsd.socket java.io.IOException: Resource temporarily unavailable (Will not log errors for 5 minutes)

[dd.trace 2022-10-24 11:48:56:374 +0000] [OkHttp http://172.22.0.81:8126/...] WARN com.datadog.profiling.uploader.ProfileUploader - Failed to upload profile, received empty reply from http://172.22.0.81:8126/profiling/v1/input after uploading profile (Will not log errors for 5 minutes)

[dd.trace 2022-10-24 11:50:10:419 +0000] [dd-trace-processor] WARN datadog.trace.agent.common.writer.ddagent.DDAgentApi - Error while sending 2 (size=3KB) traces to the DD agent. Total: 56523, Received: 56523, Sent: 0, Failed: 56523. java.net.SocketTimeoutException: connect timed out (Will not log errors for 5 minutes)

[dd.trace 2022-10-24 11:53:56:375 +0000] [OkHttp http://172.22.0.81:8126/...] WARN com.datadog.profiling.uploader.ProfileUploader - Failed to upload profile, received empty reply from http://172.22.0.81:8126/profiling/v1/input after uploading profile (Will not log errors for 5 minutes)

[dd.trace 2022-10-24 11:54:07:924 +0000] [StatsD-Sender-1] WARN datadog.communication.monitor.DDAgentStatsDConnection - IOException in StatsD client - /var/run/datadog/dsd.socket java.io.IOException: Resource temporarily unavailable (Will not log errors for 5 minutes)

[dd.trace 2022-10-24 11:55:10:730 +0000] [dd-trace-processor] WARN datadog.trace.agent.common.writer.ddagent.DDAgentApi - Error while sending 2 (size=3KB) traces to the DD agent. Total: 56584, Received: 56584, Sent: 0, Failed: 56584. java.net.SocketTimeoutException: connect timed out (Will not log errors for 5 minutes)

[dd.trace 2022-10-24 11:58:56:378 +0000] [OkHttp http://172.22.0.81:8126/...] WARN com.datadog.profiling.uploader.ProfileUploader - Failed to upload profile, received empty reply from http://172.22.0.81:8126/profiling/v1/input after uploading profile (Will not log errors for 5 minutes)

[dd.trace 2022-10-24 12:00:11:039 +0000] [dd-trace-processor] WARN datadog.trace.agent.common.writer.ddagent.DDAgentApi - Error while sending 2 (size=3KB) traces to the DD agent. Total: 56645, Received: 56645, Sent: 0, Failed: 56645. java.net.SocketTimeoutException: connect timed out (Will not log errors for 5 minutes)

[dd.trace 2022-10-24 12:15:56:375 +0000] [OkHttp http://172.22.0.81:8126/...] WARN com.datadog.profiling.uploader.ProfileUploader - Failed to upload profile, received empty reply from http://172.22.0.81:8126/profiling/v1/input after uploading profile (Will not log errors for 5 minutes)

[dd.trace 2022-10-24 12:20:12:285 +0000] [dd-trace-processor] WARN datadog.trace.agent.common.writer.ddagent.DDAgentApi - Error while sending 2 (size=3KB) traces to the DD agent. Total: 56889, Received: 56889, Sent: 0, Failed: 56889. java.net.SocketTimeoutException: connect timed out (Will not log errors for 5 minutes)

[dd.trace 2022-10-24 12:21:56:374 +0000] [OkHttp http://172.22.0.81:8126/...] WARN com.datadog.profiling.uploader.ProfileUploader - Failed to upload profile, received empty reply from http://172.22.0.81:8126/profiling/v1/input after uploading profile (Will not log errors for 5 minutes) `

Agent status command output:

`========= APM Agent

Status: Running
Pid: 1
Uptime: 281174 seconds
Mem alloc: 9,132,576 bytes
Hostname: i-0ed8ac1069c3cd0d4
Receiver: 0.0.0.0:8126
Endpoints: https://trace.agent.datadoghq.com

Receiver (previous minute)

No traces received in the previous minute.`

Are we missing any configuration in the Helm chart?

Thanks, Parasappa

clamoriniere commented 2 years ago

Hi @parasappafluke,

Could you tell us whether this is a regression, or whether it is a first attempt to use APM in Kubernetes with the Helm chart?

From a quick investigation of the data you have provided, it seems that the APM Java trace library is configured to use the UDS socket to communicate with the datadog-agent.

If my previous comments are not enough to solve the issue, could you please contact Datadog support and provide an Agent flare?
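For readers comparing their own setup: in the logs above, the StatsD client targets /var/run/datadog/dsd.socket, while traces and profiles are sent to http://172.22.0.81:8126. Below is a minimal sketch of the Helm values that control both transports. It assumes the chart version in use exposes `datadog.apm.socketEnabled`, `datadog.apm.portEnabled`, and `datadog.dogstatsd.useSocketVolume`, so check the key names against your chart's values.yaml. The second document is a hypothetical fragment of an application container spec, and it only works if the socket directory is already mounted into that pod.

```yaml
# values.yaml for the datadog Helm chart (sketch; verify keys for your chart version)
datadog:
  apm:
    # Serve traces over a Unix domain socket shared through a host path.
    socketEnabled: true
    socketPath: /var/run/datadog/apm.socket
    # Also listen on TCP 8126 on the node, for tracers that target http://<host-ip>:8126.
    portEnabled: true
  dogstatsd:
    # Share the DogStatsD socket with application pods.
    useSocketVolume: true
    socketPath: /var/run/datadog/dsd.socket
---
# Hypothetical application container fragment: point the tracer at the socket.
# Assumes /var/run/datadog is mounted into the container.
env:
  - name: DD_TRACE_AGENT_URL
    value: unix:///var/run/datadog/apm.socket
```

If the tracer stays on TCP, a "connect timed out" error towards port 8126 usually means the port is not reachable on the node, which is what `portEnabled` is meant to address in this sketch.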

parasappafluke commented 2 years ago

@clamoriniere thank you for the quick response. We have configured the volume and volume mounts below for APM, based on the documentation.

` volumes:

shubhamsavii commented 1 year ago

Hi @parasappafluke, were you able to find a solution? We are facing a similar issue with a Java application.

NaitYoussef commented 1 year ago

Hi @parasappafluke, we have the same issue.

oshriza commented 1 year ago

We have the same issue. Is there any solution?

ahululu commented 11 months ago

We have the same issue. Does anyone know how to solve it (deployed with the Helm chart)?

"[dd.trace 2023-11-28 03:01:03:014 +0000] [StatsD-Sender-1] WARN datadog.communication.monitor.DDAgentStatsDConnection - IOException in StatsD client - /var/run/datadog/dsd.socket java.io.IOException: Resource temporarily unavailable (Will not log errors for 5 minutes)"

I've tried the following methods but still haven't solved the problem: `

  1. dogstatsd.soRcvbuf: "4194304"
  2. agents.podSecurity.allowedUnsafeSysctls:
    • name: net.core.rmem_max value: "26214400"
    • name: net.unix.max_dgram_qlen value: "512"
    • name: net.core.wmem_max value: "4194304" `

I am not always sure how to apply settings from the Datadog documentation through the Helm chart, such as the dogstatsd_so_rcvbuf and sysctl settings described in https://docs.datadoghq.com/developers/dogstatsd/high_throughput/. Is there any way to check whether the Helm chart configuration is correct?
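One possible way to pass an Agent setting that has no dedicated chart key is through an environment variable. The sketch below assumes the chart exposes a `datadog.env` list and that the Agent reads `dogstatsd_so_rcvbuf` from `DD_DOGSTATSD_SO_RCVBUF`. The value is the one quoted above, not a recommendation.

```yaml
# values.yaml (sketch; assumes datadog.env is available in your chart version)
datadog:
  env:
    # Environment-variable form of the Agent's dogstatsd_so_rcvbuf setting.
    - name: DD_DOGSTATSD_SO_RCVBUF
      value: "4194304"
```

To see which values a release was actually deployed with, `helm get values <release-name>` prints the user-supplied values, which makes it easier to spot a key that was misspelled or nested at the wrong level.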

br-fedaykin commented 3 weeks ago

I ran into this issue, and my problem was solved when I set the right URL in the datadog.site field of the Datadog Helm chart.
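As an illustration of that fix, here is a minimal sketch of the relevant value. The site shown is only an example and must match the Datadog site that hosts your account. The Agent status earlier in this thread lists the endpoint https://trace.agent.datadoghq.com, which corresponds to the default site datadoghq.com.

```yaml
# values.yaml (sketch; datadoghq.eu is an example, not a recommendation)
datadog:
  # Common sites include datadoghq.com, datadoghq.eu, us3.datadoghq.com, us5.datadoghq.com.
  site: datadoghq.eu
```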