DataDog / dd-trace-js

JavaScript APM Tracer
https://docs.datadoghq.com/tracing/
Other
619 stars 296 forks source link

dogstatsd sending metrics over http when udp supported #4333

Open haileeyyy opened 2 months ago

haileeyyy commented 2 months ago

https://github.com/DataDog/dd-trace-js/blob/a346dd54b3412253ea3a570edb5b05dfbd4e0acb/packages/dd-trace/src/dogstatsd.js#L174-L182

We're seeing unexpected traces for dns requests to the datadog agent ip, which we believe is due to runtime metrics and dogstatsd.

our client is using the default configuration, and we're relying on the DD_AGENT_HOST env var which is set using the kubernetes downwards api.

- name: DD_AGENT_HOST
  valueFrom:
    fieldRef:
      fieldPath: status.hostIP

with the code I've referenced above, I believe the else if (config.port) { line to be incorrect, because in our case, we have both a config.port value AND a config.dogstatsd.hostname and config.dogstatsd.port, but I don't think we want a metricsProxyUrl set?

I think the correct conditional would be: else if (config.port && !config.dogstatsd) {, but I'm not sure if there is a particular reason that this exists.

either way, because we're using the default config from dd-trace (i.e. the config is computed), I don't think it's actually possible for us to use udp instead of http for dogstatsd.

hamiltop commented 1 month ago

We've seen memory leaks since upgrading and think it may be related.

Running express/apollo-server, we occasionally see a process get into a state where it gets repeated socket timeouts and memory slowly climbs until it exceeds allocated memory and is OOM killed.

We haven't been able to provide a controlled reproduction, so it's not completely clear, but our hypothesis is that by switching to the http instead of udp we are exhausting some resource (e.g. dns) which then causes buffered data to pile up in memory. Many assumptions there, but getting this fixed would either solve or eliminate many of those assumptions.