DataDog / dd-trace-js

JavaScript APM Tracer
https://docs.datadoghq.com/tracing/
Other
650 stars 306 forks source link

DNS lookup on an ip address for every UDP package send out. #2984

Open Raynos opened 1 year ago

Raynos commented 1 year ago

Our nodejs app that uses datadog does mostly dns lookup and I believe its from the statsd client that does a dns lookup for every UDP packet.

Please do not do a DNS lookup on an ip address for every UDP package send to statsd.

You can see https://github.com/uber-archive/node-statsd-client/tree/master for an implementation of a statsd UDP client that uses dns lookup caching.

tlhunter commented 1 year ago

@Raynos how are you observing the DNS lookups? Is the app using something like hot-shots?

Raynos commented 1 year ago

@tlhunter Is there a vehicle to share screenshots / datadog links to employees outside of the public github. Can send you the dns.lookup prod info. It's all coming out of APM and showing dns.lookup

The IP address of the EC2 I was running the datadog agent on had 100x more DNS lookup calls then other useful services like doing dns lookup on actual DNS domains in the http clients.

I patched it with a few changes

const dogStatsClient = new StatsD({
  host: DOGSTATSD_HOST,
  udpSocketOptions: {
    type: 'udp4',
    lookup: (hostname, options, callback) => {
      // if IP address, bypass dns.lookup
      if (net.isIP(hostname)) {
        callback(null, hostname, 4);
        return;
      }

      // if a real hostname, then resolve using dns.lookup
      dns.lookup(hostname, options, callback);
    },
  },
}

This is me configuring hot-shots to skip dns lookup on IP address and definitely a lot of dns lookups came out of hot-shots

But there's also code in the datadog agent ( I am on v2 instead of v3 I learned recently ... ) But when I looked at the code on master I dont think its "fixed/improved" on master.

I also patched dd-trace itself like so

diff --git a/packages/dd-trace/src/dogstatsd.js b/packages/dd-trace/src/dogstatsd.js
index c46d7cac961fc2c4e0397a2038f6026b1a55db38..1d65d2c6648d93a112bf1b2cbd1ae74b589d0403 100644
--- a/packages/dd-trace/src/dogstatsd.js
+++ b/packages/dd-trace/src/dogstatsd.js
@@ -12,6 +12,11 @@ class Client {
     options = options || {}

     this._host = options.host || 'localhost'
+
+    if (!isIP(this._host)) {
+      throw new Error('dd-trace dogstatsd host must be raw IP');
+    }
+
     this._family = isIP(this._host)
     this._port = options.port || 8125
     this._prefix = options.prefix || ''
@@ -38,14 +43,7 @@ class Client {

     this._queue = []

-    if (this._family !== 0) {
-      this._sendAll(queue, this._host, this._family)
-    } else {
-      lookup(this._host, (err, address, family) => {
-        if (err) return log.error(err)
-        this._sendAll(queue, address, family)
-      })
-    }
+    this._sendAll(queue, this._host, 4)
   }

   _send (address, family, buffer) {
@@ -94,7 +92,19 @@ class Client {
   }

   _socket (type) {
-    const socket = dgram.createSocket(type)
+    const socket = type === 'udp4' ? dgram.createSocket({
+      type: 'udp4',
+      lookup: (hostname, options, callback) => {
+        // if IP address, bypass dns.lookup
+        if (isIP(hostname)) {
+          callback(null, hostname, 4);
+          return;
+        }
+
+        // if a real hostname, then resolve using dns.lookup
+        lookup(hostname, options, callback);
+      },
+    }) : dgram.createSocket(type);

     socket.on('error', () => {})
     socket.unref()

I still see DNS lookups to the EC2 ip that datadog agent is running on but it's 99% less then it was before i "improved" hot-shots and the embedded dd-trace/src/dogstatds.js

Raynos commented 1 year ago

I was able to find the remaining callsite and its in the dns cache implementation of hot-shots I can fix that too.

Raynos commented 1 year ago

Yeah removed redundant dns cache in hot-shots I gave it an IP, dont need to DNS cache an IP, its not a domain name.

Raynos commented 1 year ago

This is the line I need to modify in dd-trace ( https://github.com/DataDog/dd-trace-js/blob/a0c2514b3b8e08ae2ba666a99464c82883166d79/packages/dd-trace/src/dogstatsd.js#L129 )

Want to pass the lookup function into createSocket({ lookup })

elyndefigma commented 3 months ago

@Raynos - thank you for this issue. What did you all end up doing here?

Raynos commented 2 months ago

I used pnpm and the pnpm patches system to effectively fix it in place without forking