Open Raynos opened 1 year ago
@Raynos how are you observing the DNS lookups? Is the app using something like hot-shots
?
@tlhunter Is there a vehicle to share screenshots / datadog links to employees outside of the public github. Can send you the dns.lookup prod info. It's all coming out of APM and showing dns.lookup
The IP address of the EC2 I was running the datadog agent on had 100x more DNS lookup calls then other useful services like doing dns lookup on actual DNS domains in the http clients.
I patched it with a few changes
const dogStatsClient = new StatsD({
host: DOGSTATSD_HOST,
udpSocketOptions: {
type: 'udp4',
lookup: (hostname, options, callback) => {
// if IP address, bypass dns.lookup
if (net.isIP(hostname)) {
callback(null, hostname, 4);
return;
}
// if a real hostname, then resolve using dns.lookup
dns.lookup(hostname, options, callback);
},
},
}
This is me configuring hot-shots
to skip dns lookup on IP address and definitely a lot of dns lookups came out of hot-shots
But there's also code in the datadog agent ( I am on v2 instead of v3 I learned recently ... ) But when I looked at the code on master
I dont think its "fixed/improved" on master.
I also patched dd-trace
itself like so
diff --git a/packages/dd-trace/src/dogstatsd.js b/packages/dd-trace/src/dogstatsd.js
index c46d7cac961fc2c4e0397a2038f6026b1a55db38..1d65d2c6648d93a112bf1b2cbd1ae74b589d0403 100644
--- a/packages/dd-trace/src/dogstatsd.js
+++ b/packages/dd-trace/src/dogstatsd.js
@@ -12,6 +12,11 @@ class Client {
options = options || {}
this._host = options.host || 'localhost'
+
+ if (!isIP(this._host)) {
+ throw new Error('dd-trace dogstatsd host must be raw IP');
+ }
+
this._family = isIP(this._host)
this._port = options.port || 8125
this._prefix = options.prefix || ''
@@ -38,14 +43,7 @@ class Client {
this._queue = []
- if (this._family !== 0) {
- this._sendAll(queue, this._host, this._family)
- } else {
- lookup(this._host, (err, address, family) => {
- if (err) return log.error(err)
- this._sendAll(queue, address, family)
- })
- }
+ this._sendAll(queue, this._host, 4)
}
_send (address, family, buffer) {
@@ -94,7 +92,19 @@ class Client {
}
_socket (type) {
- const socket = dgram.createSocket(type)
+ const socket = type === 'udp4' ? dgram.createSocket({
+ type: 'udp4',
+ lookup: (hostname, options, callback) => {
+ // if IP address, bypass dns.lookup
+ if (isIP(hostname)) {
+ callback(null, hostname, 4);
+ return;
+ }
+
+ // if a real hostname, then resolve using dns.lookup
+ lookup(hostname, options, callback);
+ },
+ }) : dgram.createSocket(type);
socket.on('error', () => {})
socket.unref()
I still see DNS lookups to the EC2 ip that datadog agent is running on but it's 99% less then it was before i "improved" hot-shots and the embedded dd-trace/src/dogstatds.js
I was able to find the remaining callsite and its in the dns cache implementation of hot-shots
I can fix that too.
Yeah removed redundant dns cache in hot-shots
I gave it an IP, dont need to DNS cache an IP, its not a domain name.
This is the line I need to modify in dd-trace
( https://github.com/DataDog/dd-trace-js/blob/a0c2514b3b8e08ae2ba666a99464c82883166d79/packages/dd-trace/src/dogstatsd.js#L129 )
Want to pass the lookup
function into createSocket({ lookup })
@Raynos - thank you for this issue. What did you all end up doing here?
I used pnpm and the pnpm patches system to effectively fix it in place without forking
Our nodejs app that uses datadog does mostly dns lookup and I believe its from the statsd client that does a dns lookup for every UDP packet.
Please do not do a DNS lookup on an ip address for every UDP package send to statsd.
You can see https://github.com/uber-archive/node-statsd-client/tree/master for an implementation of a statsd UDP client that uses dns lookup caching.