DataDog / dd-trace-js

JavaScript APM Tracer
https://docs.datadoghq.com/tracing/
Other
645 stars 305 forks source link

Plugins consume too much CPU #969

Closed kaievns closed 3 years ago

kaievns commented 4 years ago

Describe the bug Ohi there, I've been struggling to instrument an express/mongo-core app with dd-trace for a few days now. I looked up the issues history and couldnt find anything similar, and so I thought I'd report this, maybe I'm doing this wrong, or maybe there is a problem I'm not aware of.

Basically, I have a dozen microservices, fairly simple RESTful APIs, pretty much straight up express+mongo-core applications. All of this runs in kubernetes on GCP. The kubes side is all hooked up and seems to work fine, but, on the node application side I'm seeing a weird behavior where CPU usage spikes up 10x when I enable the plugins.

Here is how the code looks like:

  const tracer = require('dd-trace');
  tracer.init({
    hostname: process.env.DD_AGENT_HOST,
    service: process.env.SERVICE_NAME,
    env: process.env.ENVIRONMENT,
    runtimeMetrics: true,
    analytics: true
  });

This creates a spike in CPU utilisation, see other services with plugins disabled, versus the CORE service that has plugins enabled

image

If i disable all plugins by adding plugins: false it falls back to normality, but as soon as I try to activate any plugins, say like so:

  const tracer = require('dd-trace');
  tracer.init({
    hostname: process.env.DD_AGENT_HOST,
    service: process.env.SERVICE_NAME,
    env: process.env.ENVIRONMENT,
    runtimeMetrics: true,
    analytics: true,
    plugins: false
  });
  tracer.use('http');

It jumps back up to 10x CPU usage.

Any ideas what's causing this? Anything I can do about it?

Thanks

Environment microservices in kubernetes

rochdev commented 3 years ago

do i understand this correctly that i need to specify the protocolVersion: '0.5' explicitly to make this work?

This is not necessarily required and depends a lot on the traffic received by the app. Since this protocol version uses a string table, the more traces generated per second the higher the benefit. If it doesn't help for your app specifically, you can just leave the default which will eventually be changed to that version in a future release of the tracer anyway.

btw, most of the performance issues were gone for us after an upgrade to node 14.5 and disabling the asyncScope thing

That's one of the improvements in this release that now uses AsyncLocalStorage by default which doesn't need trackAsyncScope in Node >=14.5 since the issue with thenables has been fixed directly in Node and no longer needs our previous hack which was quite heavy but necessary for context propagation to work before the fix.

We've made quite a few improvements in this release which can be found in the release notes.

nburkley commented 3 years ago

@rochdev I just wanted to leave some feedback and thank you for your work on this.

Just a few weeks ago, we added dd-trace to a relatively large hapi.js app to get some distributed tracing across our services. The app is running on an old version of node 10.x and gets about 20 req/sec. We noticed a large increase in CPU usage since adding the library but were able to scale up resources to handle this.

We tested out the newer version 0.27 this week and saw significant performance improvements straight away. The CPU usage has returned to levels close to those before adding dd-trace. We're running this on production now and plan to scale down CPU resources once again.

Thanks for the great work on this.

rochdev commented 3 years ago

Closing this since the original issue which was caused by a regression in 0.19 has been fixed completely since 0.27.

kaievns commented 3 years ago

Apologies, i forgot to post an update. We had bumped our services to 0.27 with removed async scope override and i confirm everything seems to work fine now, thanks

On 4 Nov 2020, at 3:46 am, Roch Devost notifications@github.com wrote:

 Closing this since the original issue which was caused by a regression in 0.19 has been fixed completely since 0.27.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.