DataDog / dd-trace-js

JavaScript APM Tracer
https://docs.datadoghq.com/tracing/

high memory usage starting with dd-trace 4.3.0 #3387

Open · talamcol opened this issue 1 year ago

talamcol commented 1 year ago

We recently updated the dd-trace library from 4.2.0 to 4.3.0. I also tried the latest version, 4.6.0, and the issue still exists there. Our memory consumption is usually stable, between 220 and 300 MB. After updating to version 4.3.0, it starts to spike heavily.

Actual behaviour

(screenshot, 2023-07-13: memory usage spiking after the upgrade)

Steps to reproduce

Environment

    "dd-trace": "4.6.0",
    "next": "13.4.9",
    "react": "18.2.0",
    "react-dom": "18.2.0",
    "react-hook-form": "7.45.1",
    "swr": "2.2.0",
    "typescript": "5.1.6",
kejistan commented 1 year ago

Also encountered this. Something appears to be leaking memory. Our overnight heap use is typically quite stable, as there isn't much going on, but with 4.3.0 it just kept increasing linearly. Downgrading to 4.2.0 has fixed things for the moment.

webark commented 1 year ago

Just lost my Saturday to this. :/ We were on Node 18, using a lot of native fetch, with 4.6.0. I downgraded to 3.18.

Averethel commented 1 year ago

The issue does not seem to be fixed. Upgrading from 4.2 to 4.8.1 still adds over 100 MiB per pod, which then steadily increases.

rochdev commented 1 year ago

@Averethel Is the additional memory usage resolved if you disable the fetch plugin with tracer.use('fetch', false)?
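For reference, disabling the plugin would look roughly like this in the tracer init file (a sketch; the file layout is whatever your app already uses):

```javascript
// Sketch: disable only the fetch plugin, as suggested above. This goes in
// the file that initializes dd-trace, loaded before any instrumented code.
const tracer = require('dd-trace').init()
tracer.use('fetch', false)
module.exports = tracer
```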

Averethel commented 1 year ago

@rochdev I tried that; the memory usage still goes up with the fetch plugin disabled. The increase is slower, but still noticeable.

rochdev commented 1 year ago

@Averethel It's possible since 4.9.0 to disable the fetch instrumentation (so not just the plugin). Can you try that so we can at least isolate the issue to the integration? This can be done with DD_TRACE_DISABLED_INSTRUMENTATIONS=fetch.
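In practice that looks something like the following (assuming the usual `--require` style startup; adjust to your own setup):

```shell
# Disable the fetch instrumentation entirely (dd-trace >= 4.9.0),
# not just the plugin:
DD_TRACE_DISABLED_INSTRUMENTATIONS=fetch node --require dd-trace/init server.js
```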

webark commented 1 year ago

@rochdev I upgraded to 4.12 from 3.18, and started getting the leak. Then I set the DD_TRACE_DISABLED_INSTRUMENTATIONS=fetch flag, and it seems to have removed it.

We are on node 18 and using native fetch in all the code we own. (some sdks and things have node-fetch as a dependency, but we don't enable it for any of our code).

(screenshot, 2023-08-17: memory usage flattening after disabling the fetch instrumentation)
webark commented 12 months ago

@rochdev any idea for a path forward here? Should I open up a new issue?

rochdev commented 12 months ago

@webark Is this something you can reproduce with a snippet you could share? I can't reproduce on my end even with the integration enabled, and the instrumentation also doesn't do much, so I can't really see what could be causing the issue.

webark commented 12 months ago

@rochdev Whats with this part?

https://github.com/DataDog/dd-trace-js/blob/820101709abe2b153f165a723daed30389aeb56b/packages/datadog-instrumentations/src/fetch.js#L21-L25

It seems like it's rewriting the arguments? I wonder if maybe this is causing some confusion in the garbage collector? Is there a reason it has to be done this way? Just a thought of what stood out after a first read through.

Also, does fetch need to keep its this scope there? Could you just run this using null instead?
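For context, the argument-rewriting pattern being questioned can be sketched in plain JavaScript (illustrative names only, not actual dd-trace internals):

```javascript
// Hypothetical sketch of the pattern under discussion: the wrapper builds
// a new Request from the caller's (input, init) arguments and hands that
// to the real fetch, so the underlying fetch never sees the original values.
function wrapFetch (realFetch) {
  return function fetchWithTrace (input, init) {
    // Rewriting the arguments: a fresh Request replaces what the caller passed.
    const req = new Request(input, init)
    return realFetch.call(this, req)
  }
}
```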

rochdev commented 11 months ago

I wonder if maybe this is causing some confusion in the garbage collector?

Possible, although I doubt it would be to the extent of causing a memory leak since there are no additional references to long-lived objects.

Also, does fetch need to keep its this scope there? Could you just run this using null instead?

If it's already null then it will still be null, so it's effectively the same already. And if there is a value for this, then it should be kept.
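A minimal plain-JavaScript illustration of that point (no dd-trace involved): forwarding the wrapper's own `this` behaves like `null` when there is no receiver, but preserves a receiver when the caller supplies one.

```javascript
'use strict'

// Illustration of the `this`-forwarding point above.
function original () {
  return this // strict mode: exactly the receiver it was called with
}

function wrapped () {
  return original.call(this) // forward whatever `this` the caller used
}

const obj = { run: wrapped }

console.log(wrapped()) // undefined (no receiver, effectively null)
console.log(obj.run()) // obj (the receiver is preserved)
```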

AceFire6 commented 11 months ago

To add to this, we upgraded to the latest version of dd-trace (4.15.0) and immediately saw a massive increase in memory usage. Downgrading to v4.2.0 seems to have fixed things.

(screenshot: memory usage increase after the upgrade)

MCPx commented 9 months ago

@rochdev we attempted to upgrade to 4.18.0 from 4.2.0 again to see if this issue persists and although our non-Next.js services are fine, our one Next.js service very quickly started leaking memory, especially in the ReadableStream method as shown in the profiler screenshot below. Not sure if I should create a new issue for this?

Running Next.js version ^12.3.4 in this service.

(profiler screenshot: ReadableStream allocations growing)
svrakitin commented 9 months ago

Same here, profiling points to the leaked context (I guess headers) taking a lot of heap memory in https://github.com/DataDog/dd-trace-js/blob/c35772587a1f5e0177dd694f1e90d1c5828700a2/packages/datadog-instrumentations/src/http/client.js#L48

(screenshot, 2023-11-28: heap profile showing the leaked context)

Node 20.9.0, dd-trace 4.20.0

sabrenner commented 8 months ago

@svrakitin Is this leak coming from a service that's also using Next.js? And if so, what version?

svrakitin commented 8 months ago

@sabrenner Not using Next.js

webark commented 8 months ago

We are not using nextjs.

I recently upgraded to Node 20 and version 4.20, and I think it fixed it? I'm going to do some more tests to confirm.

psimk commented 7 months ago

👋 we encountered this same issue on Next 13, dd-trace 5 and Node 20. This is a pretty serious blocker for us to continue using dd-trace.

psimk commented 6 months ago

I have found that disabling the http integration (using DD_TRACE_DISABLED_INSTRUMENTATIONS=http) solved our issues. However, we of course cannot do that, as we depend on a lot of the functionality this instrumentation provides.

zkorhone commented 5 months ago

I noticed that recent Datadog Lambda layers have upgraded to 4.x of dd-trace, and got worried that this bug might impact our Lambdas.

chenglx commented 3 months ago

Is this issue fixed by dd-trace v5.x?

mariuszbeltowski commented 2 months ago

Is there any update on this? Having the same issue with the latest version, 5.15.0, using Apollo Server.

webark commented 2 months ago

We upgraded to Node 20 and it started working. If you are using a fetch shim (cross-fetch, node-fetch, etc.), you might want to see if you can push those versions all the way to the latest, or, if they are direct dependencies, drop them and use native fetch throughout.

mariuszbeltowski commented 2 months ago

The issue still persists using Node v20 and Apollo Server.

tills13 commented 1 month ago

Seems to be specific to the http plugin. Disabling this sucks as it effectively knee-caps distributed tracing.

Green is when we updated NextJS and DD to 4.27.

(graph: memory usage after the NextJS and dd-trace 4.27 update)

You can see a clear change: after the update, memory usage slowly increases over time. We're still not sure what changed recently that made it go wild.

Zooming in: between the green markers here, I tested disabling Datadog tracing altogether.

(graph: memory usage with tracing disabled)

Memory usage stabilized around where I would expect it to be. Then, over this last weekend, I ran two simultaneous tests based on the comments above: 1. re-enabled tracing with the next plugin disabled, and 2. downgraded to v3.18. The leaking behaviour returned.

I am currently testing disabling http, and it seems to be helping, though it's too soon to tell.

I took some heapsnapshots and noticed a lot of persistent ServerResponses and IncomingMessages held onto by dd resources:

(heap snapshot screenshots: retained ServerResponse and IncomingMessage objects held by dd resources)

No obvious reason so far.

Update: disabling only the http trace didn't help, though it seems to have slowed down the leak.

(graph: memory usage with only the http trace disabled)