getsentry / sentry-javascript

Official Sentry SDKs for JavaScript
https://sentry.io
MIT License

Sentry upgrade from 7.118.0 to 8.26.0 leak memory #13412

Open flav-code opened 4 weeks ago

flav-code commented 4 weeks ago

Is there an existing issue for this?

How do you use Sentry?

Self-hosted/on-premise

Which SDK are you using?

@sentry/node

SDK Version

8.26.0

Framework Version

No response

Link to Sentry event

No response

Reproduction Example/SDK Setup

my sentry init in 7.118.0

    init({
        dsn: process.env.SENTRY_DSN,
        environment: process.env.NODE_ENV || "none",
        release: require("../../package.json").version,
        serverName: `${cluster.bot}-c=${cluster.id}-first=${cluster.first}-last=${cluster.last}` || "none",
        integrations: [
            new Integrations.Postgres({ module: require("pg") }),
            new Integrations.Modules(),
            new Integrations.FunctionToString(),
            new Integrations.LinkedErrors(),
            new Integrations.Console(),
            new Integrations.Http({ breadcrumbs: true, tracing: true }),
            rewriteFramesIntegration({ root: path.join(__dirname, "..") }),
        ],
        // Performance Monitoring
        tracesSampleRate: 1.0, //  Capture 100% of the transactions
    });

my sentry init in 8.26.0

    Sentry.init({
        dsn: process.env.SENTRY_DSN,
        environment: process.env.NODE_ENV || "none",
        release: require("../../package.json").version,
        serverName: `${cluster.bot}-c=${cluster.id}-first=${cluster.first}-last=${cluster.last}` || "none",
        integrations: [
            Sentry.modulesIntegration(),
            Sentry.functionToStringIntegration(),
            Sentry.linkedErrorsIntegration(),
            Sentry.consoleIntegration(),
            Sentry.httpIntegration({ breadcrumbs: true }),
            Sentry.rewriteFramesIntegration({ root: path.join(__dirname, "..") }),
            Sentry.onUnhandledRejectionIntegration(),
            Sentry.onUncaughtExceptionIntegration(),
            Sentry.redisIntegration(),
            Sentry.postgresIntegration(),
        ],
        // To avoid sending too much data to Sentry, we can reduce the sample rate of traces and profiles
        tracesSampleRate: 1.0,
        profilesSampleRate: 1.0,
    });

Steps to Reproduce

I'll try removing some of the integrations to see what's causing the problem.

Expected Result

Normal memory usage

Image

Actual Result

Abnormal memory usage

Image

Lms24 commented 3 weeks ago

Hey @flav-code thanks for writing in!

We'll look into your issue next week as this week is Hackweek at Sentry (see #13421).

lforst commented 2 weeks ago

Hi @flav-code, would you be able to provide a memory snapshot with the Node/v8 profiler so that we can look at what is holding the references causing the leak? Feel free to also shoot us an email or a Twitter DM if you don't want to share it publicly. Thanks!
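
If it helps, one way to capture a heap snapshot from a running Node process is the built-in v8 module (a minimal sketch; the signal and file path are just illustrative):

    // Writes a .heapsnapshot file that can be opened in the Chrome DevTools Memory tab.
    // v8.writeHeapSnapshot is available in Node >= 11.13.
    const v8 = require("v8");
    const path = require("path");

    process.on("SIGUSR2", () => {
        const file = path.join(__dirname, `heap-${Date.now()}.heapsnapshot`);
        v8.writeHeapSnapshot(file);
        console.log(`heap snapshot written to ${file}`);
    });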

flav-code commented 2 weeks ago

I can't give you a snapshot because it contains a lot of private information.

flav-code commented 2 weeks ago

I made a snapshot, I'll examine it and show you

flav-code commented 2 weeks ago

Image

flav-code commented 2 weeks ago

Set, Span and NonRecordingSpan are from Sentry

lforst commented 2 weeks ago

Yeah, that looks like Sentry. Would you mind digging around a bit and examining what holds the references to the spans?

flav-code commented 2 weeks ago

Image

flav-code commented 2 weeks ago

Image

amakhrov commented 2 weeks ago

Set, Span and NonRecordingSpan are from Sentry

This indicates that Sentry objects are retained in memory. It doesn't mean they are causing this retention, though! For example, the expanded Span object has a distance (from root) of 15, and the sentryRootSpan a distance of 12. That means something else, closer to the root, is holding a reference to them. You might want to explore the bottom part of the memory profiler - the Retainers section.

flav-code commented 2 weeks ago

Image

amakhrov commented 2 weeks ago

timerListMap[180000] - my understanding is that the app has started a timer for 180s (3min). Any chance it's your app code rather than Sentry?

flav-code commented 2 weeks ago

I don't think so. Before switching to sentry v8, someone warned me about the memory leak.

lforst commented 2 weeks ago

I have a few questions to narrow this down further. I am not ruling out that our SDK is causing the leak:

flav-code commented 2 weeks ago

Hello, I use Sentry on a Discord bot, not a website. Node version: v20.17.0

Image

Image

lforst commented 2 weeks ago

@flav-code Would you mind answering the questions I asked? It's important we have answers to them so we can rule out certain things.

flav-code commented 2 weeks ago

I've already double-checked my code. When you talk to me about spans, I think of HTML elements, but I don't do web development.

lforst commented 2 weeks ago

@flav-code I am talking about Sentry spans, which you would start with Sentry.startSpan(). (https://docs.sentry.io/concepts/key-terms/tracing/distributed-tracing/#traces-transactions-and-spans)
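
For reference, a span in the Node SDK would be created roughly like this (a minimal sketch based on the linked docs; the operation, name, and doTheQuery helper are illustrative):

    // startSpan makes the span active for the duration of the callback and ends it
    // automatically when the callback returns (or its returned promise settles).
    Sentry.startSpan({ op: "db.query", name: "SELECT * FROM users" }, async () => {
        await doTheQuery(); // hypothetical helper standing in for real work
    });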

Do you happen to have very long running requests in your process? Like server-sent-events, or streaming?

flav-code commented 2 weeks ago

On my side I never call Sentry.startSpan()

lforst commented 2 weeks ago

@flav-code Do you have any very long running requests / request handlers in your code?

flav-code commented 2 weeks ago

I wouldn't say long requests, but I have a lot of requests per second, over 110 req/s at times, and I receive a lot of messages via websocket.

flav-code commented 2 weeks ago

Here you can see that the memory leak appeared at the same time as my Sentry update, and since then the performance tab has been working.

Image

Image

Image

lforst commented 2 weeks ago

I believe you that this is correlated with the update. Can you share a bit more about your program architecture? Is Discord opening a websocket request to your server, or is your server making a websocket request to Discord?

Also, are you initializing Sentry more than once per process?

lforst commented 2 weeks ago

From what I can tell from your profiler screenshots, something is creating a span (a root span) that seemingly never finishes, and spans keep getting attached to it, which ends up leaking memory. It would be nice to figure out what is creating this root span.

flav-code commented 2 weeks ago

I'll try removing the profilesSampleRate line, and we'll see if it comes from that.

lforst commented 2 weeks ago

If you could somehow also try logging console.log(Sentry.getRootSpan()) somewhere in your websocket code, and share what is logged, that would be cool!

flav-code commented 2 weeks ago

Sentry.getRootSpan() requires an argument

lforst commented 2 weeks ago

Sorry right. Can you try

    const s = Sentry.getActiveSpan();
    if (s) {
        console.log('root span', Sentry.getRootSpan(s));
    }

flav-code commented 2 weeks ago

On the active process I used eval():

    NonRecordingSpan {
        _spanContext: {
            traceId: 'ba6488d048422cfba347c2a2b9b1eca5',
            spanId: '5df94fad8fd85299',
            traceFlags: 0,
            traceState: [TraceState]
        }
    }

lforst commented 2 weeks ago

Thanks! We are struggling to understand what is creating this non-recording span. Usually that shouldn't happen unless you set a traces sample rate of <1 or you manually continue a trace.

Can you try setting debug: true in your `Sentry.init()`? It will probably generate a lot of logs, but it may tell you what exactly is continued or not sampled, and so on.
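
Something like this (a minimal sketch; keep the rest of your existing options as they are):

    Sentry.init({
        dsn: process.env.SENTRY_DSN,
        // ... your existing options ...
        debug: true, // prints verbose SDK logs about sampling, trace continuation, etc.
    });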

flav-code commented 2 weeks ago

Can I enable the logs without restarting my process?

lforst commented 2 weeks ago

I don't think so :/

lforst commented 2 weeks ago

Another thing that we just noticed: Have you properly followed the migration guide for how and when to call Sentry.init()? You may be calling Sentry.init() too late, which may create weird spans. https://docs.sentry.io/platforms/javascript/guides/node/migration/v7-to-v8/#updated-sdk-initialization
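
For reference, the pattern from the linked guide looks roughly like this (a minimal sketch assuming a CommonJS setup; the file name instrument.js is just a convention):

    // instrument.js - must be required before anything else so the SDK can
    // instrument modules like http and pg when they are first loaded.
    const Sentry = require("@sentry/node");

    Sentry.init({
        dsn: process.env.SENTRY_DSN,
        tracesSampleRate: 1.0,
    });

Then, at the very top of the application entry point:

    // index.js (hypothetical entry point) - require the instrumentation file first,
    // before any other module.
    require("./instrument.js");

    // ...everything else is required after Sentry is initialized.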

flav-code commented 2 weeks ago

In my case I don't require Sentry first.

lforst commented 2 weeks ago

@flav-code can you try doing so?

flav-code commented 2 weeks ago

Is it normal for it to load integrations that I haven't put in?

Image

lforst commented 2 weeks ago

@flav-code Yes, there is a set of integrations that are enabled by default, but don't worry - as long as you do not use (i.e. import/require) these packages, these integrations are no-ops.

flav-code commented 2 weeks ago

Okay.

Good now?

Image

    NonRecordingSpan {
        _spanContext: {
            traceId: 'd6ce7aec75e1fbdac62a14ca1a5d3353',
            spanId: 'eea933c24e347f1d',
            traceFlags: 0,
            traceState: [TraceState]
        }
    }

lforst commented 2 weeks ago

I can't make out from the screenshot what you did tbh. If you are still importing other things inside ./extras/sentry, that's wrong.

flav-code commented 2 weeks ago

This corresponds to Sentry:

Image

lforst commented 2 weeks ago

That setup looks good 👌 Can you share more of the debug logs? It would be good to see logs up until and including when you do things like send requests and database queries and similar. Thanks!

flav-code commented 2 weeks ago

There was a query to my DB, but it doesn't seem to appear.

Image

lforst commented 2 weeks ago

Would you mind sharing the start of your application up to a certain point in text format? Thanks!

flav-code commented 2 weeks ago

Do you have Discord? It would be easier to talk and send logs.

lforst commented 2 weeks ago

Yes! Feel free to join https://discord.com/invite/sentry and ping @lforst

lforst commented 4 days ago

After some back and forth, we have discovered that the memory leak in this issue happens because the httpIntegration creates a NonRecordingSpan for a very long-lived websocket request. The span is never ended because the request basically never ends for the entire lifetime of the process, and things (i.e. child spans) keep getting attached to that span.

The workaround for now is to do the following:

    Sentry.init({
        integrations: [
            Sentry.httpIntegration({
                // Returning true unconditionally tells the SDK to skip span and
                // breadcrumb creation for every outgoing request, so the long-lived
                // websocket request can no longer accumulate child spans.
                ignoreOutgoingRequests(url, request) {
                    return true;
                },
            }),
        ],
    });
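
If you only want to exclude the long-lived websocket request rather than every outgoing request, the same callback can filter on the URL instead (a sketch; the gateway host is hypothetical and depends on your setup):

    Sentry.init({
        integrations: [
            Sentry.httpIntegration({
                ignoreOutgoingRequests(url) {
                    // Only skip the request(s) that never end; everything else keeps tracing.
                    return url.includes("gateway.discord.gg");
                },
            }),
        ],
    });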

Thanks for the collaboration @flav-code!!


Action items (varying degrees of possible):

flav-code commented 2 days ago

No problem 👍