Closed: jc3m closed this issue 5 months ago
Hi, thanks for writing in about this. We are currently taking a look at what might be causing this!
A question and an ask: could you share your sentry.server.config.ts and the contents of instrumentation.ts? And could you try versions 8.4.0 and 8.5.0 to help us narrow down what changes might have introduced the leak? Also, if you have any custom code around Sentry, feel free to share it here!
@lforst if it helps, we have the same issue (also using Next.js) and it was caused by upgrading from 8.4.0 to 8.5.0, so I'm fairly confident the issue lies in here.
We are currently pinned to 8.4.0 which doesn't have this issue.
Also seeing this on a bump from 8.3.0 to 8.5.0 last week.
@AlecRust that helps a lot narrowing it down. Thanks! I'll investigate further.
One more question to the people following this issue: does anybody experience this with SDKs other than @sentry/nextjs? So far people exclusively seem to be using that SDK.
Update: I am struggling to find the culprit. If anybody is able to share a memory profile with this happening that would be awesome. Also, any information on what kind of instrumentation/database ORM is used is super useful.
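In case it helps anyone capture one, here is a rough sketch of grabbing a heap snapshot with only what ships with Node.js (no SDK-specific tooling assumed):

```js
// Write a heap snapshot from inside the process, e.g. behind a debug-only route.
const v8 = require("node:v8");

const file = v8.writeHeapSnapshot(); // writes a .heapsnapshot file to the cwd
console.log(`Heap snapshot written to ${file}`);

// Alternatively, start the process with `node --heapsnapshot-signal=SIGUSR2 app.js`
// and send SIGUSR2 to the running process to dump a snapshot on demand.
```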
Another note - if you remove tracesSampleRate/tracesSampler from your config (to disable performance monitoring), do the memory issues still occur? Curious if this leak is tied to spans/performance/tracing.
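For anyone trying this, a minimal sketch of what that looks like in sentry.server.config.ts, with tracing left out (the DSN is a placeholder):

```js
import * as Sentry from "@sentry/nextjs";

Sentry.init({
  dsn: "https://examplePublicKey@o0.ingest.sentry.io/0", // placeholder DSN
  // tracesSampleRate / tracesSampler intentionally omitted:
  // without either option, performance monitoring (span collection) stays disabled.
});
```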
We've also heard memory concerns with people using Postgres - is anyone using that?
I am like 90% confident I found the leak, and I hope I never have to touch a memory profiler again in my life: https://github.com/getsentry/sentry-javascript/pull/12335
Why it leaks, no idea, but it leaks.
I figured out the issue now, but it's not entirely clear to me yet why it was triggered. On Node, calling setTimeout returns a Timeout object. That object is tracked in an internal list of timers, and that list is maintained in two places: on the one hand in unenroll, which is used by clearTimeout (and clearInterval), and on the other hand when the timer runs.
However, only the unenroll path also removes a timer from the internal knownTimersById map. This map is updated whenever the Timeout is converted into a primitive. From that moment onwards, a timer can be cleared by its internal async id.
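In other words, a small sketch of the two cleanup paths described above (assuming Node's internals behave as described):

```js
const t = setTimeout(() => {}, 1000);

// Coercing the Timeout to a primitive registers it in knownTimersById
// under its async id and sets the kHasPrimitive flag.
const id = +t;

// clearTimeout() goes through unenroll(), which also deletes the
// knownTimersById entry again, so this path does not leak.
clearTimeout(id);

// If the timer instead simply fires, only the internal timer list is
// cleaned up and the knownTimersById entry is left behind.
```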
So to get a setTimeout to leak, you just need to call +setTimeout(...) and wait for it to complete. The entry in the knownTimersById map is never removed and we leak.
The memory dump that @lforst shared with me indicates that we have timers leaked in knownTimersById, and they all have their Symbol(kHasPrimitive) flag set to true. This means something somewhere converts timeouts into primitives. This still happens even after the patch, but I don't know where this would happen.
The repro case is trivial:

```js
// leaks
for (let i = 0; i < 500000; i++) {
  +setTimeout(() => {}, 0);
}
```

This will create 500000 un-collectable Timeouts that can be found in the knownTimersById map in timers.js. Removing the + fixes it. There are other situations in which JavaScript will convert something into a primitive. For instance, putting the timeout into a map will do that:
```
> x = {}
{}
> x[setTimeout(() => {}, 0)] = 42;
42
> x
{ '119': 42 }
```
So there might be some patterns in either our codebase or in opentelemetry that do that.
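A few other coercion patterns that would set the flag, purely as illustration (none of these are confirmed to be in either codebase):

```js
const bag = {};
const t = setTimeout(() => {}, 0);

`${t}`;                    // template literal / string coercion
String(t);                 // explicit string conversion
t == 119;                  // loose comparison against a number
bag[t] = true;             // using the timer as a property key
new Map().set(`${t}`, t);  // stringifying it for a map key
```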
Independent of that, we should open an issue against Node, as there is clearly a bug there: the timer is removed here from the list but not from knownTimersById: https://github.com/nodejs/node/blob/7d14d1fe068dfb34947eb4d328699680a1f5e75d/lib/internal/timers.js#L544-L545
Compare this to how unenroll clears: https://github.com/nodejs/node/blob/7d14d1fe068dfb34947eb4d328699680a1f5e75d/lib/timers.js#L86-L93
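Roughly, the missing cleanup on the "timer has fired" path would mirror what unenroll already does. This is a paraphrase of the linked Node-internal code, not a verbatim or runnable patch:

```js
// Paraphrased from Node's unenroll()/clearTimeout() path: when a timer is
// dropped from the internal list, the entry keyed by its primitive async id
// should be dropped as well.
if (timer[kHasPrimitive]) {
  delete knownTimersById[timer[async_id_symbol]];
}
```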
Always love it when SDK bugs actually reveal bugs with node.js or the browser 😂
We spent some time trying to reproduce why the timeout becomes a primitive.
With just initializing the SDK, neither does setTimeout.toString() change, nor does kHasPrimitive become flagged on the timer object. This means that in a minimal repro, the SDK does not seem to be causing this behaviour:
```
Timeout {
  _idleTimeout: 1000,
  _idlePrev: null,
  _idleNext: null,
  _idleStart: 80,
  _onTimeout: [Function (anonymous)],
  _timerArgs: undefined,
  _repeat: null,
  _destroyed: true,
  [Symbol(refed)]: true,
  [Symbol(kHasPrimitive)]: false,
  [Symbol(asyncId)]: 6,
  [Symbol(triggerId)]: 1
}
```
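For reference, a dump like the one above can be produced with plain Node and nothing else, along these lines:

```js
// Create a timer, let it fire, then inspect the Timeout object afterwards.
// The Symbol(...) entries (including kHasPrimitive) show up in the console output.
const t = setTimeout(() => {}, 1000);

setTimeout(() => {
  console.log(t);
}, 1100);
```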
Next we looked at AsyncLocalStorage, given that both the SDK and Next.js rely on it. This also seems to have no impact:
```
Timeout {
  _idleTimeout: 1000,
  _idlePrev: null,
  _idleNext: null,
  _idleStart: 81,
  _onTimeout: [Function (anonymous)],
  _timerArgs: undefined,
  _repeat: null,
  _destroyed: true,
  [Symbol(refed)]: true,
  [Symbol(kHasPrimitive)]: false,
  [Symbol(asyncId)]: 8,
  [Symbol(triggerId)]: 1,
  [Symbol(kResourceStore)]: 0
}
```
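The AsyncLocalStorage variant of that check might look roughly like this (our reconstruction, not necessarily the exact test that was run):

```js
const { AsyncLocalStorage } = require("node:async_hooks");

const als = new AsyncLocalStorage();

// Create the timer inside an active store, since both the SDK and Next.js
// rely on AsyncLocalStorage. kResourceStore shows up on the Timeout,
// but kHasPrimitive stays false.
als.run(0, () => {
  const t = setTimeout(() => {}, 1000);
  setTimeout(() => console.log(t), 1100);
});
```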
So this means the problem is with Next.js for sure.
@lforst ran some tests, and apparently Next.js sometimes patches setTimeout in a way that causes this behaviour! This is because they have a resource tracking class that holds references to all timers.
This functionality was introduced in Next.js to fix another Node.js bug 😓 https://github.com/vercel/next.js/pull/57235
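To illustrate the mechanism, here is a deliberately simplified, hypothetical patch (not Next.js' actual code): a wrapper that indexes timers by their id is enough to trigger the primitive conversion.

```js
// Hypothetical illustration only, not Next.js' actual implementation.
const originalSetTimeout = global.setTimeout;
const trackedTimers = {};

global.setTimeout = (fn, ms, ...args) => {
  const timer = originalSetTimeout(fn, ms, ...args);
  // Using the Timeout as a property key coerces it to its primitive async id,
  // which sets kHasPrimitive and registers it in Node's knownTimersById map.
  trackedTimers[timer] = timer;
  return timer;
};
```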
I'll let @lforst say it best
Hey, we've just released 8.8.0 which should hopefully fix this issue! Let us know if you still notice any problems.
Experiencing huge memory leaks on @sentry/node@8.9.2, will downgrade to 8.3.0
@AnthonyDugarte memory leaks are p0 for us to fix, could you open a new GH issue with details about your setup? We'll get someone on that asap!
Also getting memory leaks on 8.28.0.
From my crude analysis, it does appear to be the same Timeout issue:
We experience a memory leak in (at least) @sentry/bun@8.34.0.
> Also getting memory leaks on 8.28.0. From my crude analysis, it does appear to be the same Timeout issue:

Have you resolved this? I also get this! :((
Hey, I fixed this by using Node LTS 20.18.0: https://github.com/nodejs/node/pull/53337
I get this error too. My VPS has up to 8/12 GB of RAM available... Is there another way to upload source maps after the build?
Here is my log:
#13 10.58 ▲ Next.js 14.2.15
#13 10.58 - Environments: .env
#13 10.58 - Experiments (use with caution):
#13 10.58 · instrumentationHook
#13 10.58
#13 10.73 Creating an optimized production build ...
#13 184.5
#13 184.5 <--- Last few GCs --->
#13 184.5
#13 184.5 [42:0x31bde10] 179422 ms: Mark-Compact 2035.1 (2084.6) -> 2031.0 (2086.4) MB, 1058.58 / 0.00 ms (average mu = 0.340, current mu = 0.056) allocation failure; scavenge might not succeed
#13 184.5 [42:0x31bde10] 180641 ms: Mark-Compact 2035.1 (2086.4) -> 2033.1 (2088.1) MB, 1190.60 / 0.00 ms (average mu = 0.210, current mu = 0.023) allocation failure; scavenge might not succeed
#13 184.5
#13 184.5
#13 184.5 <--- JS stacktrace --->
#13 184.5
#13 184.5 FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
#13 184.5 ----- Native stack trace -----
#13 184.5
#13 184.5 1: 0xaaae2f node::OOMErrorHandler(char const*, v8::OOMDetails const&) [node]
#13 184.5 2: 0xe308c0 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [node]
#13 184.5 3: 0xe30ca4 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [node]
#13 184.5 4: 0x10604c7 [node]
#13 184.5 5: 0x1079039 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
#13 184.5 6: 0x1051ca7 v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
#13 184.5 7: 0x10528e4 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
#13 184.5 8: 0x1031c0e v8::internal::Factory::NewFillerObject(int, v8::internal::AllocationAlignment, v8::internal::AllocationType, v8::internal::AllocationOrigin) [node]
#13 184.5 9: 0x149b930 v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [node]
#13 184.5 10: 0x18ddef6 [node]
#13 185.0 error: script "build" was terminated by signal SIGABRT (Abort)
#13 185.0 ============================================================
#13 185.0 Bun v1.1.20 (ae194892) Linux x64
#13 185.0 Linux Kernel v5.15.0 | glibc v2.39
#13 185.0 CPU: sse42 popcnt avx avx2
#13 185.0 Args: "bun" "run" "build"
#13 185.0 Features: spawn
#13 185.0 Elapsed: 183379ms | User: 14ms | Sys: 13ms
#13 185.0 RSS: 1.07GB | Peak: 14.41MB | Commit: 1.07GB | Faults: 3
#13 185.0
#13 185.0 panic(main thread): Segmentation fault at address 0x0
#13 185.0 oh no: Bun has crashed. This indicates a bug in Bun, not your code.
#13 185.0
#13 185.0 To send a redacted crash report to Bun's team,
#13 185.0 please file a GitHub issue using the link below:
#13 185.0
#13 185.0 https://bun.report/1.1.20/lr1ae19489AggggE+7iQ_________A2AA
#13 185.0
#13 ERROR: process "/bin/bash -ol pipefail -c NODE_OPTIONS=--max-old-space-size=8192 prisma generate && bun run build" did not complete successfully: exit code: 132
------
> [stage-0 9/11] RUN --mount=type=cache,id=oskgwswo404kcw4ggk40g40s-next/cache,target=/app/.next/cache --mount=type=cache,id=oskgwswo404kcw4ggk40g40s-node_modules/cache,target=/app/node_modules/.cache NODE_OPTIONS=--max-old-space-size=8192 prisma generate && bun run build:
185.0 RSS: 1.07GB | Peak: 14.41MB | Commit: 1.07GB | Faults: 3
185.0
185.0 panic(main thread): Segmentation fault at address 0x0
185.0 oh no: Bun has crashed. This indicates a bug in Bun, not your code.
185.0
185.0 To send a redacted crash report to Bun's team,
185.0 please file a GitHub issue using the link below:
185.0
185.0 https://bun.report/1.1.20/lr1ae19489AggggE+7iQ_________A2AA
185.0
------
Dockerfile:24
--------------------
22 | # build phase
23 | COPY . /app/.
24 | >>> RUN --mount=type=cache,id=oskgwswo404kcw4ggk40g40s-next/cache,target=/app/.next/cache --mount=type=cache,id=oskgwswo404kcw4ggk40g40s-node_modules/cache,target=/app/node_modules/.cache NODE_OPTIONS=--max-old-space-size=8192 prisma generate && bun run build
25 |
26 |
--------------------
ERROR: failed to solve: process "/bin/bash -ol pipefail -c NODE_OPTIONS=--max-old-space-size=8192 prisma generate && bun run build" did not complete successfully: exit code: 132
Deployment failed. Removing the new version of your application.
@IRediTOTO this seems like a bug in Bun from looking at the logs. I recommend you follow up with the Bun team to fix this. Switching to Node.js in the meantime will probably unblock you.
> @IRediTOTO this seems like a bug in Bun from looking at the logs. I recommend you follow up with the Bun team to fix this. Switching to Node.js in the meantime will probably unblock you.

No, I tried many ways in nextjs.config.mjs, do you know any other way? For example, I tried:

```js
sourcemaps: {
  disable: true,
},
```
If you disable Sentry but enable generating sourcemaps, does the error still occur? (Basically: is the problem with uploading or with the actual sourcemap generation?)
You can check this with productionBrowserSourceMaps: true and Sentry disabled: https://nextjs.org/docs/app/api-reference/next-config-js/productionBrowserSourceMaps
If the problem exists with just sourcemap generation, this is a Next.js problem. It's Next.js/webpack which is generating the sourcemaps.
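For reference, a minimal sketch of that check in next.config.mjs, with the Sentry wrapper removed for the test:

```js
/** @type {import('next').NextConfig} */
const nextConfig = {
  // Generate browser source maps without Sentry in the build,
  // to see whether generation alone already runs out of memory.
  productionBrowserSourceMaps: true,
};

export default nextConfig;
```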
Is there an existing issue for this?
How do you use Sentry?
Sentry Saas (sentry.io)
Which SDK are you using?
@sentry/nextjs
SDK Version
8.7.0
Framework Version
No response
Link to Sentry event
No response
SDK Setup
Steps to Reproduce
Next.js application using "@sentry/nextjs": "8.7.0" (here is a full list of dependencies). The service is deployed via AWS ECS + Fargate.
We noticed that our first deploy following an upgrade from 8.3.0 to 8.6.0 started causing our containers to hit their memory limits and crash + restart. We noticed this behavior happening across two separate Next.js applications / containers that were upgraded to 8.6.0 at the same time.
Expected Result
Containers stay under memory limit.
Actual Result
Here is a memory usage graph from one of our containers. Version 8.3.0 does not appear to contain an issue, version > 8.6.0 does, we did not check versions 8.3.0 or 8.4.0
Here are some logs observed at time of crash: