getsentry / sentry-javascript

Official Sentry SDKs for JavaScript
https://sentry.io
MIT License
7.98k stars 1.57k forks source link

`SentrySpanExporter` drops child spans if the root operation takes longer than 5 minutes #12491

Closed jeengbe closed 4 months ago

jeengbe commented 4 months ago

Is there an existing issue for this?

How do you use Sentry?

Sentry Saas (sentry.io)

Which SDK are you using?

@sentry/node

SDK Version

8.9.2

Framework Version

No response

Link to Sentry event

https://prisjakt-ab.sentry.io/performance/trace/962f9a6f5f264f8986da7d9ace318b6f/

SDK Setup

No response

Steps to Reproduce

Due to how https://github.com/getsentry/sentry-javascript/blob/1201eb2d78425709293ee0aac180cbcc1f4e99d2/packages/opentelemetry/src/spanExporter.ts#L53-L58 only flushes spans once a root span has finished, any child spans for an operation that takes longer than 5 minutes (e.g. a cronjob) are dropped.

This is what the trace used to look like: https://prisjakt-ab.sentry.io/performance/trace/890f3ee3731e41119416f3964e90719f, this is now: https://prisjakt-ab.sentry.io/performance/trace/962f9a6f5f264f8986da7d9ace318b6f

I understand that this is to avoid leaking memory (makes sense, I guess), but it would be nice to perhaps configure this limit and disable it entirely. Clearing old spans to avoid memory leak is not really an issue for batch operations that only consist of a single root span and terminate once it's completed anyway.


Furthermore, because of how https://github.com/getsentry/sentry-javascript/blob/1201eb2d78425709293ee0aac180cbcc1f4e99d2/packages/opentelemetry/src/spanExporter.ts#L174-L177 works, any spans longer than 5 minutes are automatically discarded. Fortunately, the root span is still sent because cleanup is never attempted for a finished root span.

Expected Result

a

Actual Result

a

mydea commented 4 months ago

Hey, thank you for the feedback!

You are right that this was done to avoid leaking memory. Your use case is interesting, we should def. support this! We'll introduce a config option for this. I think I'd go with a maxSpanDuration config option which defaults to 5m, which you could overwrite to a value of your choosing, so you could set this to a really high value like 24h if you want. You should never fully disable this because you can always get orphaned spans (for whatever reason, e.g. some parent span is dropped somewhere) which would never be sent/cleaned up, but a high value could be selected nevertheless!

For the reason why also shorter children are dropped: This is because every children waits for it's own parent before it can be sent. So if the parent is not there after 5min, the child is dropped (even if it is completed). This is not ideal and hopefully at some point we can improve this, but this will require us to be able to send spans on their own (not just in a transaction event), which we are working on but cannot do today!

jeengbe commented 4 months ago

Your use case is interesting

Out of interest; how is this a rate/interesting case? Sentry even has a Cron monitoring product, so instrumenting a batch job surely can't be that unusual I'd think.

mydea commented 4 months ago

Sorry, what I meant to say is, your use case makes sense and we should support it properly! It's always a bit of a tradeoff with the cleanup duration, as it depends on your setup how many spans are emitted/orphaned/need to be cleaned up etc 🤔 so we'll need to see what an ideal default value is (maybe it is higher than 5min), but def. a good first step is to make it configurable at all!