getsentry / sentry-native

Sentry SDK for C, C++ and native applications.
MIT License
395 stars 166 forks source link

Stream envelopes to file directly #1001

Closed Swatinem closed 2 months ago

Swatinem commented 4 months ago

The sentry_envelope_write_to_path was always built on top of sentry_envelope_serialize purely for convenience, and has always had a comment suggesting it should stream directly to the output file:

https://github.com/getsentry/sentry-native/blob/e8c7f15bb7ec6fe76dcd68d11f79aec1acdf57c7/src/sentry_envelope.c#L476-L482

One customer is now running into the problem that the SDK writes invalid envelopes to disk on OOM, most likely because it is unable to allocate memory itself.

Such envelopes may have item headers (with correct length), but no item contents, most likely because writing the contents would grow the buffer which fails, so the serialization code then just silently ignores that error and continues writing the next item header.

In another case, the envelope contains attachments, but no event. Most likely because the code to builds envelopes currently reads attachments into memory first, before constructing the event datastructure, which also increases memory pressure.


Likely the most reasonable effort change would be to resolve that TODO which has existed ever since the code was written to truely stream envelope writes to disk, which should halve the memory requirement for the whole operation.

Second, but probably more effort, and also with an observable side-effect would be to load attachments on-demand when serializing, vs when building the envelope in-memory.

supervacuus commented 4 months ago

Second, but probably more effort, and also with an observable side-effect would be to load attachments on-demand when serializing, vs when building the envelope in-memory.

What would be the observable side-effect (you mean to users, right?) besides deferred/lower memory usage? We do not expose envelope items to clients except for events or transactions. It could make a difference if we decide to implement something like this: https://github.com/getsentry/sentry-native/issues/978

Swatinem commented 4 months ago

The "observable" side-effect would be related to the timing of filesystem access. Already we advertise that users should flush any attachments in the on-crash hook, because thats when attachments are being read. If this would happen at some point afterwards, somewhere in the transport code, that could come as a surprise.

supervacuus commented 2 months ago

1021 fixed

Likely the most reasonable effort change would be to resolve that TODO which has existed ever since the code was written to truely stream envelope writes to disk, which should halve the memory requirement for the whole operation.

but we probably should keep

Second, but probably more effort, and also with an observable side-effect would be to load attachments on-demand when serializing, vs when building the envelope in-memory.

on the agenda since attachments typically pale the effects of all other envelope items.