getsentry / sentry-native

Sentry SDK for C, C++ and native applications.
MIT License
403 stars 170 forks source link

Allow capturing minidumps programatically #1050

Closed bruno-garcia closed 3 weeks ago

bruno-garcia commented 1 month ago

A customer who has their own crash handler mechanism would like to upload minidumps they capture by themselves. One of the solutions proposed was using HTTP and uploading them to our minidump endpoint.

Since that means having to deal with retries, managing files on disk etc. I suggesting using one of our SDKs to manage that.

Our envelope format allows attachment, including type minidump. Which should trigger symbolication.

While we might not have a capture_minidump yet, the suggestion is to create this function so the SDK takes over the job of establishing the connection to Sentry. And deleting the file upon completion.

supervacuus commented 1 month ago

Our envelope format allows attachment, including type minidump. Which should trigger symbolication.

While we might not have a capture_minidump yet, the suggestion is to create this function so the SDK takes over the job of establishing the connection to Sentry. And deleting the file upon completion.

This is what we do in the breakpad backend:

https://github.com/getsentry/sentry-native/blob/05af04ba1da1e16815cd64a5d42992554d28957a/src/backends/sentry_backend_breakpad.cpp#L122-L155

A sensible implementation would take an event (which could be empty, filled, or a sentry_value_new_null()) and a minidump path string and capture_envelope it with the options->transport instead of using a disk transport.

kahest commented 1 month ago

@bruno-garcia you mention

Since that means having to deal with retries, managing files on disk etc.

sentry-native currently doesn't do this, and there's no synchronous behaviour between client and transport. so if that's the expectation it would require additional effort than what is suggested here

PlasmaDev5 commented 1 month ago

Initial Notes

Overview

We have been required to allow for the sending of minidumps created by 3rd parties that can be synchronized to our systems via the SDK. This would not only unblock the customer in question but also be appealing to other engine studios that run there own crash handlers.

Plan

The plan is to provide a new method called sentry__create_minidump that takes in a path to the dump file. This function would then send the minidump to our services using a similar approach to that found in the breakpad backend. For the implementation of this feature i will abstract this functionality out of the breakpad backend into its own independent function. I can then refactor breakpad to make use of this new function to prevent repeated code.

Open Questions

Do we only want to handle minidumps?

Is this request/do we desire to only add support for custom minidumps or is there any additional functionality that makes sense to be sent alongside the minidump.

How much of the breakpad implementation should we break out of this?

Initial thought included breaking out all behavior tied to handling the minidump but noticed a lot of platform specific data that would make for a bad and unclear API. My second thought was to minimize it down to the only the most important aspect for example the code extract as follows

        if (should_handle) {
            sentry_envelope_t *envelope = sentry__prepare_event(
                options, event, nullptr, !options->on_crash_func);
            sentry_session_t *session = sentry__end_current_session_with_status(
                SENTRY_SESSION_STATUS_CRASHED);
            sentry__envelope_add_session(envelope, session);

            // the minidump is added as an attachment,
            // with type `event.minidump`
            sentry_envelope_item_t *item = sentry__envelope_add_from_path(
                envelope, dump_path, "attachment");
            if (item) {
                sentry__envelope_item_set_header(item, "attachment_type",
                    sentry_value_new_string("event.minidump"));

                sentry__envelope_item_set_header(item, "filename",
#ifdef SENTRY_PLATFORM_WINDOWS
                    sentry__value_new_string_from_wstr(
#else
                    sentry_value_new_string(
#endif
                        sentry__path_filename(dump_path)));
            }

            // capture the envelope with the disk transport
            sentry_transport_t *disk_transport
                = sentry_new_disk_transport(options->run);
            sentry__capture_envelope(disk_transport, envelope);
            sentry__transport_dump_queue(disk_transport, options->run);
            sentry_transport_free(disk_transport);

Whilst i feel this approach would lead to a more simplified API where we only need the options and dump path regardless of platform. The concern is it leave a lot of work open to the developer. Potentially there is a middle ground where we find a way to take out the platform specifics or pass the platform data via void* or platform data struct that is implemented per platform

List of platform specific data used in existing breakpad code

not all data listed here is actively used but listed as they are part of the existing function signatures

Windows:

Linux:

Mac:

vaind commented 1 month ago

@bruno-garcia you mention

Since that means having to deal with retries, managing files on disk etc.

sentry-native currently doesn't do this, and there's no synchronous behaviour between client and transport. so if that's the expectation it would require additional effort than what is suggested here

I don't think we need that - IMO, it would be fine to just process all minidumps at once, which moves them over to the envelope cache dir and transport handles retries. Also, we can probably set any large timeout to flush() so it runs until the server accepts everything.

vaind commented 1 month ago

A customer who has their own crash handler mechanism would like to upload minidumps they capture by themselves.

do they want to deal with sentry-native API, building an executable to run this? IDK but build a simple script that runs HTTP requests to our backend sounds a lot simpler.

PlasmaDev5 commented 1 month ago

do they want to deal with sentry-native API, building an executable to run this? IDK but build a simple script that runs HTTP requests to our backend sounds a lot simpler.

From my understanding they are already making use of sentry-native for the use in question this would be a more direct way of achieving there goal. I imagine @bruno-garcia can give more insight on this

bruno-garcia commented 1 month ago

do they want to deal with sentry-native API, building an executable to run this? IDK but build a simple script that runs HTTP requests to our backend sounds a lot simpler.

From my understanding they are already making use of sentry-native for the use in question this would be a more direct way of achieving there goal. I imagine @bruno-garcia can give more insight on this

That's right, they already have sentry-native on the app. And my understanding is that in the future they might switch to our own backends and drop their custom minidump collecting logic.

supervacuus commented 1 month ago

@bruno-garcia you mention

Since that means having to deal with retries, managing files on disk etc.

sentry-native currently doesn't do this, and there's no synchronous behaviour between client and transport. so if that's the expectation it would require additional effort than what is suggested here

I don't think we need that - IMO, it would be fine to just process all minidumps at once, which moves them over to the envelope cache dir and transport handles retries. Also, we can probably set any large timeout to flush() so it runs until the server accepts everything.

To be clear, because you mention "transport handles retries" again in your response, this is currently not happening. The transport only handles rate-limiting returns (and delays according to responses from the server) but not intermittent network downtime (or any other network failure). There is no exponential backoff or similar strategy in place.

So, while removing source minidumps concerning transferring their data into an envelope attachment is safe, once the sender thread picks up the envelope, it is gone.

Flushing is only related to the topic as it persists the inflight items of the send queue. Still, the sender thread is not transactional in any way (i.e., if it fails to send beyond 429, Retry-After, x-sentry-rate-limits, it will dequeue all remaining items in the queue, which might equally fail to send).

bruno-garcia commented 1 month ago

Understood.

Retries I meant is more about offline mode. Since we don't have retries built-in as in "Failed to send to Sentry, lets retry" as this can result in weird DDoS scenarios, and clients stuck for some reason retrying forever if they have some connectivity issue. So seems like we need to at least keep the minidump on disk if the device is offline. But since this is mainly used on desktop/server, this is very much a nice to have feature.

In conclusion, we can forget about 'retries'. Rate limiting is the value one gets from having to try to write this by themselves.