Feature Request: Per event user-consent.

stima commented 11 months ago

Description

Move consent from "global state" to "per event state".

When does the problem happen

[ ] During build
[X] During run-time
[ ] When capturing a hard crash

Environment

OS: Unrelated
Compiler: Unrelated
CMake version and config: Unrelated

Steps To Reproduce

Turn consent off.
Generate crash.
Turn consent on.
Previous crash would be send, even conset was not given for that particular case.

Log output

supervacuus commented 10 months ago

Thanks, @stima. I am unsure if I would classify this as a bug since we have no cross-SDK rule about user consent. In general, most users will expect that this is a global setting, so applying this to each event will always happen in the context of a global setting to make sure we don't retroactively give consent.

supervacuus commented 10 months ago

In general, most users will expect that this is a global setting, so applying this to each event will always happen in the context of a global setting to make sure we don't retroactively give consent.

By the way, if you mean for users to give consent per event (and not only prevent retroactive sending), that would be yet another design since it would require that we provide an API for the crash context (to allow the user to make the decision based on the contents of what would be sent). This is different from what the user-consent functionality is currently supposed to implement.

supervacuus commented 10 months ago

cc: @kahest

kahest commented 10 months ago

We're talking about 2 different things here:

Consent per event is a new feature
The steps to reproduce describe something that sounds like a bug to me - storing consent per event is a fix for it, but there's likely less involved fixes

About the second point: an event captured while consent was not given should not be sent, even if consent is given afterwards and the event is still present. This seems like an edge case though. @supervacuus wdyt?

stima commented 10 months ago

About the second point: an event captured while consent was not given should not be sent, even if consent is given afterwards and the event is still present

@kahest yes you are totally right, that is mostly about that global consent can not be used when event already generated

kahest commented 10 months ago

@stima cool, thanks for confirming - I think this is unintended and we'll discuss how we'll proceed with this. We won't implement consent per event if not necessary - if you have other use cases for that, feel free to add here

supervacuus commented 10 months ago

About the second point: an event captured while consent was not given should not be sent, even if consent is given afterwards and the event is still present. This seems like an edge case though. @supervacuus wdyt?

This is an edge case because it is not stably reproducible using an integration test (that follows the steps to reproduce). Right now, there are two places where consent is checked:

in the capture envelope path (which hits twice: once for every envelope the client captures and once during the processing of previous crashes in the initialization)
and for crashpad in the upload thread of the crashpad_handler

First, let's consider the capture envelope path: this is responsible for sending any envelopes, including regular events, crashes, and transactions. There is no way for this path where you could give consent, and it would retroactively allow the sending of envelopes where previously consent was revoked because the check happens synchronously.

Of course, suppose you "give" consent on one thread and send an envelope on another. In that case, it is possible that global consent GIVEN is visible before the envelope is passed to the transport, although it was REVOKED when capturing the envelope was initiated. However, this is different from the issue description.

If we need to prevent this, we must track consent per event or block/ignore consent changes during any envelope capture. I wouldn't consider this a bug because at this point it is also unclear to any user at which boundary they have given consent or not. Of course we can move the boundary further out, but it is essentially the application that must make sure that consent is given at a point in time where the user knows which envelopes their consent applies to. Of course, we can talk about this scenario, but consider using a mutex to prevent concurrent consent-mutation and envelope-sending, then we still cannot guarantee which thread acquires the mutex first, leaving the correct order again up to the application.

Which brings us to the path that the client cannot control: crashes will be sent during sentry_init() of the subsequent application run and here, the consent check would happen before any consent changes from the client because you cannot give consent before sentry_init() is finished, and that means the old crash envelopes will be passed to the transport before the client can modify the consent state (i.e., in-flight crash envelopes are unaffected by consent mutation). This is the approach used by the breakpad and inproc backends since those send crashes via our transport.

With the upload thread in the crashpad_handler on the other hand we have no control over when that thread will send remaining crashes. The backend callback for consent in the Native SDK writes directly to the crashpad database and if that write is visible to the thread in the crashpad_handler then it will upload the crash even if the crash happened while consent was revoked.

@stima, do you use the crashpad backend and think you hit this last scenario? Can you also tell me on which platform you see this behavior?

stima commented 10 months ago

Platfrom: Windows Backend: Crashpad

@supervacuus I guess the second scenario describes my case. Usage is quite trivial:

sentry_init(options);
if (AppSettings()->AllowSend())
{
    sentry_consent_give();
}
else
{
    sentry_consent_revoke();
}

e.g. consent may be given faster then crashpad uploading thread started.

Nevetheless, there is another "edge case" related directly to Sentry Native (not tested but this is according to code):

Set consent required and off
Run and generate crash
On second run set consent as not required.
Crash will be sent

bool
sentry__should_skip_upload(void)
{
    bool skip = true;
    SENTRY_WITH_OPTIONS (options) {
        skip = options->require_user_consent // here
            && sentry__atomic_fetch((long *)&options->user_consent)
                != SENTRY_USER_CONSENT_GIVEN;
    }
    return skip;
}

I would like to mention that for some usecases this is quite critical (goverment, banks, etc) to provide confidence that nothing would be send from user PC if user directly disallowed it. I would also like to say, that I think that trivial guarantees for "per-session" consent on startup is enough (e.g. nothing related to MT during run and chaning it)

supervacuus commented 10 months ago

e.g. consent may be given faster then crashpad uploading thread started.

Yeah, as I wrote above, that is totally possible; the crashpad_handler acts fully independent from the Native SDK after the process is started. I could imagine that we have some wiggle room in the initialization by pruning reports if consent was revoked, but we must check whether this interferes with other use cases.

Nevetheless, there is another "edge case" related directly to Sentry Native (not tested but this is according to code):

From the perspective of the current implementation, this is a hypothetical use case since the options typically do not change once the development reaches a stable configuration. Do you need to change the user-consent requirement in your application when deployed to production?

I would like to mention that for some usecases this is quite critical (goverment, banks, etc) to provide confidence that nothing would be send from user PC if user directly disallowed it.

I am aware, and I agree that it is crucial for some scenarios (I would also say that there are many more than tighter user consent bounds for many uses of the Native SDK). Please don't read this as pushing back on the topic. But this must be balanced between the consequences for the 99-percentile clients and providing change for specific applications. There are many ways in which we can help:

first, by identifying if a behavior is an actual bug (like a data race or other potential corruption of data or incorrect implementation of a given spec or even a crash) or
seemingly inconsistent behavior that still follows a sensible technological reason (which would require better documentation)
providing you with alternatives in using the Native SDK. For instance, it could make sense for you to switch to the breakpad backend, which does not have the problem of considering a non-synchronized process during initialization and allows you to control the transport of crashes via a custom transport (which is involved but provides you the freedom to tune to your needs, for instance here: https://github.com/getsentry/sentry-native/issues/932)
and only lastly, by changing the behavior for everyone (and making sure that this doesn't regress the usability for most clients, especially when considering interactions with other APIs)

Please consider my responses in this context and, as a result, the question of how we can prioritize our actions with respect to other needs.

I would also like to say, that I think that trivial guarantees for "per-session" consent on startup is enough (e.g. nothing related to MT during run and chaning it)

I will have a look if we can adapt the initialization and pruning of crashpad records in a way that would eliminate any uploads that the crashpad_handler could pick up. Otherwise, I would recommend having a look at whether the breakpad backend covers your other requirements since there this should work out of the box.

kahest commented 10 months ago

For reference: per event consent has previously been discussed and is tracked here https://github.com/getsentry/sentry-native/issues/110

Created issue to add docs for the feature: https://github.com/getsentry/sentry-docs/issues/8848

supervacuus commented 10 months ago

I guess the second scenario describes my case. Usage is quite trivial:
sentry_init(options);
if (AppSettings()->AllowSend())
{
    sentry_consent_give();
}
else
{
    sentry_consent_revoke();
}
e.g. consent may be given faster then crashpad uploading thread started.

I cannot reproduce the issue following the above implementation on any supported platforms since the crashpad_handler marks each event as completed when consent is revoked (independent of network state). With the next start, no crash event would even be considered by the UploadThread, and as a result, no consent given could retroactively provoke an event being sent, which was created while consent was revoked.

This can only happen when the crashpad_handler is killed exactly after the dump was produced, and during the next start, consent is given and synchronized with the crashpad_handler before the UploadThread runs. This can be considered by the method I described above.

If you have a repro for this case where the crashpad_handler isn't killed, I would be happy to get my hands on it, because that could reveal a bug in the crashpad_handler.

stima commented 10 months ago

I can't share the code, but I will try to investigate myself. Probably unrelated, but there is a topic https://forum.sentry.io/t/sentry-crashpad-crashing/9007/8, that definitely leads to incorrect issues grouping on our board. Appreciate for any advices how to handle it.

getsentry / sentry-native

Feature Request: Per event user-consent. #933