getsentry / sentry

Developer-first error tracking and performance monitoring
https://sentry.io
Other

Rules for ignoring/archiving issues until conditions are met #76115

Open filips123 opened 2 months ago

filips123 commented 2 months ago

Problem Statement

Sentry already supports archiving (previously known as ignoring) issues. However, for security and compliance reasons, an archived issue that hasn't occurred for more than 90 days is completely removed from Sentry, along with its archive configuration. The next time the error happens, it is treated as a new unresolved issue, so developers receive an unnecessary alert even though the issue should have stayed ignored.

In my case, the program regularly accesses a third-party API. The API may be down sometimes, which is fine as long as it comes back up soon, but if it remains down for a long time, that is something to investigate and I should receive an alert. To handle this, I have the relevant issue(s) archived until they occur N times per hour. However, because this error is quite rare and may not occur for more than 90 days, the old issue gets deleted, so the next time the error happens, I receive an alert even though it occurred only once. Another slight annoyance is that after I've reviewed the issue, I have to manually configure the old "archive until" rule again.
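To make the desired behavior concrete, here is a minimal sketch of an "archive until it happens N times per hour" condition as a sliding-window counter. The class name `ArchiveUntilRate` and its methods are hypothetical and only illustrate the semantics I have in mind; this is not how Sentry implements it.

```python
from collections import deque


class ArchiveUntilRate:
    """Hypothetical model of 'archive until N occurrences per window'."""

    def __init__(self, threshold, window_seconds=3600):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record(self, timestamp):
        """Record one occurrence (Unix seconds); return True if the issue should escalate."""
        self._timestamps.append(timestamp)
        # Drop occurrences that have fallen out of the sliding window.
        cutoff = timestamp - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.threshold
```

With a rule like `ArchiveUntilRate(threshold=3)`, a single occurrence after months of silence would stay archived, and only three occurrences within the same hour would trigger an alert. The point of this request is that such a rule should outlive the 90-day deletion of the issue itself.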

A similar problem can occur with specific frontend errors that are known to happen occasionally but should remain archived until they escalate. Again, if the error doesn't happen for 90 days, the issue is removed, so the next occurrence triggers an alert.

Filtering out the issues in the SDK, for example with the before_send callback, is not suitable for this, because the issues would then be ignored completely and forever. Additionally, managing archived issues through the Sentry website has benefits: it doesn't require changing and rebuilding the application, and I can still see on the website when these archived issues occur.
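For reference, this is roughly what the SDK-side filtering looks like in Python. It is a minimal sketch: the function name is mine, and I use the built-in `ConnectionError` as a stand-in for whatever exception the third-party API client actually raises.

```python
def drop_third_party_outages(event, hint):
    """before_send callback: return None to drop the event, or the event to send it."""
    exc_info = hint.get("exc_info")  # (type, value, traceback) when an exception was captured
    if exc_info is not None and issubclass(exc_info[0], ConnectionError):
        # The event never reaches Sentry, so it can never escalate or be inspected later.
        return None
    return event


# The callback would be registered with something like:
#   sentry_sdk.init(dsn="...", before_send=drop_third_party_outages)
```

Because the event is discarded before it is sent, Sentry has nothing to count, which is exactly why this approach cannot express "archive until it happens N times per hour".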

Solution Brainstorm

My idea is to allow archive rules to be defined manually on the Sentry website using matchers, as is already possible for fingerprint and stack trace rules. This would make it possible to automatically archive all events of a specific type, message, or fingerprint until the conditions are met. It should also be possible to reference existing fingerprint rules, for example to archive all issues with a specific fingerprint. This could additionally allow more complex unarchiving conditions than the ones currently available, for example multiple conditions joined with logical operators, although this is not necessary for my use case.

Similarly, for developers who would prefer to configure these rules in code, it should be possible to set archive rules for specific events from the SDK, for example in the before_send callback.

Then, when a new issue type occurs, Sentry should check whether it has archive rules defined, that is, whether it matches any of the rules configured on the website (as with fingerprint rules) or has an archive rule set directly in the event (if it was set using the SDK). If both are set, the SDK rule should probably take priority. Such events should then be archived immediately, without sending any alerts, until the set conditions are met (unless the "condition" is to archive forever, in which case the issue should be, well, archived forever). Other events should be handled normally.
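The lookup order described above could be sketched as follows. Everything here is hypothetical: `ArchiveRule`, the `archive_rule` event field, and the flat string matchers are illustrative names for this proposal, not existing Sentry or SDK features.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ArchiveRule:
    matcher_key: str      # hypothetical: "fingerprint", "type", or "message"
    matcher_value: str
    until: Optional[str]  # None means "archive forever"


def resolve_archive_rule(event, website_rules):
    """Return the archive rule that applies to a new issue, or None to handle it normally."""
    # A rule attached by the SDK (hypothetical event field) takes priority.
    sdk_rule = event.get("archive_rule")
    if sdk_rule is not None:
        return sdk_rule
    # Otherwise, the first matching rule configured on the website wins.
    for rule in website_rules:
        if event.get(rule.matcher_key) == rule.matcher_value:
            return rule
    return None
```

An event matching no rule returns `None` and would go through the normal alerting path; an event with a matching rule would be archived immediately with that rule's condition.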

Also, the behavior of the resolve button should be modified to reapply the rules. For example, suppose a specific event has archive rules set (with either of the above methods). At some point the conditions are met, the issue is unarchived, the alert is triggered, and I fix the problem if necessary. I then want to mark the issue as resolved, but still want the original archive rules to apply in the future. The next time the issue happens, it should not be unarchived immediately, but again only after the conditions are met. Alternatively, this could be done with a separate "restore to archive rules" button.
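The proposed lifecycle can be illustrated with a small state machine. This is only a sketch of the behavior I am asking for, with hypothetical names (`Issue`, `Status`, `on_event`); it says nothing about how Sentry models issue state internally.

```python
from enum import Enum


class Status(Enum):
    ARCHIVED = "archived"
    UNRESOLVED = "unresolved"
    RESOLVED = "resolved"


class Issue:
    """Hypothetical issue whose archive rule is reapplied after it is resolved."""

    def __init__(self, has_archive_rule):
        self.has_archive_rule = has_archive_rule
        self.status = Status.ARCHIVED if has_archive_rule else Status.UNRESOLVED

    def resolve(self):
        self.status = Status.RESOLVED

    def on_event(self, conditions_met):
        """Handle a new occurrence; return True if an alert should be sent."""
        if self.status is Status.RESOLVED:
            if not self.has_archive_rule:
                # Normal regression: the issue reopens and alerts right away.
                self.status = Status.UNRESOLVED
                return True
            # Proposed behavior: fall back to the archive rule instead of reopening.
            self.status = Status.ARCHIVED
        if self.status is Status.ARCHIVED and conditions_met:
            self.status = Status.UNRESOLVED
            return True
        return False
```

With a rule in place, resolving the issue and then seeing a single new occurrence would keep it archived; only meeting the conditions again would reopen it and send an alert.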

Product Area

Issues

getsantry[bot] commented 2 months ago

Auto-routing to @getsentry/product-owners-issues for triage ⏲️

JoshFerge commented 2 months ago

hi @filips123 thank you for the issue. I'll discuss with the team and determine next steps. thanks!