getsentry / sentry

Developer-first error tracking and performance monitoring
https://sentry.io
Other
39.07k stars 4.19k forks source link

Ignore core modules' CODEOWNERS in traces #57828

Closed DavidCain closed 1 month ago

DavidCain commented 1 year ago

Problem Statement

Duplicate of https://github.com/getsentry/sentry/issues/54012, which was closed (and my requests to re-open haven't been seen).

Broadly, we'd like to have Sentry issues (whose stacktraces include some core application logic) assigned based on the faulty code, not the core modules involved.

The problem

Certain core modules have a tendency to appear in every stack trace (e.g. a module which provides a decorator to all asynchronous tasks will show up in all async task failures). The CODEOWNERS feature sees these core modules in the stack trace, and assigns all possible errors pertaining to async tasks to the team which maintains the core module. This has the effect of never getting the right team's attention. We'd like to consider these core modules "safe," exactly as if they were third-party modules.

An illustrative example

Consider this simple stacktrace:

Traceback (most recent call last):
  File "example.py", line 3, in <module>
    bad_ideas.main()
  File "src/bad_ideas.py", line 5, in main
    numbers.divide(numerator=10, denominator=0)
  File "src/numbers.py", line 2, in divide
    return numerator / denominator
ZeroDivisionError: division by zero

And couple that trace with a simple .github/CODEOWNERS:

src/bad_ideas.py  @engineering
src/numbers.py    @mathematicians

When this stacktrace becomes a Sentry issue (in a project that uses the CODEOWNERS feature), the @mathematicians team (owners of numbers.py) is assigned the issue. However, the divide method works just fine -- the real bug is in bad_ideas.py (those silly engineers tried to divide by zero!).

With any given stacktrace, it's not possible to know which frame of the trace is responsible for the bug (maybe the owners of example.py should be assigned; who knows). However, we'd really like the ability to consider certain core modules (in this case, numbers.py) to be presumed "safe" and for ownership assignment to be applied based on any other files in the trace.

Solution Brainstorm

In-app frames

Sentry already exposes the ability to mark certain parts of a stack-trace as coming from code that is not in-app: https://docs.sentry.io/product/data-management-settings/event-grouping/stack-trace-rules/

If the CODEOWNERS assignment feature ignored frames which are not "in-app," that would solve this nicely. In other words, it would work perfectly fine if we treated this core modules like third-party dependencies (if numpy is in your stacktrace, it's presumed that numpy probably isn't the source of the problem -- it's the code that calls numpy methods).

Simple CODEOWNERS override

If the Sentry UI exposed a simple set of overrides (in CODEOWNERS syntax), e.g. "Never assign an issue based on these matching globs," that would be work too. We'd like to continue to sync .github/CODEOWNERS for our repositories (and maintain code review membership rules for core modules).

Our current workaround -- remove ownership entirely

The only solution we've found to this is to simply mark core modules as owned by nobody. This has ramifications for assigning code reviewers at PR time, though, so it's not ideal.

Product Area

Issues

getsantry[bot] commented 1 year ago

Assigning to @getsentry/support for routing ⏲️

getsantry[bot] commented 1 year ago

Routing to @getsentry/product-owners-issues for triage ⏲️

scttcper commented 1 year ago

I've added this to our backlog, I don't see why we can't have some kind of # sentry-ignore syntax or something. Will need to talk to the team

alxckn commented 7 months ago

@DavidCain @scttcper

If the CODEOWNERS assignment feature ignored frames which are not "in-app," that would solve this nicely

According to the ownership rules documentation, this should be the case (Sentry evaluates an event’s in-app frames against the rules in the following order). The change in the documentation pre-dates this issue (cf. https://github.com/getsentry/sentry-docs/pull/7462). Yet it is my experience as well that marking frames as not "in-app" does not seem to exclude the file from the assignment algorithm.

Should this be converted to a bug?

domhauton-miro commented 7 months ago

We are also facing this issue in our monolithic java codebase. Sometimes exceptions are thrown up or downstream of where the problem occurred, resulting in large amounts of Sentry issues going to core teams like the team that maintains our database clients.


If a bad database call is made, the database client will throw an exception back down the stack, but sentry sees the exception as generated in the database client.

Perhaps there could be a way to mark exceptions as "Not because of Team X" when the exception is generated, for marking that the ownership of an exception should be passed back up the stack.


We also see the opposite issue for teams which implement resiliency tools like circuit-breakers. The circuit-breaker may open and start to return exceptions due to an issue ahead, but the owner of the issue should be the team that owns the faulty component ahead.

Perhaps in this case an exception could be marked as "causedBy" another exception and Sentry could understand that the root cause is another exception further down the stack.

MichaelSun48 commented 6 months ago

Hey there everyone, thanks for surfacing this! I was able to reproduce this behavior and after a little digging, I can confirm that this is indeed a bug: If a stacktrace frame is explicitly labeled as not in-app under Project Settings > Issue Grouping > Stack Trace Rules, it should not be considered in our auto-assignment logic. I will re-label this as a Bug instead of a Feature Request and see when we can get this prioritized.

Some notes for whoever may work on geting this fixed in the future:

Repro Steps:

  1. Set up a Sentry project and send an event containing a stack trace (e.g. filename=app/index.js)
  2. Set up an ownership rule for this stacktrace in Project Settings > Ownership Rules (e.g. app/index.js #)
  3. Label the stacktrace as not in-app in Project Settings > Issue Grouping > Stack Trace Rules (e.g. app/index.js -app)
  4. Send the event. Observe that you are still tagged as an assignee despite the frame you own being labeled as not in-app

Please also check out @alxckn's draft PR to confirm if that indeed solves the issue (thank you for taking the time to put that up by the way! 👍)

alxckn commented 3 months ago

Hi there, I reopened a PR to have the CODEOWNERS assignment respect the in_app status of each frames (this time with tests passing). Any chance to get this reviewed please?

At the moment in our organization, auto-assignment using CODEOWNERS feels like the lottery and this is becoming a major pain point.

jangjodi commented 2 months ago

Thank you for opening the PR @alxckn! We'll work on getting it reviewed.

JoshFerge commented 2 months ago

FYI https://github.com/getsentry/sentry/pull/75588 has been merged. thank yo uso much to @alxckn! if folks could verify that this fixes their issue, we can go ahead and close!