Abdkhan14 closed this issue 8 months ago.
Also noted in this discussion: https://github.com/getsentry/sentry-python/discussions/2416
Relevant/might be fixed by: https://github.com/getsentry/sentry-python/pull/2325
TL;DR: My proposed course of action for now:
Some background:
The fundamental problem here is that we're trying to parse a non-regular language with regular expressions in the resolver. No regex we come up with will ever fully "work": there will always be counterexamples of perfectly valid URL patterns that won't be parsed correctly.
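To make the "non-regular" point concrete: deciding where a named group ends means matching balanced parentheses, which requires tracking nesting depth (effectively a stack). Python's re module has no way to count to arbitrary depth, so a single regex can only approximate it. Below is a minimal sketch of the alternative: a hypothetical hand-written scanner (find_named_groups and replace_named_groups are illustrative names, not sentry-python or Django APIs) that walks the pattern, skips escaped characters and character classes, and keeps a stack of open groups. It deliberately ignores corner cases such as verbose-mode comments, so treat it as an illustration under those assumptions, not a drop-in resolver.

```python
import re
from typing import List, Tuple

# Matches the opening of a named group such as "(?P<pid>" at a given position.
_NAMED_GROUP_START = re.compile(r"\(\?P<(\w+)>")


def _skip_char_class(pattern: str, i: int) -> int:
    """Given pattern[i] == "[", return the index just past the closing "]".

    Parentheses inside a character class are literal, so they must not
    change the nesting depth.
    """
    j = i + 1
    if j < len(pattern) and pattern[j] == "^":
        j += 1
    if j < len(pattern) and pattern[j] == "]":
        j += 1  # a "]" right after "[" or "[^" is a literal character
    while j < len(pattern):
        if pattern[j] == "\\":
            j += 2  # skip an escaped character such as "\]"
        elif pattern[j] == "]":
            return j + 1
        else:
            j += 1
    return len(pattern)  # unterminated class: consume the rest


def find_named_groups(pattern: str) -> List[Tuple[str, int, int]]:
    """Return (name, start, end) for every named group, sorted by start.

    The stack of open groups is the depth counter a regex lacks.
    """
    found = []
    stack = []  # one (name or None, start) entry per currently open "("
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2  # an escaped "\(" or "\)" does not open or close a group
        elif ch == "[":
            i = _skip_char_class(pattern, i)
        elif ch == "(":
            m = _NAMED_GROUP_START.match(pattern, i)
            stack.append((m.group(1) if m else None, i))
            i = m.end() if m else i + 1
        else:
            if ch == ")" and stack:
                name, start = stack.pop()
                if name is not None:
                    found.append((name, start, i + 1))
            i += 1
    return sorted(found, key=lambda g: g[1])


def replace_named_groups(pattern: str) -> str:
    """Replace each outermost named group with "{name}"."""
    parts = []
    pos = 0
    for name, start, end in find_named_groups(pattern):
        if start < pos:
            continue  # nested inside a group that was already replaced
        parts.append(pattern[pos:start])
        parts.append("{%s}" % name)
        pos = end
    parts.append(pattern[pos:])
    return "".join(parts)


print(replace_named_groups(r"^items/(?P<item>(?:\d+|[()a-z]+))/detail/(?P<pk>\d+)/$"))
# → ^items/{item}/detail/{pk}/$
```

For the same hypothetical pattern, the greedy regex currently used in the resolver, r"\(\?P<(\w+)>.*\)", would collapse everything up to the last closing parenthesis and yield ^items/{item}/$. The trade-off is that a scanner like this has to understand enough regex syntax (escapes, character classes) to know which parentheses count, which is exactly the knowledge a flat regex cannot encode.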
To illustrate, consider these candidate regexes and URL patterns they won't work for (paste them into e.g. https://regex101.com/ to play with them):
r"\(\?P<(\w+)>.*\)": This is the currently used regex, and it matches too much. For a URL pattern such as (?P<project>\w+)/product/(?P<pid>\w+), it greedily matches the whole string instead of two individual groups. This is what leads to us essentially erasing everything from the first named group up to the last closing parenthesis in the string.

r"\(\?P<(\w+)>[^\)]+\)+": This is the original regex. It works fine for the multiple named groups case, but as soon as there's an extra closing parenthesis in an unfortunate place, it matches too little: (?P<project>\w+[()]+)/product/(?P<pid>\w+) (this is a very simplified version of the URL pattern from https://github.com/getsentry/sentry-python/issues/1527).

r"\(\?P<(\w+)>.*?\)(?=/|\$|$)": This is a proposal that covers both the Django CMS case and the multiple named groups case, but it comes with a baked-in assumption that every named group sits neatly in its own "container" ending with a slash or the end of the string. If someone has some static text appended to a named group, e.g. (?P<project>\w+)/product/(?P<pid>\d+)p, the named group will not be captured.

While this list is definitely not exhaustive, the point I want to make is that any regex we choose will have to come with some assumptions about what constitutes a "correct URL pattern", simply because regexes inherently can't solve this problem. In other words, any change we make to the regex will introduce a regression, so we should approach changes here very carefully. (Or come up with a new approach to named group matching in the resolver.)
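The candidates can also be compared outside regex101 with Python's re module. The sketch below runs each candidate through re.sub, replacing every match with "{name}", which is roughly the named-group substitution step that turns a URL pattern into a transaction name. The helper simplify is hypothetical and skips any other clean-up the resolver may do. The last URL pattern is a hypothetical route shaped like the one in this issue; the real Sentry route may differ.

```python
import re

# The three candidate regexes discussed above.
CANDIDATES = {
    "current": r"\(\?P<(\w+)>.*\)",
    "original": r"\(\?P<(\w+)>[^\)]+\)+",
    "proposed": r"\(\?P<(\w+)>.*?\)(?=/|\$|$)",
}

URL_PATTERNS = [
    r"(?P<project>\w+)/product/(?P<pid>\w+)",      # multiple named groups
    r"(?P<project>\w+[()]+)/product/(?P<pid>\w+)",  # extra ")" inside a group
    r"(?P<project>\w+)/product/(?P<pid>\d+)p",      # static text after a group
    # Hypothetical route shaped like the one in this issue (the real one may differ).
    r"^api/0/organizations/(?P<organization_slug>[^/]+)/events-trace/(?P<trace_id>\w+)/$",
]


def simplify(named_group_regex: str, url_pattern: str) -> str:
    """Replace every match of named_group_regex with "{group_name}".

    Hypothetical helper: it only mimics the named-group substitution step,
    not any other clean-up the resolver may perform.
    """
    return re.sub(named_group_regex, lambda m: "{%s}" % m.group(1), url_pattern)


for url_pattern in URL_PATTERNS:
    print(url_pattern)
    for label, regex in CANDIDATES.items():
        print(f"  {label:>8}: {simplify(regex, url_pattern)}")
```

For the hypothetical route, the current regex produces ^api/0/organizations/{organization_slug}/$, the same shape as the truncated transaction name reported in this issue, while the other two candidates keep both groups. The original regex, in turn, leaves ]+) behind for the second pattern, and the proposed one leaves (?P<pid>\d+)p untouched for the third, so each candidate fails on at least one pattern.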
👍 for the proposed course of action.
Would it work to support "nice" transaction names only for path() routes, while re_path() routes are exposed raw?
@salomvary The problem is that path() routes can also contain "hidden" regexes; see e.g. https://github.com/getsentry/sentry-python/issues/2446.
Environment
SaaS (https://sentry.io/)
Steps to Reproduce
Expected Result
Transaction name is
/api/0/organizations/{organization_slug}/events-trace/{trace_id}/
Actual Result
Transaction name is renamed from
/api/0/organizations/{organization_slug}/events-trace/{trace_id}/
to
/api/0/organizations/{organization_slug}/
Product Area
Other
Link
No response
DSN
No response
Version
No response