Add Longer Time Intervals in Metric Alert

pushlean commented 9 months ago

Problem Statement

[Screenshot: metric alert time interval dropdown]

The maximum time interval for a metric alert is 24 hours, but it would be helpful to evaluate the alert over a longer interval, such as 7, 14, or 30 days, so that it isn't sensitive to spikes, especially after releases.

The intention of the alert in question is to measure quota usage by counting events that match a custom tag, so detecting new issues after a release is out of scope, and it's acceptable for the alert to be less sensitive to releases.

Solution Brainstorm

No response

Product Area

Alerts

Issue is synchronized with this Jira Improvement by Unito

getsantry[bot] commented 9 months ago

Assigning to @getsentry/support for routing ⏲️

getsantry[bot] commented 9 months ago

Routing to @getsentry/product-owners-alerts for triage ⏲️

isabellaenriquez commented 9 months ago

Thank you for the request! I'll add this to our backlog, but it's unlikely we'll be working on anything major related to metric alerts anytime soon.

Note for the future, if/when this gets picked up: for longer windows like these, we can run the query less frequently than we do for the current windows to offset the expense. Discuss with the streaming team.
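A minimal sketch of the idea above, assuming a simple policy where the evaluation interval grows with the window length. The function name `evaluation_interval`, the "about 24 evaluations per window" ratio, and the one-minute floor are hypothetical choices for illustration, not how Sentry actually schedules metric alert queries.

```python
from datetime import timedelta

# Hypothetical policy: evaluate roughly 24 times per window, but never
# more often than once per minute. Not Sentry's real scheduling logic.
MIN_INTERVAL = timedelta(minutes=1)
EVALUATIONS_PER_WINDOW = 24


def evaluation_interval(window: timedelta) -> timedelta:
    """Return how often to run the alert query for a given time window."""
    interval = window / EVALUATIONS_PER_WINDOW
    return max(interval, MIN_INTERVAL)


for window in (timedelta(hours=1), timedelta(days=1), timedelta(days=30)):
    print(window, "->", evaluation_interval(window))
```

Under this policy a 1-day window would be queried hourly and a 30-day window only every 30 hours, which is the kind of trade-off to discuss with the streaming team.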

realkosty commented 9 months ago

Adding some context and clarification:

@gauthamcs this is related to the call you participated in.

@pushlean mentioned in a different channel:

teams rely on these alerts to see if their team is overusing quota, but quota is monthly and metric alerts have a max time range of 1 day, so teams are getting alerted when their daily quota usage is too high but their monthly usage is under their limit.
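To make the mismatch concrete, here is a small sketch with made-up numbers: a monthly quota of 300,000 events, a daily alert threshold derived as quota / 30, and a single release-day spike. The numbers and the helper `windows_over_threshold` are hypothetical; the point is only that a 1-day window fires while the 30-day total stays under quota.

```python
# Hypothetical daily event counts for one team over a 30-day month:
# a steady 8,000 events/day, with a 25,000-event spike on a release day.
daily_counts = [8_000] * 30
daily_counts[14] = 25_000

MONTHLY_QUOTA = 300_000
DAILY_THRESHOLD = MONTHLY_QUOTA / 30  # 10,000 events/day


def windows_over_threshold(counts, window_days, threshold):
    """Return the start index of every rolling window whose sum exceeds threshold."""
    return [
        start
        for start in range(len(counts) - window_days + 1)
        if sum(counts[start:start + window_days]) > threshold
    ]


# 1-day window compared against the per-day share of the quota.
print(windows_over_threshold(daily_counts, 1, DAILY_THRESHOLD))  # → [14]
# 30-day window compared against the full monthly quota.
print(windows_over_threshold(daily_counts, 30, MONTHLY_QUOTA))  # → []
print(sum(daily_counts))  # → 257000
```

The daily alert fires on the release day even though the month's total (257,000) is well under the 300,000 limit, which is exactly the false alarm described in the quote.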

@gauthamcs 👆 this sounds similar to https://getsentry.atlassian.net/browse/FEEDBACK-1095 ?

This is a quota usage-rate monitoring use case, not a typical Alerts use case. However, it makes the most sense to implement it as @pushlean described, because of the tag-filtering requirement, which a prospective quota-focused solution may not satisfy unless either 1) it allows filtering by tag, or 2) we introduce a hypothetical server-side way to route events to different projects based on tags/abs_path, similar to ownership rules (a long shot).

The reason filtering by tag is a requirement is that their use case involves a very large application that multiple dev teams contribute to. Because it is a single codebase and runtime, they chose to use one project per language/platform. However, given the number of developers and teams involved, it is not practical to manage (or, in this case, just monitor) spend at the level of the entire application.

While it is possible to split the project into multiple projects on the client side for the JavaScript part (using either custom code or the official Micro-frontends solution), the same can't be done for the Python and native parts of the application:

Python: https://github.com/getsentry/sentry-python/issues/2012
Native: there is no way to know which component/team to tag an event with in beforeSend, because symbolication has to happen on the server side first.

So instead they chose to set a team/component tag rather than route events to different projects or DSNs, hence the tag-filtering requirement.
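For the Python part, the tag-based approach could look roughly like the sketch below: a `before_send` callback that inspects stack frames and sets a team tag that a metric alert could then filter on. The `team` tag name, the path-prefix-to-team mapping, and the helper `team_for_event` are hypothetical; the event layout (`exception.values[].stacktrace.frames[].abs_path`, frames ordered oldest call first) follows Sentry's event payload format. As noted above, this doesn't help for native events, whose frames aren't symbolicated until they reach the server.

```python
# Hypothetical mapping from source path prefixes to owning teams.
PATH_PREFIX_TO_TEAM = {
    "/app/billing/": "team-billing",
    "/app/search/": "team-search",
}
DEFAULT_TEAM = "unowned"


def team_for_event(event):
    """Return the team for the first matching frame, checking newest frames first."""
    for exc in event.get("exception", {}).get("values", []):
        frames = (exc.get("stacktrace") or {}).get("frames", [])
        # Sentry orders frames oldest-first, so walk them in reverse
        # to check the most recent call first.
        for frame in reversed(frames):
            path = frame.get("abs_path") or ""
            for prefix, team in PATH_PREFIX_TO_TEAM.items():
                if path.startswith(prefix):
                    return team
    return DEFAULT_TEAM


def before_send(event, hint):
    """Tag every outgoing event with its owning team; never drops events."""
    tags = event.setdefault("tags", {})
    tags["team"] = team_for_event(event)
    return event


# Wiring it up (requires the sentry-sdk package):
# import sentry_sdk
# sentry_sdk.init(dsn="...", before_send=before_send)
```

Because every event stays in the same project and DSN, quota is still consumed by one project; the per-team view only exists if alerts can filter on the `team` tag, which is why a longer alert window matters here.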
