getsentry / sentry-python

The official Python SDK for Sentry.io
https://sentry.io/for/python/
MIT License

No Quota Separation in Transport Layer #3575

Open narobertson42 opened 2 days ago

narobertson42 commented 2 days ago

How do you use Sentry?

Sentry Saas (sentry.io)

Version

2.9.0

Steps to Reproduce

  1. Exhaust the performance units quota.
  2. Attempt to send performance metrics. The SDK logs rate-limit warnings:

    [sentry] WARNING: Rate-limited via x-sentry-rate-limits
    [sentry] DEBUG: [Monitor] health check negative, downsampling with a factor of 1
  3. Trigger an error event. It is incorrectly affected by the rate limits:

    [sentry] INFO: event processor (<function DedupeIntegration.setup_once.<locals>.processor at 0xffffa2193560>) dropped event

Expected Result

The Sentry SDK should handle rate limits per event type: once the performance quota is exhausted, performance metrics should be rate-limited, but error events should still be sent to Sentry.
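For illustration, here is a minimal sketch of per-category rate limiting, assuming the `X-Sentry-Rate-Limits` value is a comma-separated list of `retry_after:categories:scope:...` entries in which an empty category list applies to every category. `parse_rate_limits` and `is_rate_limited` are hypothetical names and this is not the SDK's transport code; it only shows that a `transaction` limit need not block `error` events.

```python
import time
from typing import Dict, List, Optional

# Maps a data category (e.g. "transaction", "error") to the Unix time until
# which it is disabled; the key None means "all categories".
RateLimits = Dict[Optional[str], float]


def parse_rate_limits(header: str, now: float) -> RateLimits:
    """Parse an X-Sentry-Rate-Limits value (simplified) into per-category deadlines."""
    disabled_until: RateLimits = {}
    for entry in header.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Assumed layout: "retry_after:categories:scope[:...]".
        fields = entry.split(":")
        retry_after = float(fields[0])
        categories: List[Optional[str]] = (
            fields[1].split(";") if len(fields) > 1 and fields[1] else [None]
        )
        until = now + retry_after
        for category in categories:
            # Keep the longest deadline if a category appears more than once.
            disabled_until[category] = max(until, disabled_until.get(category, 0.0))
    return disabled_until


def is_rate_limited(limits: RateLimits, category: str, now: float) -> bool:
    """A category is blocked by its own limit or by a catch-all (None) limit."""
    return any(limits.get(key, 0.0) > now for key in (category, None))


if __name__ == "__main__":
    now = time.time()
    limits = parse_rate_limits("60:transaction:key", now)
    print(is_rate_limited(limits, "transaction", now))  # True
    print(is_rate_limited(limits, "error", now))        # False
```

Keying the deadlines by category is what lets an exhausted transaction quota coexist with error events that are still accepted.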

Actual Result

    KeyError: 'TriggerTestEvent'
    [sentry] INFO: event processor (<function DedupeIntegration.setup_once.<locals>.processor at 0xffffa2193560>) dropped event
    [sentry] DEBUG: [Monitor] health check negative, downsampling with a factor of 2
    [sentry] DEBUG: [Monitor] health check negative, downsampling with a factor of 3
    [sentry] DEBUG: [Monitor] health check negative, downsampling with a factor of 4
    [sentry] DEBUG: [Monitor] health check negative, downsampling with a factor of 5
    [sentry] DEBUG: [Monitor] health check negative, downsampling with a factor of 6
szokeasaurusrex commented 2 days ago

Hi @narobertson42, have you noticed errors actually missing in Sentry? From the log message you provided, it appears that error events are being dropped due to our deduplication, not because we are incorrectly rate-limiting your error events.

    [sentry] INFO: event processor (<function DedupeIntegration.setup_once.<locals>.processor at 0xffffa2193560>) dropped event
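A rough sketch of identity-based deduplication, assuming `DedupeIntegration` compares each captured exception object with the last one it saw. The `IdentityDedupe` class is hypothetical, not the SDK's code. Under that assumption, a "dropped event" line means the same exception object was captured a second time, which is unrelated to rate limiting.

```python
import sys
from typing import Any, Dict, Optional


class IdentityDedupe:
    """Hypothetical event processor that drops an event whose exception
    object is the very same object (identity, not equality) as the last one."""

    def __init__(self) -> None:
        self._last_seen: Optional[BaseException] = None

    def process(self, event: Dict[str, Any], hint: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        exc_info = hint.get("exc_info")
        if exc_info is None:
            return event  # not an exception event, so never deduplicated
        exc = exc_info[1]
        if exc is self._last_seen:
            return None  # same exception object seen again: drop the event
        self._last_seen = exc
        return event


if __name__ == "__main__":
    dedupe = IdentityDedupe()
    try:
        raise ValueError("TriggerTestEvent")
    except ValueError:
        hint = {"exc_info": sys.exc_info()}
    print(dedupe.process({"level": "error"}, hint) is not None)  # True: first capture kept
    print(dedupe.process({"level": "error"}, hint) is None)      # True: second capture dropped
```

A new exception with the same type and message is a different object, so it would still pass through this check.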

If you are absolutely sure that your error events are actually missing, would you be able to provide a minimal reproduction of the problem?

narobertson42 commented 2 days ago

@szokeasaurusrex

No error events appeared in my Sentry dashboard until I paid to increase the performance quota, even though I had plenty of error quota remaining.


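import time

import sentry_sdk
from sentry_sdk import start_transaction

sentry_sdk.init(
    dsn="<your_dsn>",
    traces_sample_rate=1.0,    # send every transaction
    profiles_sample_rate=1.0,  # profile every sampled transaction
    debug=True,                # print the SDK's [sentry] log lines
)


def simulate_performance_event():
    # Each transaction counts against the performance quota.
    with start_transaction(op="task", name="Performance Event"):
        time.sleep(0.5)
        print("[INFO] Performance event recorded.")


def simulate_error_event():
    # Error events count against the separate error quota.
    try:
        raise ValueError("TriggerTestEvent")
    except Exception as e:
        sentry_sdk.capture_exception(e)
        print("[INFO] Error event recorded.")


# Send transactions while the performance quota is exhausted, then one error event.
for _ in range(20):
    simulate_performance_event()

simulate_error_event()

# Wait for queued events to be sent (or the timeout to expire) before exiting.
sentry_sdk.flush()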
sl0thentr0py commented 2 days ago

This was a deliberate decision I made about system health while building the backpressure mechanism. Relay does not distinguish whether a rate limit is caused by an exhausted quota or by spike protection, and in the spike-protection case I wanted backpressure to kick in. I will consider changing this so that only transaction rate limits count, or making it configurable.
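A minimal sketch of what a configurable health check could look like. It assumes the transport keeps a per-category map of disabled-until timestamps (with `None` as the catch-all key), that each negative check raises the downsampling factor by one, and that each step halves the effective `traces_sample_rate`; the cap of 10 is also an assumption. `is_healthy`, `next_downsample_factor` and `effective_sample_rate` are hypothetical names, and the sketch looks only at rate limits, ignoring any other health signals the SDK may use. The `[Monitor]` lines in the report show the factor climbing from 1 to 6.

```python
from typing import Dict, Iterable, Optional

MAX_DOWNSAMPLE_FACTOR = 10  # assumed cap, for illustration only


def is_healthy(disabled_until: Dict[Optional[str], float], now: float,
               categories: Optional[Iterable[str]] = None) -> bool:
    """Unhealthy if a relevant category is currently rate-limited.

    categories=None means any rate limit counts (the behavior described above);
    passing {"transaction"} lets only transaction limits trigger backpressure.
    A catch-all (None) limit always counts.
    """
    relevant = None if categories is None else set(categories) | {None}
    for category, until in disabled_until.items():
        if until > now and (relevant is None or category in relevant):
            return False
    return True


def next_downsample_factor(factor: int, healthy: bool) -> int:
    """Reset on a healthy check, otherwise step up by one until the cap."""
    if healthy:
        return 0
    return min(factor + 1, MAX_DOWNSAMPLE_FACTOR)


def effective_sample_rate(traces_sample_rate: float, factor: int) -> float:
    """Assumes each downsampling step halves the configured rate."""
    return traces_sample_rate / 2 ** factor


if __name__ == "__main__":
    limits = {"error": 2000.0}
    print(is_healthy(limits, now=1000.0))                              # False
    print(is_healthy(limits, now=1000.0, categories={"transaction"}))  # True
    print(effective_sample_rate(1.0, 6))                               # 0.015625
```

Under these assumptions, a factor of 6 leaves 1/64 of the configured transaction rate, while restricting the check to `transaction` keeps an error-only limit from triggering backpressure at all.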

In the long term, it would be great if Relay could distinguish the two rate-limit cases, but I don't think we can count on that anytime soon.