getsentry / sentry

Developer-first error tracking and performance monitoring
https://sentry.io

Loss of events in Web UI. #80549

Status: Open. agaibura-tripleten opened this issue 1 month ago.

agaibura-tripleten commented 1 month ago

Self-Hosted Version

24.8.0

CPU Architecture

x86_64

Docker Version

25.0.5

Docker Compose Version

2.29.5

Steps to Reproduce

  1. Run a Django 3 application with sentry-sdk.
  2. Write a middleware that handles any Exception and logs every Sentry event ID: sentry_event_id = sentry_sdk.capture_exception(exc)
  3. Look up a non-null sentry_event_id on the self-hosted instance, e.g., https:///organizations/sentry/discover/:cf4dc5abe7844cbba4cd5d333c17058c/
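A minimal sketch of the middleware from step 2, assuming Django's middleware protocol (a class constructed with get_response, with an optional process_exception hook). The class name SentryEventIdMiddleware, the logger name, and the injectable capture parameter are hypothetical; only sentry_sdk.capture_exception comes from the report. Logging a separate warning when capture_exception returns None helps tell apart events the SDK never sent from events that were sent but cannot be found in the UI.

```python
import logging
from typing import Any, Callable, Optional

# Hypothetical logger name; route it wherever event IDs are collected.
logger = logging.getLogger("sentry_event_ids")


class SentryEventIdMiddleware:
    """Django-style middleware (hypothetical name) that reports every
    unhandled exception to Sentry and logs the returned event ID."""

    def __init__(
        self,
        get_response: Callable[[Any], Any],
        capture: Optional[Callable[[BaseException], Optional[str]]] = None,
    ) -> None:
        self.get_response = get_response
        if capture is None:
            # Django passes only get_response, so in production the real SDK
            # function is used; tests can inject a stand-in instead.
            import sentry_sdk

            capture = sentry_sdk.capture_exception
        self.capture = capture

    def __call__(self, request: Any) -> Any:
        return self.get_response(request)

    def process_exception(self, request: Any, exception: Exception) -> None:
        event_id = self.capture(exception)
        if event_id is None:
            # capture_exception() can return None when the SDK decides not
            # to send the event (for example, when it is sampled out).
            logger.warning("Sentry did not send an event for %r", exception)
        else:
            logger.info("sentry_event_id=%s", event_id)
        # Returning None lets Django's normal exception handling continue.
        return None
```

To enable it, add its dotted path (for example, a hypothetical myapp.middleware.SentryEventIdMiddleware) to the MIDDLEWARE setting.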

Expected Result

I see the event data.

Actual Result

The actual result is "Page Not Found".

What I tried:

  1. Checked memory backpressure: consumption peaks at only 50%.
  2. Checked the buffer.envelopes_mem metric: it stays below 200 KB at peak.
  3. Checked disk I/O backpressure: we use less than 10% of the available throughput or IOPS.
  4. Increased the number of Relay replicas from 1 to 2.

Event ID

No response

agaibura-tripleten commented 1 month ago

One thing I found today is that events carry different sample rates. For example, I see client_sample_rate=0.1 in the Trace Details of an error event, but I can't figure out where this parameter comes from. Every sample_rate I can find is set to 1.0.

Any ideas that can help me here?
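One possible source of the 0.1, assuming the Python SDK: error sampling and transaction (performance trace) sampling are configured by separate options, so sample_rate can be 1.0 while a traces_sample_rate or traces_sampler elsewhere in the deployment is 0.1. That the client_sample_rate shown in Trace Details reflects the transaction sample rate is an assumption, not something confirmed in this thread; the values below are only illustrative.

```python
import sentry_sdk

sentry_sdk.init(
    # With no dsn argument, the Python SDK falls back to the SENTRY_DSN
    # environment variable.
    # Fraction of error events the SDK sends; 1.0 disables error sampling.
    sample_rate=1.0,
    # Fraction of transactions (performance traces) the SDK sends. This is
    # a separate setting from sample_rate; 0.1 here is only illustrative.
    traces_sample_rate=0.1,
)
```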

Additionally, I deployed a cron job that creates 30 events and then checks that they exist in the self-hosted Sentry. This cron job never fails.
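The cron job itself is not shown in the thread. A minimal sketch of its checking half, assuming Sentry's project event-detail API endpoint (/api/0/projects/{org}/{project}/events/{event_id}/) and an auth token allowed to read events; the helper names, base URL, and slugs are hypothetical. The existence check is injectable so the comparison logic can be exercised without a live instance.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List


def event_url(base_url: str, org: str, project: str, event_id: str) -> str:
    """URL of the project event-detail API endpoint (assumed path)."""
    return f"{base_url.rstrip('/')}/api/0/projects/{org}/{project}/events/{event_id}/"


def event_exists(url: str, token: str) -> bool:
    """True if the API returns the event, False on 404; other errors propagate."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise


def missing_events(event_ids: Iterable[str], exists: Callable[[str], bool]) -> List[str]:
    """Return the IDs the instance cannot find, in their original order."""
    return [event_id for event_id in event_ids if not exists(event_id)]
```

In a real job, exists would wrap event_exists(event_url(base_url, org, project, event_id), token). Ingestion is asynchronous, so the job should wait or retry for a while before reporting an ID as missing.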

bc-sentry commented 4 days ago

Assigning to getsentry/sentry for product area triage.

getsantry[bot] commented 4 days ago

Routing to @getsentry/product-owners-ingestion-and-filtering for triage ⏲️

getsantry[bot] commented 4 days ago

Assigning to @getsentry/support for routing ⏲️

Dav1dde commented 4 days ago

On self-hosted, you can confirm that events reach your instance by directly checking what is ingested into Kafka.
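One way to do that lookup, sketched under several assumptions: the confluent-kafka client is installed, the script runs inside the compose network where Kafka is reachable at kafka:9092, Relay writes error events to the ingest-events topic, and each message decodes (for example with msgpack.unpackb) into a map with an event_id field. Verify the topic name and message format against your own Relay version; the function names and consumer group are hypothetical. A fresh consumer group starting from the earliest offset also sees messages that Sentry's own consumers have already processed, as long as they are still within Kafka's retention period.

```python
from typing import Any, Callable, Iterable, Iterator, Mapping, Optional


def _normalize(event_id: str) -> str:
    # Event IDs are UUIDs; compare them without dashes and case-insensitively.
    return event_id.replace("-", "").lower()


def find_event_in_stream(
    raw_messages: Iterable[bytes],
    event_id: str,
    decode: Callable[[bytes], Mapping[str, Any]],
) -> Optional[Mapping[str, Any]]:
    """Return the first decoded message whose event_id matches, else None."""
    wanted = _normalize(event_id)
    for raw in raw_messages:
        message = decode(raw)
        if _normalize(str(message.get("event_id", ""))) == wanted:
            return message
    return None


def consume_raw(
    topic: str = "ingest-events",  # assumption: Relay's error-event topic
    bootstrap_servers: str = "kafka:9092",  # assumption: self-hosted Kafka address
    max_polls: int = 100_000,
) -> Iterator[bytes]:
    """Yield raw message values from the start of a topic (needs confluent-kafka)."""
    from confluent_kafka import Consumer  # third-party; imported lazily

    consumer = Consumer({
        "bootstrap.servers": bootstrap_servers,
        "group.id": "debug-event-lookup",  # hypothetical throwaway consumer group
        "auto.offset.reset": "earliest",
        "enable.auto.commit": False,
    })
    consumer.subscribe([topic])
    try:
        for _ in range(max_polls):
            msg = consumer.poll(timeout=5.0)
            if msg is None:  # nothing new within the timeout: assume caught up
                break
            if msg.error():
                continue
            yield msg.value()
    finally:
        consumer.close()
```

For example: find_event_in_stream(consume_raw(), "cf4dc5abe7844cbba4cd5d333c17058c", msgpack.unpackb). A None result after scanning all retained messages suggests the event never reached Kafka; a hit moves the investigation downstream of ingestion.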

You can also configure a DSN so that Relay reports its own errors, and check whether it reports any; if there are no errors, Relay did not drop any events. Note that these errors are also logged to the console.

Additionally, your Stats page should show how many events make it through to storage.
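The same numbers can be pulled programmatically. A minimal sketch, assuming the organization stats_v2 API endpoint queried with field=sum(quantity) and groupBy=outcome, and a response whose groups entries carry by and totals keys; the helper names, base URL, and organization slug are hypothetical. Grouping by outcome separates accepted events from ones that were, for example, filtered, rate_limited, or invalid.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Mapping


def stats_url(base_url: str, org: str, period: str = "24h") -> str:
    """Build a stats_v2 query for error events, grouped by outcome (assumed parameters)."""
    query = urllib.parse.urlencode([
        ("field", "sum(quantity)"),
        ("groupBy", "outcome"),
        ("category", "error"),
        ("statsPeriod", period),
        ("interval", "1h"),
    ])
    return f"{base_url.rstrip('/')}/api/0/organizations/{org}/stats_v2/?{query}"


def fetch_stats(url: str, token: str) -> Mapping[str, Any]:
    """GET the stats endpoint with a bearer token and parse the JSON body."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def totals_by_outcome(payload: Mapping[str, Any]) -> Dict[str, int]:
    """Sum event counts per outcome from the (assumed) groups/by/totals shape."""
    totals: Dict[str, int] = {}
    for group in payload.get("groups", []):
        outcome = group["by"]["outcome"]
        totals[outcome] = totals.get(outcome, 0) + int(group["totals"]["sum(quantity)"])
    return totals
```

Comparing the accepted count over a period with the number of event IDs the middleware logged in the same period shows whether events are lost before or after they are counted as accepted.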