getsentry / sentry

Developer-first error tracking and performance monitoring
https://sentry.io

Rate-limited Oban cron monitor check-ins #79484

Open rodolfoBee opened 4 months ago

rodolfoBee commented 4 months ago

Environment

Sentry Elixir SDK version: 10.6.0

Steps to Reproduce

SDK configuration:

config :sentry,
  dsn: System.get_env("SENTRY_DSN"),
  environment_name: environment,
  enable_source_code_context: true,
  root_source_code_paths: [File.cwd!()],
  integrations: [
    oban: [
      cron: [enabled: true]
    ]
  ]

Jobs are configured using crontab, for example:

{"1-59/5 * * * *", Cron.DeleteTeam},
{"2-59/5 * * * *", Cron.GenerateFees},

We also tried creating the monitor manually, mirroring the auto-configuration in the Oban integration code, without success.

Expected Result

Check-ins are sent on each job's crontab schedule and accepted by Sentry.

Actual Result

Check-ins are marked as "Monitor rate limit":

[screenshot]

All monitors are active in Sentry and there is available quota.

whatyouhide commented 4 months ago

I'm not sure what Sentry means by "Dropped (Server)". Doesn't it give you any details on why the monitor was dropped?

rodolfoBee commented 4 months ago

The only reason given is "Monitor Rate Limit". Note that no check-in is ever accepted, so the usual rate limit (6 check-ins per monitor per minute) should not apply here. How exactly does the Oban integration create the check-in envelope, and how does the SDK send it?

whatyouhide commented 4 months ago

We create the envelope with something along these lines:

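[
  # Envelope header
  ~s({"event_id":"#{event_id}"}\n),
  # Item header: "length" is the byte size of the JSON-encoded check-in
  ~s({"type": "check_in", "length": #{byte_size(encoded_check_in)}}\n),
  # Item payload, followed by a trailing newline (?\n is the integer 10)
  encoded_check_in,
  ?\n
]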

You can see this code here. Without any server logs telling us what's wrong, this is pretty hard to debug. If you report check-ins manually (with Sentry.capture_check_in/1), does it work?
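For reference, a manual check-in around a job might look roughly like the sketch below. It assumes a configured DSN and uses `Sentry.capture_check_in/1` with its `:status`, `:monitor_slug`, `:check_in_id`, and `:monitor_config` options. The module name `MyApp.ManualCheckIn` and the slug `"cron-delete-team"` are hypothetical, and the schedule simply mirrors the `Cron.DeleteTeam` crontab entry from the report.

```elixir
defmodule MyApp.ManualCheckIn do
  # Hypothetical monitor slug; it should match (or, via :monitor_config,
  # upsert) a monitor in Sentry.
  @monitor_slug "cron-delete-team"

  # Same schedule as the {"1-59/5 * * * *", Cron.DeleteTeam} crontab entry.
  @monitor_config [schedule: [type: :crontab, value: "1-59/5 * * * *"]]

  # Runs `job_fun` between an :in_progress check-in and a final :ok or
  # :error check-in that reuses the same check-in ID.
  # Assumes a DSN is configured, so the first call returns {:ok, id}.
  def run(job_fun) when is_function(job_fun, 0) do
    {:ok, check_in_id} =
      Sentry.capture_check_in(
        status: :in_progress,
        monitor_slug: @monitor_slug,
        monitor_config: @monitor_config
      )

    try do
      result = job_fun.()

      Sentry.capture_check_in(
        status: :ok,
        check_in_id: check_in_id,
        monitor_slug: @monitor_slug
      )

      result
    rescue
      exception ->
        # Report the failed run, then re-raise with the original stacktrace.
        Sentry.capture_check_in(
          status: :error,
          check_in_id: check_in_id,
          monitor_slug: @monitor_slug
        )

        reraise exception, __STACKTRACE__
    end
  end
end
```

If check-ins sent this way are also dropped with "Monitor Rate Limit", that would suggest the problem is not specific to how the Oban integration builds its envelopes.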

rodolfoBee commented 4 months ago

@whatyouhide, thank you for the info. @gaprl from the Crons team is also looking into the backend logs.

sl0thentr0py commented 4 months ago

I don't think this is an SDK problem but we can wait for more server investigation before closing.

rodolfoBee commented 2 months ago

@getsentry/product-owners-crons can we get an update on this issue?

sl0thentr0py commented 2 months ago

@rodolfoBee, as I said, rate limits and the "Dropped (Server)" reports are purely server-side, so I don't think this has to do with the SDK. Did the backend team have an update?

EDIT: Ah, sorry, I now see that you pinged them and not us. :)

whatyouhide commented 2 months ago

It might be worth closing this particular issue so as to not confuse users into thinking it's an issue with the Elixir SDK?

rodolfoBee commented 2 months ago

Can it be transferred to the getsentry/sentry repo instead, so we can assign to the Crons team?

whatyouhide commented 2 months ago

@rodolfoBee even better yes, but I don't seem to have permissions to do that. @sl0thentr0py?

getsantry[bot] commented 1 month ago

Routing to @getsentry/product-owners-crons for triage ⏲️

getsantry[bot] commented 1 month ago

Assigning to @getsentry/support for routing ⏲️

gaprl commented 2 weeks ago

Hey @rodolfoBee, can you check whether any Crons processing errors are shown on the Crons listing page? Usually, if a check-in has been dropped by the server, an error message is shown there. Example:

[screenshot]