Open tbloncar opened 3 months ago
Assigning to @getsentry/support for routing ⏲️
Routing to @getsentry/product-owners-crons for triage ⏲️
The failure reason of "check-ins detected" is a bug as described in https://github.com/getsentry/sentry/issues/71179
I'm not 100% sure why you would get a timeout line that, but I think some latency may be at play with when the completion check-in was sent.
Thanks for the info, re: "check-ins detected," @evanpurkhiser. Just to confirm my understanding, you're thinking there's some latency on Sentry's side that is preventing the system from recognizing that the check-in occurred prior to the runtime deadline? I've temporarily bumped the max runtime setting on our side to see if this helps.
Yeah. The check-in is definitely completing since it has a reported duration. Usually, the duration would be longer than the timeout in this case, so I suspect somewhere in the system there is some latency with the completion check-in making it's way to sentry.
The general architecture is that
Once the check-in is in the queue, the timestamp of when the check-in reached the queue is where it will live in terms of processing time. The whole system moves at the speed of the queue
It gets difficult to have tight tolerances when the completion check-in get's that close to the timeout time. In general, I would recommend giving a bit of buffer room if you expect it to run for that long.
Let us know if you still see problems even with a bit more of a timeout margin!
Environment
SaaS (https://sentry.io/)
Steps to Reproduce
0,30 * * * *
), has a 4-minute grace period, and has a 1-minute max runtimeExpected Result
Monitor failure is triggered when the task fails to check in within 4 minutes OR runs for longer than 1 minute.
Actual Result
Monitor is seemingly erroneously triggered in one environment, but not the other. Notice that, in both cases, neither of the above conditions are met. However, for the prod environment, we get a failure with a failure reason of "check-ins detected." I've confirmed with CloudWatch logs that the prod task ran for the 49 seconds indicated by the UI here.
Product Area
Crons
Link
No response
DSN
No response
Version
No response