Open evanpurkhiser opened 1 year ago
Routing to @getsentry/crons for triage, due by (sfo). ⏲️
How would the bucketing work in a world where these are events? Since events are immutable, we can't easily merge them.
Or would we just treat these extra checkins as error checkins?
When these are events, it's not as critical that we record so many; we could just record them with an "extra_checkin" status and compute the count of extraneous ones with Snuba, right?
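As a sketch of that idea: the query-layer aggregation amounts to counting checkins per status. Here is a minimal in-memory illustration — the `extra_checkin` status name is just the one floated above, and the record shape is invented, not an existing schema:

```python
from collections import Counter

# Hypothetical checkin records; in practice these would be rows queried
# from the event store, not in-memory dicts.
checkins = [
    {"monitor": "hourly-job", "status": "ok"},
    {"monitor": "hourly-job", "status": "extra_checkin"},
    {"monitor": "hourly-job", "status": "extra_checkin"},
    {"monitor": "hourly-job", "status": "ok"},
]

def count_by_status(checkins):
    """The GROUP BY status / COUNT(*) aggregation the query layer would do."""
    return Counter(c["status"] for c in checkins)

counts = count_by_status(checkins)
print(counts["extra_checkin"])  # → 2
```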
It's not as critical, but there's still a cost to storing/querying these. Since we're not charging by volume, it might make sense to still have some basic rate limiting in place.
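A fixed-window limiter is about the simplest shape that "basic rate limiting" could take. This is only an illustrative sketch under that assumption — the class name and parameters are invented, not Sentry's actual limiter:

```python
import time
from collections import defaultdict

class CheckinRateLimiter:
    """Fixed-window limiter: allow at most `limit` checkins per monitor
    in each `window_seconds`-long window."""

    def __init__(self, limit=5, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        # (monitor_id, window index) -> number of checkins seen
        self.counts = defaultdict(int)

    def allow(self, monitor_id, now=None):
        """Return True if this checkin is within the limit for its window."""
        now = time.time() if now is None else now
        key = (monitor_id, int(now // self.window))
        self.counts[key] += 1
        return self.counts[key] <= self.limit
```

Checkins over the limit could be dropped outright or recorded only as a counter, depending on how aggressive we want to be.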
This is also related to https://github.com/getsentry/sentry/issues/44390
Routing to @getsentry/product-owners-crons for triage ⏲️
There is a scenario in the current implementation of Crons monitors where we probably should not be telling the user "everything is okay".
Imagine a crontab monitor configured as `0 * * * *`. This cron monitor should run on the hour, every hour. What is happening here?

- 12:00. This is on time and successful.
- 1:00. This is on time and successful.
- 2:00.

Current world
Right now we record these as simply "Okay". The monitor is successful, and in the UI you can see all these additional checkins.
Proposed world
If I have a monitor that runs once an hour, it may be crucial for me that it ONLY runs once per hour. This scenario is an error.
We should warn the user that they are sending unexpected checkins.
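To make the "unexpected checkin" condition concrete: a checkin is unexpected if it lands before the next tick of the schedule following the previous checkin. The sketch below hardcodes the hourly `0 * * * *` case for simplicity; a real implementation would parse the crontab expression with a proper cron library (e.g. croniter) rather than assuming an hourly step:

```python
from datetime import datetime, timedelta

def next_expected(prev_checkin):
    # For "0 * * * *": the next top of the hour strictly after prev_checkin.
    floored = prev_checkin.replace(minute=0, second=0, microsecond=0)
    return floored + timedelta(hours=1)

def is_extraneous(prev_checkin, new_checkin):
    """A checkin is extraneous if it arrives before the next scheduled tick."""
    return new_checkin < next_expected(prev_checkin)

# A 12:20 checkin on an hourly schedule arrives before the 13:00 tick.
print(is_extraneous(datetime(2023, 3, 1, 12, 0), datetime(2023, 3, 1, 12, 20)))
```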
Technical implementation proposal
Instead of recording each subsequent checkin that arrives before the next expected checkin, we can bucket these with the most recent checkin and simply record that they happened. We can also update the status of the monitor to `extranious_checkins` or something like that.

This additionally helps us reduce the number of rows in the `monitorcheckin` table, which is something we are keeping an eye on, since we are currently receiving many extraneous checkins.
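A rough sketch of the proposed bucketing, using an invented in-memory stand-in for a `monitorcheckin` row (the field names are hypothetical; the status string is just the one proposed above):

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class MonitorCheckIn:
    """Hypothetical shape of a row in the monitorcheckin table."""
    checkin_time: datetime
    status: str = "ok"
    extra_count: int = 0            # extraneous checkins bucketed into this row
    extra_times: list = field(default_factory=list)

def record_checkin(rows, checkin_time, next_expected):
    """Insert a new row for an on-schedule checkin; fold an early checkin
    into the most recent row instead of creating a new row."""
    if rows and checkin_time < next_expected:
        last = rows[-1]
        last.extra_count += 1
        last.extra_times.append(checkin_time)
        last.status = "extranious_checkins"  # status name proposed above
    else:
        rows.append(MonitorCheckIn(checkin_time=checkin_time))
    return rows
```

The key point is that N extra checkins cost one row update rather than N inserts, which is what keeps the table growth bounded.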