While load testing TaskWorker, I narrowed down one source of instability to
events missed during a websocket reconnection. We were only partially
backfilling events down the socket (limited to 1k), so under moderate volume
it was entirely possible to miss events. Instead, we will clamp the backfill
to 15 minutes' worth of time and send all of the matching events.
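
As a rough illustration of the new behavior, here is a minimal sketch of clamping a backfill by time rather than by count. The names `Event`, `MAX_BACKFILL`, `backfill_start`, and `events_to_backfill` are hypothetical and are not taken from the actual change; only the 15-minute window comes from the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Iterator, Optional

# The 15-minute window is from the description; the constant name is assumed.
MAX_BACKFILL = timedelta(minutes=15)


@dataclass(frozen=True)
class Event:
    # Hypothetical minimal event shape, for illustration only.
    id: str
    occurred: datetime


def backfill_start(requested_since: Optional[datetime], now: datetime) -> datetime:
    """Clamp the requested start so it is never older than MAX_BACKFILL."""
    earliest = now - MAX_BACKFILL
    if requested_since is None:
        return earliest
    return max(requested_since, earliest)


def events_to_backfill(
    events: Iterable[Event],
    requested_since: Optional[datetime],
    now: datetime,
) -> Iterator[Event]:
    """Yield every event inside the clamped window, with no count limit."""
    start = backfill_start(requested_since, now)
    for event in events:
        if start <= event.occurred <= now:
            yield event
```

With a time bound instead of a fixed count, a reconnecting client that asks for an hour of history gets everything from the last 15 minutes, even if that is more than 1k events.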

Part of #14098