Fix when resque failure backend is already multiple

sj26 commented 8 months ago

Goal

We had an outage today after recently introducing Bugsnag into our codebase. Our Resque failure backend was already Resque::Failure::Multiple. The Bugsnag integration code looks like it handles that case, except for a small mistake with the comparison operator:

Resque::Failure::Multiple < Resque::Failure::Multiple
# => false

Resque::Failure::Multiple <= Resque::Failure::Multiple
# => true

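The backend is likely to be the Resque::Failure::Multiple class itself, not a subclass, so the strict < check returns false and the backend is treated as if it were not already Multiple.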
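For anyone unfamiliar with the class comparison operators, here is a minimal sketch of the difference. The `Failure` module and its classes are hypothetical stand-ins so this runs without Resque loaded; they are not the real Resque classes.

```ruby
# Hypothetical stand-ins for illustration only; not the real Resque classes.
module Failure
  class Base; end
  class Multiple < Base; end
  class CustomMultiple < Multiple; end # a hypothetical subclass
end

backend = Failure::Multiple

# Module#< is a strict subclass check: a class is never "less than" itself.
strict = backend < Failure::Multiple

# Module#<= also accepts the class itself.
inclusive = backend <= Failure::Multiple

# A real subclass satisfies both operators.
subclass_strict = Failure::CustomMultiple < Failure::Multiple

# Unrelated classes compare as nil, which is falsy in a conditional.
unrelated = String <= Failure::Multiple

puts strict.inspect          # false
puts inclusive.inspect       # true
puts subclass_strict.inspect # true
puts unrelated.inspect       # nil
```

So `<=` covers both the common case (the backend is exactly the Multiple class) and the subclass case.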

This meant that when the Bugsnag instrumentation code ran, we ended up with:

Resque::Failure.backend
# => Resque::Failure::Multiple
Resque::Failure::Multiple.classes
# => [Resque::Failure::Redis, Bugsnag::Resque, Resque::Failure::Multiple]

Because Resque::Failure::Multiple now listed itself among its own backends, a failure was reported to Redis, then Bugsnag, then Redis, then Bugsnag, then Redis, then Bugsnag, and so on, until we got a "stack overflow" error.
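To illustrate the failure mode, here is a simplified model of it. Everything in this sketch (`FakeMultiple`, `FakeRedisBackend`, `FakeBugsnagBackend`, and the `install` helper) is hypothetical and only mimics the shape of the problem; it is not the actual Resque or Bugsnag code, and the order of the backend list differs from the one shown above. The important part is that the composite backend ends up containing itself.

```ruby
# Hypothetical leaf backends that record where a failure was reported.
class FakeRedisBackend
  def self.report(log)
    log << :redis
  end
end

class FakeBugsnagBackend
  def self.report(log)
    log << :bugsnag
  end
end

# Hypothetical composite backend that fans a failure out to every listed class.
class FakeMultiple
  class << self
    attr_accessor :classes

    def report(log)
      classes.each { |backend| backend.report(log) }
    end
  end
end

# Hypothetical installer: adds FakeBugsnagBackend alongside whatever is configured.
# `comparison` is :< (strict) or :<= (inclusive).
def install(current, comparison)
  existing = current.public_send(comparison, FakeMultiple) ? current.classes : [current]
  FakeMultiple.classes = existing + [FakeBugsnagBackend]
end

# Strict check: FakeMultiple is not recognised as already multiple,
# so it is wrapped inside itself.
FakeMultiple.classes = [FakeRedisBackend]
install(FakeMultiple, :<)
buggy_classes = FakeMultiple.classes

overflowed =
  begin
    FakeMultiple.report([])
    false
  rescue SystemStackError
    true # reporting recursed into FakeMultiple until the stack ran out
  end

# Inclusive check: the existing list is extended instead.
FakeMultiple.classes = [FakeRedisBackend]
install(FakeMultiple, :<=)
fixed_log = []
FakeMultiple.report(fixed_log)

puts buggy_classes.inspect
puts overflowed
puts fixed_log.inspect
```

With the inclusive comparison, each failure in this model is reported once to each backend and the recursion goes away.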

In our case, a worker needed cleanup during Resque boot, and that cleanup involves reporting a failure. As a result, none of our Resque workers would boot or process work, which caused an outage of all background queue processing.

Testing

The tests here are pretty literal and test the implementation more than the outcome. It might be possible to refactor them to exercise the Resque side of things a little more, but that's more than I'd like to bite off in this PR.

We are currently using BUGSNAG_DISABLE_AUTOCONFIGURE as a workaround, and would love to get this change merged and released quickly so we can use Bugsnag error reporting for Resque :pray:

clr182 commented 8 months ago

Hi @sj26

Thank you for providing this PR. I've added this to our backlog and will review this when priorities allow.

sj26 commented 7 months ago

This bit us again today.

Could you please take a look at this? It's quite a small fix and shouldn't take much of your time.

mclack commented 7 months ago

Hi @sj26

Just a heads up that this is on our list to look at. I still can't give an ETA on when it'll be fully assessed and tested, but we'll make sure to provide any updates on this thread.

Thanks for your patience in the meantime.

bugsnag / bugsnag-ruby

Fix when resque failure backend is already multiple #803
