chaps-io / gush

Fast and distributed workflow runner using ActiveJob and Redis
MIT License

Error handling #18

Closed ferusinfo closed 6 years ago

ferusinfo commented 8 years ago

How does Gush handle (or not handle) errors that occur in a Job class? I tried using raise, but all of the tasks passed without an error, and nothing showed up in either Sidekiq or the workflow (flow.status returns :running).

Is there any way to integrate Gush with Sidekiq's worker error handling, or is that too complicated?

pokonski commented 8 years ago

Right now Gush captures all exceptions to internally mark jobs as failed without raising them further to Sidekiq. So as far as Sidekiq is concerned all jobs succeed. This turned out to be rather annoying for developers.

This will change in the 1.0.0 version I plan on releasing Soon(TM).

ferusinfo commented 8 years ago

What is the ETA for the 1.0.0 version? I'm happy to help, too. Also, one problem I've found so far is that even after calling flow.reload, flow.status still returns :running for the failed workflow.

pokonski commented 8 years ago

If everything goes right, this week :)

Hm, it should not be marked as running. Can you create a separate issue for this?

ferusinfo commented 8 years ago

Sure, let me do this in a second.

pokonski commented 8 years ago

I pushed a change which raises errors after marking jobs as failed, so Sidekiq can retry them. If you can have a look, that'd be perfect :)
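
To illustrate the difference between the two behaviours, here is a minimal sketch in plain Ruby. It is not Gush's actual code: FakeJob, run_swallowing, and run_reraising are hypothetical names used only for this example. The first runner marks the job as failed and hides the exception from the caller, as in the old behaviour described above; the second marks the job as failed and then re-raises, so a backend such as Sidekiq would see the error.

```ruby
# Hypothetical stand-in for a workflow job; not part of Gush.
class FakeJob
  attr_reader :failed

  def initialize
    @failed = false
  end

  def perform
    raise "boom"
  end

  def fail!
    @failed = true
  end
end

# Old behaviour: mark the job as failed and swallow the exception,
# so the caller (e.g. a queue backend) believes the job succeeded.
def run_swallowing(job)
  job.perform
rescue StandardError
  job.fail!
end

# New behaviour: mark the job as failed, then re-raise so the caller
# can apply its own retry policy.
def run_reraising(job)
  job.perform
rescue StandardError
  job.fail!
  raise
end

swallowed = FakeJob.new
run_swallowing(swallowed) # no exception reaches this line's caller
puts swallowed.failed

reraised = FakeJob.new
begin
  run_reraising(reraised)
rescue RuntimeError => e
  puts "caller saw: #{e.message}"
end
puts reraised.failed
```

In both cases the job ends up marked as failed; only the second lets the caller know that something went wrong.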

carlthuringer commented 7 years ago

Even though the job raises an error, the configuration says retry: false. This will cause Sidekiq to discard the job immediately.

https://github.com/mperham/sidekiq/wiki/Error-Handling#configuration

For jobs that fail due to transient issues, like being unable to obtain a database connection, this causes the workflow to stall in an unrecoverable way. It's not possible to reload and continue the workflow, or I haven't figured out how to do it correctly.

pokonski commented 7 years ago

@carlthuringer what kind of exception are you getting? Gush catches those itself before Sidekiq does and lets users of Gush retry the job (either via the CLI or the web GUI).

carlthuringer commented 7 years ago

I think I've determined my issue. I was expecting Workflow#reload to replace/rebuild the instance in place, but instead it just returns the freshly loaded instance. So to get a proper status update and continue, you have to run flow = flow.reload; flow.continue.
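
The gotcha can be reproduced with a small stand-in class. This is only a sketch of the behaviour described above, not Gush's implementation: FakeWorkflow and its STORE hash (standing in for Redis) are hypothetical.

```ruby
# Hypothetical stand-in for a persisted workflow; not Gush's implementation.
class FakeWorkflow
  STORE = {} # pretend this hash is the Redis backend

  attr_reader :id, :status

  def initialize(id, status)
    @id = id
    @status = status
  end

  def save
    STORE[id] = status
    self
  end

  # Returns a fresh instance built from the store; the receiver is untouched.
  def reload
    self.class.new(id, STORE.fetch(id))
  end
end

flow = FakeWorkflow.new("wf-1", :running).save

# Simulate a worker in another process recording a failure.
FakeWorkflow::STORE["wf-1"] = :failed

flow.reload               # return value discarded
stale_status = flow.status

flow = flow.reload        # keep the returned instance
fresh_status = flow.status

puts stale_status # prints running
puts fresh_status # prints failed
```

With a real workflow, the equivalent is to reassign the variable before calling continue, as in the comment above.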

bolshakov commented 7 years ago

@pokonski when an external service is unavailable, I'd prefer to reschedule the work for later rather than retry it immediately. Sidekiq has excellent tools for this: an exponentially growing delay before each retry, and a callback that is invoked once all retries are exhausted.

It would be great to support these features in Gush.
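
As a rough illustration of those two features, here is a plain-Ruby sketch of retrying with a doubling delay and an "exhausted" callback. It is an in-process approximation under stated assumptions: with_backoff and its parameters (max_retries, base_delay, on_exhausted, sleeper) are hypothetical names, and the doubling schedule is chosen for simplicity. Sidekiq itself re-enqueues the job for a later time rather than sleeping, and exposes its own hooks for this (sidekiq_retry_in and sidekiq_retries_exhausted).

```ruby
# Hypothetical helper; it only illustrates the retry pattern.
# The sleeper parameter lets callers replace the real sleep, e.g. in tests.
def with_backoff(max_retries:, base_delay:, on_exhausted:, sleeper: ->(s) { sleep(s) })
  attempt = 0
  begin
    yield attempt
  rescue StandardError => e
    if attempt < max_retries
      sleeper.call(base_delay * (2**attempt)) # delay doubles on each retry
      attempt += 1
      retry
    end
    on_exhausted.call(e) # every retry has been used up
    raise
  end
end

delays = []
exhausted = []

begin
  with_backoff(max_retries: 3, base_delay: 1,
               on_exhausted: ->(e) { exhausted << e.message },
               sleeper: ->(s) { delays << s }) do |attempt|
    raise "service unavailable (attempt #{attempt})"
  end
rescue RuntimeError
  # The error still surfaces after the final attempt.
end

p delays    # [1, 2, 4]
p exhausted # ["service unavailable (attempt 3)"]
```

Recording the delays instead of sleeping keeps the example fast; dropping the sleeper argument makes the helper actually wait between attempts.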

pokonski commented 7 years ago

@bolshakov agreed, I'm actually considering letting them fail instead so Sidekiq can handle that.

pokonski commented 6 years ago

Version 1.0.0 now re-raises the error so it can be retried by the backend of your choosing. Closing :)

jalada commented 6 years ago

@pokonski how would you customise the retry behaviour for Sidekiq + Gush jobs?

jalada commented 6 years ago

We solved this by injecting a rescue_from into the Gush::Worker base class in an initializer:

# config/initializers/gush.rb
Gush::Worker.class_eval do
  rescue_from(StandardError) do |e|
    # Any handling you want to do e.g. report to Sentry/Rollbar/etc
  end
end

This stops Sidekiq from retrying jobs in our Gush workflows after the workflows are marked as failed.