amazon-archives / aws-flow-ruby

ARCHIVED

Flow framework doesn't handle errors raised within tasks #95

Open kinsersh opened 9 years ago

kinsersh commented 9 years ago

I had a case where an error was raised within my own code that resided within a task, as follows:

future = task do
  raise RuntimeError.new("problem")
end

The flow framework doesn't properly report on the problem. I instead got this:

/Users/kinsersh/.rvm/gems/ruby-2.2.0/gems/aws-flow-3.1.0/lib/aws/decider/async_decider.rb:284:in `make_fail_decision': undefined method `reason' for #<RuntimeError:0x007f8bdf770570> (NoMethodError)
    from /Users/kinsersh/.rvm/gems/ruby-2.2.0/gems/aws-flow-3.1.0/lib/aws/decider/async_decider.rb:330:in `complete_workflow'
    from /Users/kinsersh/.rvm/gems/ruby-2.2.0/gems/aws-flow-3.1.0/lib/aws/decider/async_decider.rb:256:in `decide_impl'
    from /Users/kinsersh/.rvm/gems/ruby-2.2.0/gems/aws-flow-3.1.0/lib/aws/decider/async_decider.rb:226:in `decide'
    from /Users/kinsersh/.rvm/gems/ruby-2.2.0/gems/aws-flow-3.1.0/lib/aws/decider/task_handler.rb:56:in `handle_decision_task'
    from /Users/kinsersh/.rvm/gems/ruby-2.2.0/gems/aws-flow-3.1.0/lib/aws/replayer.rb:236:in `replay'
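
The `NoMethodError` in the trace above comes from `reason` being called on a plain `RuntimeError`, which does not define that method. The following is a minimal plain-Ruby sketch of that mismatch and of one defensive way to pick a failure reason. It does not use or reproduce aws-flow code: `FailureWithReason` and `failure_reason` are hypothetical names introduced only for illustration.

```ruby
# Hypothetical stand-in for an exception type that carries a `reason`.
# This is NOT an aws-flow class; it exists only for this illustration.
class FailureWithReason < StandardError
  attr_reader :reason

  def initialize(reason, message = reason)
    super(message)
    @reason = reason
  end
end

# Hypothetical helper: choose a failure reason without assuming that the
# exception responds to `reason`. Falls back to the exception message.
def failure_reason(error)
  error.respond_to?(:reason) ? error.reason : error.message
end

plain = RuntimeError.new("problem")

# A plain RuntimeError has no `reason` method, so calling it directly
# would raise NoMethodError, as in the trace above.
puts plain.respond_to?(:reason)

# The guarded helper works for both kinds of exception.
puts failure_reason(plain)
puts failure_reason(FailureWithReason.new("activity failed"))
```

Whether a guard of this kind belongs in `make_fail_decision` is an assumption; the sketch only shows why the reported error message mentions `reason` rather than the original "problem".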

I used the replay mechanism to put a breakpoint inside of async_decider to inspect what the failure really was, which ended up being this:

RuntimeError There was a task attempted to be removed from a BRE, when the BRE did not have that task as an heir

/Users/kinsersh/.rvm/gems/ruby-2.2.0/gems/aws-flow-3.1.0/lib/aws/flow/begin_rescue_ensure.rb:129:in `remove'
/Users/kinsersh/.rvm/gems/ruby-2.2.0/gems/aws-flow-3.1.0/lib/aws/flow/tasks.rb:423:in `remove'
/Users/kinsersh/.rvm/gems/ruby-2.2.0/gems/aws-flow-3.1.0/lib/aws/flow/tasks.rb:115:in `cancel'
/Users/kinsersh/.rvm/gems/ruby-2.2.0/gems/aws-flow-3.1.0/lib/aws/flow/begin_rescue_ensure.rb:353:in `cancel'
/Users/kinsersh/.rvm/gems/ruby-2.2.0/gems/aws-flow-3.1.0/lib/aws/flow/begin_rescue_ensure.rb:137:in `block in cancelHeirs'
/Users/kinsersh/.rvm/rubies/ruby-2.2.0/lib/ruby/2.2.0/set.rb:283:in `each_key'
/Users/kinsersh/.rvm/rubies/ruby-2.2.0/lib/ruby/2.2.0/set.rb:283:in `each'
/Users/kinsersh/.rvm/gems/ruby-2.2.0/gems/aws-flow-3.1.0/lib/aws/flow/begin_rescue_ensure.rb:137:in `cancelHeirs'
/Users/kinsersh/.rvm/gems/ruby-2.2.0/gems/aws-flow-3.1.0/lib/aws/flow/begin_rescue_ensure.rb:115:in `fail'
/Users/kinsersh/.rvm/gems/ruby-2.2.0/gems/aws-flow-3.1.0/lib/aws/flow/tasks.rb:418:in `fail'
/Users/kinsersh/.rvm/gems/ruby-2.2.0/gems/aws-flow-3.1.0/lib/aws/flow/tasks.rb:75:in `rescue in block in initialize'
/Users/kinsersh/.rvm/gems/ruby-2.2.0/gems/aws-flow-3.1.0/lib/aws/flow/tasks.rb:77:in `block in initialize'

What I want is for aws-flow-ruby to handle and report the error I raised, rather than failing with what looks like an internal error-handling problem in aws-flow-ruby itself. Also, in the SWF console I only see decision tasks that time out, which is not very helpful. If the flow framework isn't changed to handle this better, please at least improve the docs.

I have worked around this by defining a rescue block inside each task execution. I am experiencing this problem with ruby 2.2.0 and aws-flow-ruby v3.1.0.
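
The workaround described above (a rescue block inside each task) can be sketched in plain Ruby as follows. This is an illustration under stated assumptions, not the code from the report: `TaskOutcome` and `run_guarded` are hypothetical names, and the sketch runs without aws-flow. In a real workflow the same rescue would wrap the body of the `task do ... end` block.

```ruby
# Hypothetical result wrapper (not part of aws-flow): holds either the
# value produced by the block or the error it raised.
TaskOutcome = Struct.new(:value, :error) do
  def ok?
    error.nil?
  end
end

# Hypothetical helper: run the block and capture any StandardError so
# that it never escapes to the caller.
def run_guarded
  TaskOutcome.new(yield, nil)
rescue StandardError => e
  TaskOutcome.new(nil, e)
end

# The failing body from the report, wrapped in the guard.
outcome = run_guarded { raise RuntimeError.new("problem") }

puts outcome.ok?
puts "#{outcome.error.class}: #{outcome.error.message}"
```

Capturing the error as a value keeps the original message available for logging or for an explicit failure path, instead of letting the exception reach the framework's own error handling.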

DMcKinnon-mdsol commented 9 years ago

I'm experiencing this same issue (with the same ruby and flow versions). Is there a fix coming?

barrettford commented 9 years ago

bump...

kinsersh commented 8 years ago

Given the inactivity on this issue and on the GitHub repo in general, we will stop using aws-flow-ruby and use the Java equivalent instead.

jcavalieri commented 8 years ago

Seeing this as well. Is this repo even monitored?

jcavalieri commented 8 years ago

@pmohan6 (I'm picking you out because of the number of commits) who is in charge of this repo?

mjsteger commented 8 years ago

(Note that I'm no longer associated with AWS, and not at all in charge of this repo. I am willing to fork if necessary, though)

@kinsersh: Sorry to hear that the library failed you :(. If possible, could you post the workflow history/workflow definition that you used when you got that error? (Or even better, a small repro case, though they are admittedly obnoxious to write for this library.) It definitely does look like an internal error, and it'd be good to fix that. The same request goes to @DMcKinnon-mdsol and @jcavalieri.

jcavalieri commented 8 years ago

Hi @mjsteger, thanks for helping out. I actually abandoned using this library, so I don't have any readily available samples.