Closed by jongbeau 10 years ago
I've also been in touch with AWS Support:
Amazon Web Services, Dec 04, 2013 01:58 AM PST
Hi, I've tested your application in my environment, and the failures occur when two activities fail at the same time.
I've escalated the issue to the SWF team.
Once I hear back from the team, I will get in touch with you.
Best regards,
Javier R. Amazon Web Services
Looking into it now. Thanks for the clear repro!
I have a fix that solves the immediate problem (i.e. I was able to run the repro you gave 40 times without any timeouts). We'll run it through testing and code review and try to get it out as soon as possible.
Thank you sir! Please let me know when the fix is available for use.
Please forgive my ignorance, but how should I get the latest update? I'm using the aws-flow gem. Do I need to build it from source, or should I update the gem?
If you are installing gems directly, you can issue a
gem update aws-flow
to get the latest version, which includes the changes which fixed this issue.
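To confirm which version ended up installed after the update, a short Ruby check along these lines may help. This is only a sketch: `installed_versions` and `at_least?` are hypothetical helper names, and the minimum version passed to `at_least?` is a placeholder, since the thread does not state which release contains the fix.

```ruby
require "rubygems"

# Hypothetical helper: list the installed versions of a gem, newest first.
# Returns an empty array when the gem is not installed.
def installed_versions(gem_name)
  Gem::Specification.find_all_by_name(gem_name).map(&:version).sort.reverse
end

# Hypothetical helper: true when any installed version is >= minimum.
# The minimum is an assumption supplied by the caller, not a known fix version.
def at_least?(gem_name, minimum)
  required = Gem::Version.new(minimum)
  installed_versions(gem_name).any? { |version| version >= required }
end

versions = installed_versions("aws-flow")
if versions.empty?
  puts "aws-flow is not installed"
else
  puts "aws-flow versions installed: #{versions.join(', ')}"
end
```

Running `gem list aws-flow` from the shell reports the same information without any code.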
Thanks, I pulled the latest version. It seems to be fixed in my bare-bones test, but not in my real application. The main difference is that in my real application, I'm calling child workflows instead of activities. When multiple child workflows fail, I'm getting a ChildWorkflowExecutionFailed status. Here is a stack trace:
--- !ruby/exception:StandardError
message: failing execution due to failed BCP activity
---
- ff_workflow.rb:58:in `block (2 levels) in ff_workflow'
- '------ continuation ------'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/workflow_definition.rb:50:in `block (2 levels) in execute'
- '------ continuation ------'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/async_decider.rb:393:in `block in handle_workflow_execution_started'
- '------ continuation ------'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/async_decider.rb:389:in `handle_workflow_execution_started'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/async_decider.rb:655:in `process_event'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/async_decider.rb:253:in `block in decide_impl'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/async_decider.rb:249:in `each'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/async_decider.rb:249:in `decide_impl'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/async_decider.rb:227:in `decide'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/task_handler.rb:47:in `handle_decision_task'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/task_poller.rb:65:in `poll_and_process_single_task'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/worker.rb:199:in `run_once'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/worker.rb:185:in `block in start'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/worker.rb:184:in `loop'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/worker.rb:184:in `start'
- ff_workflow.rb:76:in `<main>'
- '------ continuation ------'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/async_decider.rb:389:in `new'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/async_decider.rb:389:in `handle_workflow_execution_started'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/async_decider.rb:655:in `process_event'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/async_decider.rb:253:in `block in decide_impl'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/async_decider.rb:249:in `each'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/async_decider.rb:249:in `decide_impl'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/async_decider.rb:227:in `decide'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/task_handler.rb:47:in `handle_decision_task'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/task_poller.rb:65:in `poll_and_process_single_task'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/worker.rb:199:in `run_once'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/worker.rb:185:in `block in start'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/worker.rb:184:in `loop'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/worker.rb:184:in `start'
- ff_workflow.rb:76:in `<main>'
- '------ continuation ------'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/async_decider.rb:393:in `block in handle_workflow_execution_started'
- '------ continuation ------'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/async_decider.rb:389:in `handle_workflow_execution_started'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/async_decider.rb:655:in `process_event'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/async_decider.rb:253:in `block in decide_impl'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/async_decider.rb:249:in `each'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/async_decider.rb:249:in `decide_impl'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/async_decider.rb:227:in `decide'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/task_handler.rb:47:in `handle_decision_task'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/task_poller.rb:65:in `poll_and_process_single_task'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/worker.rb:199:in `run_once'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/worker.rb:185:in `block in start'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/worker.rb:184:in `loop'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/worker.rb:184:in `start'
- ff_workflow.rb:76:in `<main>'
- '------ continuation ------'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/async_decider.rb:389:in `new'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/async_decider.rb:389:in `handle_workflow_execution_started'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/async_decider.rb:655:in `process_event'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/async_decider.rb:253:in `block in decide_impl'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/async_decider.rb:249:in `each'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/async_decider.rb:249:in `decide_impl'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/async_decider.rb:227:in `decide'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/task_handler.rb:47:in `handle_decision_task'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/task_poller.rb:65:in `poll_and_process_single_task'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/worker.rb:199:in `run_once'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/worker.rb:185:in `block in start'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/worker.rb:184:in `loop'
- /home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/worker.rb:184:in `start'
- ff_workflow.rb:76:in `<main>'
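Most of the frames in a trace like this come from inside the aws-flow gem, so it can help to filter it down to the application's own frames when looking for where the exception was raised. A minimal sketch in plain Ruby, assuming the frames are available as an array of strings; `application_frames` is a hypothetical helper, not part of aws-flow:

```ruby
# Hypothetical helper: keep only frames from application code by dropping
# the continuation markers and anything under an installed gem's directory,
# then remove repeated frames.
def application_frames(frames)
  frames.reject { |frame| frame.include?("continuation") || frame.include?("/gems/") }.uniq
end

# A few frames copied from the trace above.
frames = [
  "ff_workflow.rb:58:in `block (2 levels) in ff_workflow'",
  "------ continuation ------",
  "/home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/workflow_definition.rb:50:in `block (2 levels) in execute'",
  "/home/jason/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-1.0.6/lib/aws/decider/worker.rb:184:in `start'",
  "ff_workflow.rb:76:in `<main>'",
  "------ continuation ------",
  "ff_workflow.rb:76:in `<main>'"
]

puts application_frames(frames)
```

For this trace, the only frames left are the two in ff_workflow.rb (lines 58 and 76).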
Can we reopen this issue?
I'm not sure exactly what the problem is. Can you explain further? Is the workflow hitting ChildWorkflowExecutionFailed and then timing out in the same way that it did with CompleteWorkflowExecutionFailed?
Ping. If I can get the workflow definition and sample history, I can work on writing up a repro.
I'm going to close this issue to clean up, but feel free to re-open. To help you further I'll need a workflow definition and sample history.
Please see the link for a small repro of the issue. When running this test, the execution times out because the decider hits "CompleteWorkflowExecutionFailed" after the activity task fails. See the screenshot for an example. This does not happen every time, but it happens quite often (more than 50% of the time).
I've had a lot of trouble handling failed activities without breaking the decider or getting it into a bad state.
http://www.fileswap.com/dl/PchKdHpSZB/