amazon-archives / aws-flow-ruby

ARCHIVED

decision_context not set during signal-induced DecisionTask #110

Open ben-mays opened 8 years ago

ben-mays commented 8 years ago

Running the sample code with an added reference to decision_context causes the DecisionTask to fail. The execution history shows DecisionTaskScheduled and DecisionTaskStarted, but never DecisionTaskCompleted, and the workflow eventually times out. The cause is decision_context resolving to nil.

Here is the modified code to reproduce:

require_relative '../../recipe_activities'
class WaitForSignalWorkflow
  extend AWS::Flow::Workflows

  workflow :place_order do
    {
      version: "1.0",
      task_list: "wait_for_signal_workflow",
      execution_start_to_close_timeout: 60,
      task_start_to_close_timeout: 20,
    }
  end
  activity_client(:client) { { from_class: "RecipeActivity" } }
  signal :change_order

  def initialize
    @change_order_period = 30
    @signal_received = Future.new
  end

  def place_order(original_amount)
    timer = create_timer_async(@change_order_period)
    wait_for_any(timer, @signal_received)
    client.process(amount)
  end

  def change_order(amount)
    puts workflow_id # raises exception, workflow_id calls decision_context.workflow_context..
    @signal_received.set(amount) unless @signal_received.set?
  end
end

ben-mays commented 8 years ago

Additionally, the workflow executor does not log the failure anywhere and simply blackholes failures in the signal-induced DecisionTasks.
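Until the executor surfaces these failures itself, one possible workaround is to log inside the signal handler before the exception disappears. The following is only a minimal sketch, not aws-flow-ruby code: `with_logged_failures` and `FakeWorkflow` are hypothetical names, and the `decision_context.workflow_context.workflow_id` chain is an assumption based on the comment in the reproduction code above. It shows a nil decision_context raising NoMethodError and the wrapper recording it before re-raising.

```ruby
require 'logger'
require 'stringio'

# Hypothetical helper (not part of aws-flow-ruby): runs the given block and
# logs any StandardError before re-raising it to the caller.
def with_logged_failures(logger, label)
  yield
rescue StandardError => e
  logger.error("#{label} failed: #{e.class}: #{e.message}")
  raise
end

# Stand-in for a workflow object whose decision_context was never set.
class FakeWorkflow
  attr_accessor :decision_context

  def workflow_id
    # Assumed call chain, following the comment in the reproduction code.
    decision_context.workflow_context.workflow_id
  end
end

log_output = StringIO.new
logger = Logger.new(log_output)
workflow = FakeWorkflow.new # decision_context is nil here

begin
  with_logged_failures(logger, "change_order") { workflow.workflow_id }
rescue NoMethodError
  # The error still reaches the caller, but a log line now exists.
end

puts log_output.string
```

In the real workflow the body of change_order would go inside the block, with the logger pointed at a file or STDERR instead of a StringIO.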

mustafashabib commented 8 years ago

:+1:

runjoerun commented 8 years ago

:pray:

pheuter commented 8 years ago

+1

mjsteger commented 8 years ago

@ben-mays Can you provide the code you are using to run the worker/activity_worker/starter? I was getting a similar issue where I'd get DecisionTaskStarted but never DecisionTaskCompleted, and the workflow would apparently blackhole the error and time out. Bumping to 3.1.0 (the newest release, which for some reason is not in the Gemfile for the samples repo) allowed it to properly raise the exception and let me see my error. After I added a value to start_execution, it went through correctly (I still get an error, but that's due to amount not being defined in the code snippet given).

ben-mays commented 8 years ago

@mjsteger Sorry, we're actively moving functionality off of SWF as a result of this and numerous other issues that manifested themselves: long polling causing tasks to be scheduled on dead sockets, the decision/activity context not being set, and a memory leak that won't go away. I'll leave the issue open for others who may have the same issue.

jpfuentes2 commented 8 years ago

@ben-mays Do you have any literature you've written about these issues? Did you happen to use the JVM Flow framework as well, or are these experiences solely based on the Ruby version? Can you speak to what you've switched to (assuming custom-grown workflow management on top of a message bus)?