Closed Tsquare closed 9 years ago
We definitely need to improve logging here. This error usually indicates that the state machine that was supposed to consume a symbol doesn't exist anymore.
Did you see this with the latest version of ruby flow?
We have seen this in a few cases - 1) Do you have a multi threaded model by any chance? Ruby flow is not tested to be thread safe. 2) Do you use timers? We recently fixed a bug in our Timer state machine that was causing this error. 3) Related to 2, do you schedule more than 100 decisions at once? In such a case, ruby flow will break the decisions in chunks of 99 + 1 (timer) and schedule them separately.
Can you share with us a small repro if you notice this with the latest version of ruby flow?
I haven't been able to try with the 2.0.0 gem version because we're on Ruby 2.0, but as soon as your fix is in I'll give it a shot. I can't get it to reproduce consistently, unfortunately. To your questions:
1) No multithreading 2) No timers in this particular flow (it does use lots of child workflows, and it looks like this error happens when a ChildWorkflowExecutionCompleted event occurs) 3) Not in this particular flow
Can you give me a history of one of the workflows that timed out because of the error? I can try to run the history locally against the decider and get a repro.
What's the best way to get that to you? I can give you our account id & workflow id by email, for example.
How large is the history? Can you just paste it here? I only need the sequence of events.
Pasting in. The last event that times out generates the NoMethodError in the log.
Thu Aug 21 21:52:48 GMT-700 2014 1 WorkflowExecutionStarted Thu Aug 21 21:52:48 GMT-700 2014 2 DecisionTaskScheduled Thu Aug 21 21:52:48 GMT-700 2014 3 DecisionTaskStarted Thu Aug 21 21:52:50 GMT-700 2014 4 DecisionTaskCompleted Thu Aug 21 21:52:50 GMT-700 2014 5 ActivityTaskScheduled Thu Aug 21 21:52:50 GMT-700 2014 6 ActivityTaskScheduled Thu Aug 21 21:52:50 GMT-700 2014 7 ActivityTaskScheduled Thu Aug 21 21:52:50 GMT-700 2014 8 ActivityTaskScheduled Thu Aug 21 21:52:50 GMT-700 2014 9 ActivityTaskStarted Thu Aug 21 21:52:50 GMT-700 2014 10 ActivityTaskStarted Thu Aug 21 21:52:50 GMT-700 2014 11 ActivityTaskStarted Thu Aug 21 21:52:50 GMT-700 2014 12 ActivityTaskStarted Thu Aug 21 22:00:56 GMT-700 2014 13 ActivityTaskCompleted Thu Aug 21 22:00:56 GMT-700 2014 14 DecisionTaskScheduled Thu Aug 21 22:00:57 GMT-700 2014 15 DecisionTaskStarted Thu Aug 21 22:00:58 GMT-700 2014 16 DecisionTaskCompleted Thu Aug 21 22:01:02 GMT-700 2014 17 ActivityTaskCompleted Thu Aug 21 22:01:02 GMT-700 2014 18 DecisionTaskScheduled Thu Aug 21 22:01:02 GMT-700 2014 19 DecisionTaskStarted Thu Aug 21 22:01:03 GMT-700 2014 20 DecisionTaskCompleted Thu Aug 21 22:01:24 GMT-700 2014 21 ActivityTaskCompleted Thu Aug 21 22:01:24 GMT-700 2014 22 DecisionTaskScheduled Thu Aug 21 22:01:24 GMT-700 2014 23 DecisionTaskStarted Thu Aug 21 22:01:25 GMT-700 2014 24 DecisionTaskCompleted Thu Aug 21 22:01:31 GMT-700 2014 25 ActivityTaskCompleted Thu Aug 21 22:01:31 GMT-700 2014 26 DecisionTaskScheduled Thu Aug 21 22:01:31 GMT-700 2014 27 DecisionTaskStarted Thu Aug 21 22:01:32 GMT-700 2014 28 DecisionTaskCompleted Thu Aug 21 22:01:32 GMT-700 2014 29 ActivityTaskScheduled Thu Aug 21 22:01:32 GMT-700 2014 30 ActivityTaskStarted Thu Aug 21 22:01:35 GMT-700 2014 31 ActivityTaskCompleted Thu Aug 21 22:01:35 GMT-700 2014 32 DecisionTaskScheduled Thu Aug 21 22:01:35 GMT-700 2014 33 DecisionTaskStarted Thu Aug 21 22:01:37 GMT-700 2014 34 DecisionTaskCompleted Thu Aug 21 22:01:37 GMT-700 2014 35 ActivityTaskScheduled Thu Aug 21 22:01:37 GMT-700 2014 36 ActivityTaskStarted Thu Aug 21 22:01:40 GMT-700 2014 37 ActivityTaskCompleted Thu Aug 21 22:01:40 GMT-700 2014 38 DecisionTaskScheduled Thu Aug 21 22:01:40 GMT-700 2014 39 DecisionTaskStarted Thu Aug 21 22:01:42 GMT-700 2014 40 DecisionTaskCompleted Thu Aug 21 22:01:42 GMT-700 2014 41 ActivityTaskScheduled Thu Aug 21 22:01:43 GMT-700 2014 42 ActivityTaskStarted Thu Aug 21 22:01:45 GMT-700 2014 43 ActivityTaskCompleted Thu Aug 21 22:01:45 GMT-700 2014 44 DecisionTaskScheduled Thu Aug 21 22:01:45 GMT-700 2014 45 DecisionTaskStarted Thu Aug 21 22:01:48 GMT-700 2014 46 DecisionTaskCompleted Thu Aug 21 22:01:48 GMT-700 2014 47 ActivityTaskScheduled Thu Aug 21 22:01:48 GMT-700 2014 48 ActivityTaskStarted Thu Aug 21 22:01:51 GMT-700 2014 49 ActivityTaskCompleted Thu Aug 21 22:01:51 GMT-700 2014 50 DecisionTaskScheduled Thu Aug 21 22:01:51 GMT-700 2014 51 DecisionTaskStarted Thu Aug 21 22:01:54 GMT-700 2014 52 DecisionTaskCompleted Thu Aug 21 22:01:54 GMT-700 2014 53 ActivityTaskScheduled Thu Aug 21 22:01:54 GMT-700 2014 54 ActivityTaskStarted Thu Aug 21 22:01:57 GMT-700 2014 55 ActivityTaskCompleted Thu Aug 21 22:01:57 GMT-700 2014 56 DecisionTaskScheduled Thu Aug 21 22:01:57 GMT-700 2014 57 DecisionTaskStarted Thu Aug 21 22:02:00 GMT-700 2014 58 DecisionTaskCompleted Thu Aug 21 22:02:00 GMT-700 2014 59 ActivityTaskScheduled Thu Aug 21 22:02:00 GMT-700 2014 60 ActivityTaskStarted Thu Aug 21 22:02:03 GMT-700 2014 61 ActivityTaskCompleted Thu Aug 21 22:02:03 GMT-700 2014 62 DecisionTaskScheduled Thu Aug 21 22:02:03 GMT-700 2014 63 DecisionTaskStarted Thu Aug 21 22:02:08 GMT-700 2014 64 DecisionTaskCompleted Thu Aug 21 22:02:08 GMT-700 2014 65 ActivityTaskScheduled Thu Aug 21 22:02:08 GMT-700 2014 66 ActivityTaskScheduled Thu Aug 21 22:02:08 GMT-700 2014 67 ActivityTaskScheduled Thu Aug 21 22:02:08 GMT-700 2014 68 ActivityTaskScheduled Thu Aug 21 22:02:08 GMT-700 2014 69 ActivityTaskScheduled Thu Aug 21 22:02:08 GMT-700 2014 70 ActivityTaskScheduled Thu Aug 21 22:02:08 GMT-700 2014 71 ActivityTaskStarted Thu Aug 21 22:02:08 GMT-700 2014 72 ActivityTaskStarted Thu Aug 21 22:02:08 GMT-700 2014 73 ActivityTaskStarted Thu Aug 21 22:02:08 GMT-700 2014 74 ActivityTaskStarted Thu Aug 21 22:02:08 GMT-700 2014 75 ActivityTaskStarted Thu Aug 21 22:02:08 GMT-700 2014 76 ActivityTaskStarted Thu Aug 21 22:02:13 GMT-700 2014 77 ActivityTaskCompleted Thu Aug 21 22:02:13 GMT-700 2014 78 DecisionTaskScheduled Thu Aug 21 22:02:13 GMT-700 2014 79 DecisionTaskStarted Thu Aug 21 22:02:18 GMT-700 2014 80 DecisionTaskCompleted Thu Aug 21 22:02:44 GMT-700 2014 81 ActivityTaskCompleted Thu Aug 21 22:02:44 GMT-700 2014 82 DecisionTaskScheduled Thu Aug 21 22:02:44 GMT-700 2014 83 DecisionTaskStarted Thu Aug 21 22:02:49 GMT-700 2014 84 DecisionTaskCompleted Thu Aug 21 22:03:06 GMT-700 2014 85 ActivityTaskCompleted Thu Aug 21 22:03:06 GMT-700 2014 86 DecisionTaskScheduled Thu Aug 21 22:03:06 GMT-700 2014 87 DecisionTaskStarted Thu Aug 21 22:03:07 GMT-700 2014 88 ActivityTaskCompleted Thu Aug 21 22:03:07 GMT-700 2014 89 DecisionTaskScheduled Thu Aug 21 22:03:11 GMT-700 2014 90 DecisionTaskCompleted Thu Aug 21 22:03:11 GMT-700 2014 91 DecisionTaskStarted Thu Aug 21 22:03:16 GMT-700 2014 92 DecisionTaskCompleted Thu Aug 21 22:03:32 GMT-700 2014 93 ActivityTaskCompleted Thu Aug 21 22:03:32 GMT-700 2014 94 DecisionTaskScheduled Thu Aug 21 22:03:32 GMT-700 2014 95 DecisionTaskStarted Thu Aug 21 22:03:37 GMT-700 2014 96 DecisionTaskCompleted Thu Aug 21 22:05:53 GMT-700 2014 97 ActivityTaskCompleted Thu Aug 21 22:05:53 GMT-700 2014 98 DecisionTaskScheduled Thu Aug 21 22:05:53 GMT-700 2014 99 DecisionTaskStarted Thu Aug 21 22:05:59 GMT-700 2014 100 DecisionTaskCompleted Thu Aug 21 22:05:59 GMT-700 2014 101 ActivityTaskScheduled Thu Aug 21 22:05:59 GMT-700 2014 102 ActivityTaskStarted Thu Aug 21 22:08:17 GMT-700 2014 103 ActivityTaskCompleted Thu Aug 21 22:08:17 GMT-700 2014 104 DecisionTaskScheduled Thu Aug 21 22:08:17 GMT-700 2014 105 DecisionTaskStarted Thu Aug 21 22:08:23 GMT-700 2014 106 DecisionTaskCompleted Thu Aug 21 22:08:23 GMT-700 2014 107 ActivityTaskScheduled Thu Aug 21 22:08:23 GMT-700 2014 108 ActivityTaskStarted Thu Aug 21 23:04:13 GMT-700 2014 109 ActivityTaskCompleted Thu Aug 21 23:04:13 GMT-700 2014 110 DecisionTaskScheduled Thu Aug 21 23:04:13 GMT-700 2014 111 DecisionTaskStarted Thu Aug 21 23:04:21 GMT-700 2014 112 DecisionTaskCompleted Thu Aug 21 23:04:21 GMT-700 2014 113 StartChildWorkflowExecutionInitiated Thu Aug 21 23:04:21 GMT-700 2014 114 StartChildWorkflowExecutionInitiated Thu Aug 21 23:04:21 GMT-700 2014 115 StartChildWorkflowExecutionInitiated Thu Aug 21 23:04:21 GMT-700 2014 116 ChildWorkflowExecutionStarted Thu Aug 21 23:04:21 GMT-700 2014 117 DecisionTaskScheduled Thu Aug 21 23:04:21 GMT-700 2014 118 ChildWorkflowExecutionStarted Thu Aug 21 23:04:21 GMT-700 2014 119 DecisionTaskStarted Thu Aug 21 23:04:21 GMT-700 2014 120 ChildWorkflowExecutionStarted Thu Aug 21 23:04:21 GMT-700 2014 121 DecisionTaskScheduled Thu Aug 21 23:04:28 GMT-700 2014 122 DecisionTaskCompleted Thu Aug 21 23:04:28 GMT-700 2014 123 DecisionTaskStarted Thu Aug 21 23:04:29 GMT-700 2014 124 ChildWorkflowExecutionCompleted Thu Aug 21 23:04:29 GMT-700 2014 125 DecisionTaskScheduled Thu Aug 21 23:04:35 GMT-700 2014 126 DecisionTaskCompleted Thu Aug 21 23:04:35 GMT-700 2014 127 DecisionTaskStarted Thu Aug 21 23:04:41 GMT-700 2014 128 DecisionTaskCompleted Thu Aug 21 23:05:04 GMT-700 2014 129 ChildWorkflowExecutionCompleted Thu Aug 21 23:05:04 GMT-700 2014 130 DecisionTaskScheduled Thu Aug 21 23:05:04 GMT-700 2014 131 DecisionTaskStarted Thu Aug 21 23:05:11 GMT-700 2014 132 DecisionTaskCompleted Thu Aug 21 23:05:48 GMT-700 2014 133 ChildWorkflowExecutionCompleted Thu Aug 21 23:05:48 GMT-700 2014 134 DecisionTaskScheduled Thu Aug 21 23:05:48 GMT-700 2014 135 DecisionTaskStarted Thu Aug 21 23:05:55 GMT-700 2014 136 DecisionTaskCompleted Thu Aug 21 23:05:55 GMT-700 2014 137 StartChildWorkflowExecutionInitiated Thu Aug 21 23:05:55 GMT-700 2014 138 StartChildWorkflowExecutionInitiated Thu Aug 21 23:05:55 GMT-700 2014 139 ChildWorkflowExecutionStarted Thu Aug 21 23:05:55 GMT-700 2014 140 DecisionTaskScheduled Thu Aug 21 23:05:55 GMT-700 2014 141 ChildWorkflowExecutionStarted Thu Aug 21 23:05:55 GMT-700 2014 142 DecisionTaskStarted Thu Aug 21 23:06:02 GMT-700 2014 143 DecisionTaskCompleted Thu Aug 21 23:13:56 GMT-700 2014 144 ChildWorkflowExecutionCompleted Thu Aug 21 23:13:56 GMT-700 2014 145 DecisionTaskScheduled Thu Aug 21 23:13:56 GMT-700 2014 146 DecisionTaskStarted Thu Aug 21 23:14:05 GMT-700 2014 147 DecisionTaskCompleted Thu Aug 21 23:37:56 GMT-700 2014 148 ChildWorkflowExecutionCompleted Thu Aug 21 23:37:56 GMT-700 2014 149 DecisionTaskScheduled Thu Aug 21 23:37:56 GMT-700 2014 150 DecisionTaskStarted Thu Aug 21 23:38:04 GMT-700 2014 151 DecisionTaskCompleted Thu Aug 21 23:38:04 GMT-700 2014 152 StartChildWorkflowExecutionInitiated Thu Aug 21 23:38:04 GMT-700 2014 153 StartChildWorkflowExecutionInitiated Thu Aug 21 23:38:04 GMT-700 2014 154 StartChildWorkflowExecutionInitiated Thu Aug 21 23:38:04 GMT-700 2014 155 ChildWorkflowExecutionStarted Thu Aug 21 23:38:04 GMT-700 2014 156 DecisionTaskScheduled Thu Aug 21 23:38:04 GMT-700 2014 157 ChildWorkflowExecutionStarted Thu Aug 21 23:38:04 GMT-700 2014 158 ChildWorkflowExecutionStarted Thu Aug 21 23:38:04 GMT-700 2014 159 DecisionTaskStarted Thu Aug 21 23:38:14 GMT-700 2014 160 DecisionTaskCompleted Thu Aug 21 23:41:04 GMT-700 2014 161 ChildWorkflowExecutionCompleted Thu Aug 21 23:41:04 GMT-700 2014 162 DecisionTaskScheduled Thu Aug 21 23:41:04 GMT-700 2014 163 DecisionTaskStarted Thu Aug 21 23:41:13 GMT-700 2014 164 DecisionTaskCompleted Thu Aug 21 23:41:16 GMT-700 2014 165 ChildWorkflowExecutionCompleted Thu Aug 21 23:41:16 GMT-700 2014 166 DecisionTaskScheduled Thu Aug 21 23:41:16 GMT-700 2014 167 DecisionTaskStarted Thu Aug 21 23:41:23 GMT-700 2014 168 DecisionTaskCompleted Fri Aug 22 01:00:48 GMT-700 2014 169 ChildWorkflowExecutionCompleted Fri Aug 22 01:00:48 GMT-700 2014 170 DecisionTaskScheduled Fri Aug 22 01:00:49 GMT-700 2014 171 DecisionTaskStarted Fri Aug 22 01:10:49 GMT-700 2014 172 DecisionTaskTimedOut
actually, it might be a better idea to send the workflow details to me. Can you give me the account id, swf endpoint, domain, workflow id and run id via AWS forums? You can send me a private message using this link - https://forums.aws.amazon.com/pmpost!default.jspa. My forum handle is paritoshataws Also, let me know if you encountered this error in aws-flow-2.0.1.
When I try, I get:
Error sending message - the recipient is not allowed to receive private messages.
Just saw the issue in aws-flow-2.0.1, sorry to say
Just wanted to let you know that I'm working on resolving the forum issue so that you can send me the workflow details.
Can you try sending the details in a private message to 'asadaws' on this link - https://forums.aws.amazon.com/pmpost!default.jspa
same error
Sorry about the delay. Can you post a new thread on the AWS SimpleWorkflow forums here - https://forums.aws.amazon.com/forum.jspa?forumID=133. That way our cloud support can look up your account. You can just provide a link on the forum post to this github page.
I haven't been able to repro using the workflow code and history that you sent me. Will it be possible for you to write a minimal repro? Btw, do any of your child workflows/activities return more than 32k characters of data?
Unfortunately I can't repro reilably. The answer to your 2nd question is no.
I'm working on improving logging. Hopefully that will help us narrow down the issue.
Just a random thought, but do you have non deterministic code in your workflow? Workflow code should be deterministic and should always produce the same result given a particular history. Non-determinism could cause these no method errors.
Don't see anything from a quick scan (nothing dependent on time/date, or anything else that could change between/while workflow runs)
We recently released a WorkflowReplayer for ruby flow. The Replayer can be used to 'replay' the history of a particular workflow execution in the decider. You can start a debugger and step through it and see where the issue lies. Can you try replaying your workflow history? You can see a code snippet on how to use the Replayer here - https://github.com/aws/aws-flow-ruby/pull/72
Unfortunately, we are not able to reproduce this issue ourselves. It's possible that there is some state machine bug or some weird race condition but a minimal repro is needed to isolate the issue. Were you able to use the replayer to isolate it? Do you continue to see this issue?
sorry, have not been able to repro with replayer. the plan is to work around this issue by changing our flow
I'm resolving this issue for now because we haven't been able to repro it and don't have a path forward. Feel free to reopen if you have any new findings.
I'm seeing intermittent errors like this in the AWS-Flow-WorkflowTaskPoller log file: I, [2014-07-08T21:43:39.541507 #6052] INFO -- : Got decision task ... E, [2014-07-08T21:43:44.207649 #6052] ERROR -- : Error in the poller, NoMethodError, undefined method `consume' for nil:NilClass
For the same flow with the same parameters, etc, we sometimes get this error with the result that decision tasks time out and the flow never completes, and sometimes everything completes normally.