kbrock closed this 4 months ago
Update: I ran the tests without the changes to verify this.

    rspec ./spec/workflow/states/task_spec.rb:238

The `# WAS` comments show the values that get this to pass without the changes. The fact that `1.times` works here tells you that this only runs 1 time (and we never retry). This is a result of `end? = true` after the first call. Once that was fixed, we get 3 history records and a retry count of 3.
    describe "Retry" do
      let(:workflow) do
        make_workflow(
          ctx, {
            "State"        => {
              "Type"     => "Task",
              "Resource" => resource,
              "Retry"    => [{"ErrorEquals" => ["States.Timeout"], "MaxAttempts" => 2}],
              "Next"     => "SuccessState"
            }.compact,
            "SuccessState" => {"Type" => "Succeed"},
          }
        )
      end

      context "with specific errors" do
        let(:retriers) { [{"ErrorEquals" => ["States.Timeout"], "MaxAttempts" => 2}] }

        it "retries if that error is raised" do
          # WAS: 1.times
          # 1 regular run + 2 retries = 3 times
          3.times { expect_run_async(input, :error => "States.Timeout") }

          workflow.run_nonblock # WAS: workflow.current_state.run_nonblock

          expect(ctx.next_state).to be_nil # WAS: "State"
          expect(ctx.state["Retrier"]).to eq(["States.Timeout"])
          expect(ctx.state["RetryCount"]).to eq(3) # WAS: 1
          expect(ctx.state_history.count).to eq(3) # WAS: 1
          expect(ctx.input).to eq(input)
          expect(ctx.output).to eq({"Error" => "States.Timeout"}) # WAS: nil
          expect(ctx.status).to eq("failure")
          expect(ctx.ended?).to eq(true)
        end
      end
    end
Great find @kbrock
Dependencies
Overview
Running the steps was setting `workflow#end? = true`, so in some cases it was not retrying. In other cases it was losing the `RetryCount`. I think `state_history` was not properly stored. The most obvious sign of the problem is that the number of times we stub `run_async!` has changed; I think this should shine a light on the issues. In a few cases I increased the retry count because that exposed the errors.
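To make the failure mode concrete, here is a minimal Ruby sketch. It is a toy model, not the library's actual code: `ToyContext`, `run_step`, `run_workflow`, and the `end_early` flag are hypothetical names, and the increment-then-compare retry rule is an assumption chosen so the counts line up with the `# WAS` values and the new values in the spec.

```ruby
# Toy model (hypothetical names, not the real implementation).
class ToyContext
  attr_accessor :retry_count, :history, :ended

  def initialize
    @retry_count = 0
    @history = []
    @ended = false
  end
end

# One attempt that always fails with `error`.
# end_early: true models the bug, where the step is marked ended
# after the first call regardless of the retry budget.
def run_step(ctx, error:, max_attempts:, end_early:)
  ctx.history << error
  ctx.retry_count += 1 # assumed rule: increment first, then compare
  ctx.ended = end_early || ctx.retry_count > max_attempts
end

# Drive the step until the context reports that it has ended.
def run_workflow(max_attempts:, end_early:)
  ctx = ToyContext.new
  until ctx.ended
    run_step(ctx, error: "States.Timeout", max_attempts: max_attempts, end_early: end_early)
  end
  ctx
end

buggy = run_workflow(max_attempts: 2, end_early: true)
fixed = run_workflow(max_attempts: 2, end_early: false)
puts "buggy: runs=#{buggy.history.size} retry_count=#{buggy.retry_count}"
puts "fixed: runs=#{fixed.history.size} retry_count=#{fixed.retry_count}"
```

Under these assumptions the buggy ordering stops after a single run, while the fixed ordering performs 1 regular run plus 2 retries.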
Changes
- `RetryCount` when there is an error and it ends up on the same state. I felt like there are a few potential edge cases, hence so many checks around propagating those fields.
- `next_state ||=` fixed issues where we deleted a `next_state` and skipped the retry.
- `self` after the retry logic helped us avoid setting `end? = true`, which would have left `state.run_nonblock!` working but `workspace.run_nonblock` failing.
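As a rough illustration of why `||=` matters here, the sketch below contrasts plain assignment with conditional assignment on a hash-based context. The helper names (`finish_with_assign`, `finish_with_or_assign`) and the `:next_state` key are hypothetical and only stand in for whatever the real code does; the assumption is that retry logic has already pointed the context back at the same state before the finishing step runs.

```ruby
# Hypothetical helpers, for illustration only.

# Plain assignment: whatever the finishing step computes replaces
# a next_state that the retry logic may already have set.
def finish_with_assign(ctx, computed_next)
  ctx[:next_state] = computed_next
  ctx
end

# Conditional assignment: keep an existing next_state, and only
# fill it in when nothing has been set yet.
def finish_with_or_assign(ctx, computed_next)
  ctx[:next_state] ||= computed_next
  ctx
end

# Assumed situation: the retrier asked to run "State" again.
after_retry = {:next_state => "State"}

lost = finish_with_assign(after_retry.dup, nil)
kept = finish_with_or_assign(after_retry.dup, nil)
puts "plain assignment: #{lost[:next_state].inspect}"
puts "||= assignment:   #{kept[:next_state].inspect}"
```

With plain assignment the retry target is overwritten and the retry is skipped; with `||=` it survives, while a context with no `next_state` still picks up the computed value.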