Netflix / conductor

Conductor is a microservices orchestration engine.
Apache License 2.0

Task not retried after second attempt when failing task through event queue #3309

Open pmchung opened 2 years ago

pmchung commented 2 years ago

A task is not retried after its first attempt when the retry is triggered from an event queue using workflowId and taskRefName.

https://github.com/Netflix/conductor/blob/9e80c4af02c504072dfc775fb97430907bc1097c/core/src/main/java/com/netflix/conductor/core/events/SimpleActionProcessor.java#L130-L146

When multiple tasks have been created under one taskRefName because of retry attempts, workflow.getTaskByRefName should get the last created task for that ref name.

I believe the change in https://github.com/Netflix/conductor/pull/2883 introduced sorting that now selects the first created task, which would already have failed.
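
To make the suspected difference concrete, here is a minimal, self-contained sketch of the two lookup behaviours. It does not use Conductor's real classes: `RetryLookupSketch`, `Task`, `latestByRefName`, `earliestByRefName`, `sampleAttempts`, the `seq` field and the `my_task` ref name are all hypothetical names made up for illustration, and the actual sorting in the PR may differ.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class RetryLookupSketch {

    /** Hypothetical, simplified stand-in for a workflow task (not Conductor's model class). */
    public static final class Task {
        public final String refName;
        public final int seq;        // assumed creation order within the workflow
        public final String status;  // e.g. "FAILED" or "IN_PROGRESS"

        public Task(String refName, int seq, String status) {
            this.refName = refName;
            this.seq = seq;
            this.status = status;
        }
    }

    /** Returns the most recently created attempt for a ref name (the behaviour the report expects). */
    public static Optional<Task> latestByRefName(List<Task> tasks, String refName) {
        return tasks.stream()
                .filter(t -> t.refName.equals(refName))
                .max(Comparator.comparingInt((Task t) -> t.seq));
    }

    /** Returns the first created attempt for a ref name (the behaviour the report suspects). */
    public static Optional<Task> earliestByRefName(List<Task> tasks, String refName) {
        return tasks.stream()
                .filter(t -> t.refName.equals(refName))
                .min(Comparator.comparingInt((Task t) -> t.seq));
    }

    /** Two attempts under one ref name: the original failure and its retry. */
    public static List<Task> sampleAttempts() {
        return Arrays.asList(
                new Task("my_task", 1, "FAILED"),
                new Task("my_task", 2, "IN_PROGRESS"));
    }
}
```

With the sample data, `latestByRefName` returns the in-progress retry, which an event could still fail, while `earliestByRefName` returns the attempt that has already failed, so failing it again would have no effect on the running retry.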

manan164 commented 2 years ago

Hi @pmchung, thanks for reporting. I want to understand the exact issue here. Can you please explain it in detail? As I understand it, the taskRefName does not change even when the task is retried. However, I do see a behavioral change before and after https://github.com/Netflix/conductor/pull/2883, so I am raising a draft (https://github.com/Netflix/conductor/pull/3338) to correct it.