Netflix / conductor

Conductor is a microservices orchestration engine.
Apache License 2.0
12.81k stars 2.34k forks source link

Scheduled Task initiated with previous failed task's outputData #2452

Open roversnord opened 3 years ago

roversnord commented 3 years ago

Describe the bug Creating a task where it is setup to retry upon failure, the retried task is populated with the previous failed attempt's outputData.

Details Conductor version: 3.2.0 Workflow definition: { "name": "response_test", "version": 1, "tasks": [ { "type": "HTTP", "name": "FUNCTION", "taskReferenceName": "test_task", "inputParameters": { "http_request": { "uri": "http://httpstat.us/500", "method": "GET", "Accept": "application/json" } }, "taskDefinition": { "name": "test_task", "retryCount": 3, "retryLogic": "FIXED", "retryDelaySeconds": 120, "timeoutPolicy": "RETRY" } } ], "ownerEmail": "string@email.com", "schemaVersion": 2 } Task definition: included in workflow definition Event handler definition:

To Reproduce Steps to reproduce the behavior:

  1. Create workflow curl --location --request POST 'http://localhost:8080/api/metadata/workflow' \ --header 'Content-Type: application/json' \ --data-raw '{ "name": "response_test", "version": 1, "tasks": [ { "type": "HTTP", "name": "FUNCTION", "taskReferenceName": "test_task", "inputParameters": { "http_request": { "uri": "http://httpstat.us/500", "method": "GET", "Accept": "application/json" } }, "taskDefinition": { "name": "test_task", "retryCount": 3, "retryLogic": "FIXED", "retryDelaySeconds": 120, "timeoutPolicy": "RETRY" } } ], "ownerEmail": "string@email.com", "schemaVersion": 2 }'
  2. Start the workflow curl --location --request POST 'http://localhost:8080/api/workflow/response_test' \ --header 'accept: application/json' \ --header 'Content-Type: application/json' \ --data-raw '{"version": 1}'
  3. Get workflow by Id. Replace workflow id in the curl with output of step#2 curl --location --request GET 'http://localhost:8080/api/workflow/4949a49f-ea2c-4fc2-9c73-331c5ff70196'
  4. Search for "OutputData" in the response from step 3 and look for the second occurrence. Even though the taskId for the second execution is different than the the first and the task is in scheduled state, and the startTime and endTime are 0 indicating the task has executed, the outputData is populated with the response from the first attempt.

Expected behavior When a failed task is retried, the outputData should be empty until the task is actually executed and has outputData.

Screenshots Result Json with explanation:

  1. First task attempt failed, second retry in scheduled state and outputData is populated from the first attempt. workflow_execution_status_2_scheduled.txt

  2. Second retry now in Failed state and outputData now replaced with the actual result of second retry and a 3rd retry in scheduled state with outputData populated from the 2nd failure. workflow_execution_status_2_failed.txt

Additional context Add any other context about the problem here.

roversnord commented 3 years ago

The behavior is same in conductor version 2.30.3 as well. This is the version we use and originally noticed the issue. Tried to verify in latest and see the same behavior in 3.2.0

kishorebanala commented 3 years ago

@roversnord While this is fairly simple to fix, we were thinking if this would break for some users, for eg., if they were using output data from the failed attempt(s).

What are you trying to accomplish?

roversnord commented 3 years ago

When we update task, we don't want to lose the previously populated outputdata for the task, so we append to the outputdata if there's anything there already. What this causes is when a task fails and is retried, the output data keeps getting accumulated with all the previous failed attempts' outputdata. So, if have 5 retries and the first 4 fail and the last attempt succeeds, the outputdata now has the data from all 5 attempts. If there is a need to use output data from the failed attempts, it would be possible to cycle through the failed attempts get that info but if we don't want the outputdata from the failed attempts there isn't a way to do that.

aravindanr commented 3 years ago

If there is a need to use output data from the failed attempts, it would be possible to cycle through the failed attempts get that info but if we don't want the outputdata from the failed attempts there isn't a way to do that.

Workflow does not have access to the outputData of all the attempts at the moment. Only the successful attempt is visible to the following tasks. Clearing the outputData after each attempt could break existing users and also some future use-cases.

We suggest introducing the following expressions

  1. taskRefName.attempts[attemptIndex].output which picks the output of a specific attempt.
  2. taskRefName.failedAttempt.output which picks the output of the last failed attempt.
  3. taskRefName.output, which is already supported and picks the output of the successful attempt.

If you have the time, we welcome a pull request. We can also the brainstorm the approach in a new discussion. For now, i am removing the label, bug.

aravindanr commented 2 years ago

@csharplus this is a good first issue to look into, if you are interested.

csharplus commented 2 years ago

Hi Aravindan,

Sure, I'll take a look at it, and let you know when the fix is ready for review.

Thanks, Henry

On Mon, Jan 10, 2022 at 2:25 PM Aravindan Ramkumar @.***> wrote:

@csharplus https://github.com/csharplus this is a good first issue to look into, if you are interested.

— Reply to this email directly, view it on GitHub https://github.com/Netflix/conductor/issues/2452#issuecomment-1009404358, or unsubscribe https://github.com/notifications/unsubscribe-auth/AKC4FNNGJQ6L72UVXU7LULLUVNMEHANCNFSM5DWKOPXQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you were mentioned.Message ID: @.***>