Azure / azure-functions-durable-js

JavaScript library for using the Durable Functions bindings
https://www.npmjs.com/package/durable-functions
MIT License
128 stars 46 forks source link

Activity functions are scheduled multiple times in fan-out mode #485

Closed danieltskv closed 1 year ago

danieltskv commented 1 year ago

Describe the bug

Follow up to: #477

I would like an orchestrator function to schedule multiple activity functions for parallel processing (fan-out) where it's valid for some to fail (throw an error), as I will only use the results from the ones that succeeded.

context.df.Task.all throws as soon as one task fails, and there is no equivalent to the JS Promise.allSettled().

I tried using the proposed solution from @GP4cK and @davidmrdavid found here: https://github.com/Azure/azure-functions-durable-js/issues/477#issuecomment-1428870997

try {
  yield context.df.Task.all(tasks);
} catch (err) {
  // One task failed
}

for (let task of tasks) {
  try { // Could yield task throw? Do I need to use try/catch?
    yield task;
    // Do something with task.result
  } catch (err) {
  }
}

The issue I'm noticing is that some activity functions that eventually succeed are being called multiple times. I was assuming the later loop that iterates over the tasks simply continues or fetches the result.

It seems that after the error is thrown and the replay happens on the orchestrator function, activity functions that were scheduled after the failed one (success_2 and success_3 in the code example) are ran again.

Investigative information

To Reproduce

  1. Clone https://github.com/danieltskv/azure-durable-task-all-bug
  2. Follow the steps in the README

Expected behavior

Activity functions that were scheduled with context.df.Task.all should not be called (scheduled) again when later iterating over them with yield task.

Actual behavior

Activity functions are called again.

Sample output

Notice that success_2 and success_3 are called twice. A second time after the error is thrown and we're iterating over the tasks.

Azure Functions Core Tools
Core Tools Version:       4.0.4915 Commit hash: N/A  (64-bit)
Function Runtime Version: 4.14.0.19631

[2023-02-24T16:51:01.649Z] Debugger listening on ws://127.0.0.1:9229/e1b49896-6582-41ad-bf7c-45c4d052d3f2
[2023-02-24T16:51:01.649Z] For help, see: https://nodejs.org/en/docs/inspector

Functions:

        HttpTrigger: [GET,POST] http://localhost:7071/api/HttpTrigger

        ActivityFail: activityTrigger

        ActivitySuccess: activityTrigger

        Orchestrator: orchestrationTrigger

For detailed output, run func with --verbose flag.
[2023-02-24T16:51:02.019Z] Worker process started and initialized.
[2023-02-24T16:51:02.129Z] Debugger attached.
[2023-02-24T16:51:06.469Z] Host lock lease acquired by instance ID '000000000000000000000000BDD02053'.
[2023-02-24T16:51:09.245Z] Executing 'Functions.HttpTrigger' (Reason='This function was programmatically called via the host APIs.', Id=d00253ed-ce3a-454d-8b1c-5b43904d0345)
[2023-02-24T16:51:09.359Z] HTTP trigger function processed a request.
[2023-02-24T16:51:09.555Z] Executing 'Functions.Orchestrator' (Reason='(null)', Id=47387080-03f1-4b49-b8fa-1e39b60eff94)
[2023-02-24T16:51:09.556Z] Executed 'Functions.HttpTrigger' (Succeeded, Id=d00253ed-ce3a-454d-8b1c-5b43904d0345, Duration=335ms)
[2023-02-24T16:51:09.577Z] ----- Running orchestrator. Replaying? false
[2023-02-24T16:51:09.606Z] Executed 'Functions.Orchestrator' (Succeeded, Id=47387080-03f1-4b49-b8fa-1e39b60eff94, Duration=56ms)
[2023-02-24T16:51:09.678Z] Executing 'Functions.ActivitySuccess' (Reason='(null)', Id=75408cac-fcb4-473a-9acf-29bd48fe6806)
[2023-02-24T16:51:09.693Z] ----- Running ActivitySuccess. Input: success_1
[2023-02-24T16:51:09.710Z] Executing 'Functions.ActivityFail' (Reason='(null)', Id=fa3d82ad-7d0f-4e19-804f-b6c174b3f26b)
[2023-02-24T16:51:09.713Z] ----- Running ActivityFail. Input: fail_1
[2023-02-24T16:51:09.753Z] Executing 'Functions.ActivitySuccess' (Reason='(null)', Id=b47c66fb-4672-4012-8d93-cbc33a7c9017)
[2023-02-24T16:51:09.755Z] Executing 'Functions.ActivitySuccess' (Reason='(null)', Id=4008d136-e2d8-49d3-b5bc-196f67f8b3f2)
[2023-02-24T16:51:09.756Z] ----- Running ActivitySuccess. Input: success_2
[2023-02-24T16:51:09.758Z] ----- Running ActivitySuccess. Input: success_3
[2023-02-24T16:51:09.794Z] Executed 'Functions.ActivitySuccess' (Succeeded, Id=75408cac-fcb4-473a-9acf-29bd48fe6806, Duration=117ms)
[2023-02-24T16:51:09.831Z] Executed 'Functions.ActivityFail' (Failed, Id=fa3d82ad-7d0f-4e19-804f-b6c174b3f26b, Duration=119ms)
[2023-02-24T16:51:09.831Z] System.Private.CoreLib: Exception while executing function: Functions.ActivityFail. System.Private.CoreLib: Result: Failure
[2023-02-24T16:51:09.831Z] Exception: Purposefully failed
[2023-02-24T16:51:09.831Z] Stack: Error: Purposefully failed
[2023-02-24T16:51:09.831Z]     ...
[2023-02-24T16:51:09.842Z] 9a2ecdcbaacc4dbb9846ec01c80cd287: Function 'ActivityFail (Activity)' failed with an error. Reason: System.Exception:  Purposefully failed
[2023-02-24T16:51:09.842Z]  ---> Microsoft.Azure.WebJobs.Script.Workers.Rpc.RpcException: Result: Failure
[2023-02-24T16:51:09.842Z] Exception: Purposefully failed
[2023-02-24T16:51:09.842Z] Stack: Error: Purposefully failed
[2023-02-24T16:51:09.842Z]     ...
[2023-02-24T16:51:09.857Z] Executed 'Functions.ActivitySuccess' (Succeeded, Id=b47c66fb-4672-4012-8d93-cbc33a7c9017, Duration=104ms)
[2023-02-24T16:51:09.859Z] Executing 'Functions.Orchestrator' (Reason='(null)', Id=668ef155-145a-4e3d-96f4-6a4a2dd270ad)
[2023-02-24T16:51:09.861Z] Executed 'Functions.ActivitySuccess' (Succeeded, Id=4008d136-e2d8-49d3-b5bc-196f67f8b3f2, Duration=106ms)
[2023-02-24T16:51:09.869Z] ----- Running orchestrator. Replaying? true
[2023-02-24T16:51:09.876Z] Executed 'Functions.Orchestrator' (Succeeded, Id=668ef155-145a-4e3d-96f4-6a4a2dd270ad, Duration=18ms)
[2023-02-24T16:51:09.952Z] Executing 'Functions.Orchestrator' (Reason='(null)', Id=a028fcaa-ee28-4f42-b377-0fb9a594d771)
[2023-02-24T16:51:09.960Z] ----- Running orchestrator. Replaying? true
[2023-02-24T16:51:09.973Z] Executed 'Functions.Orchestrator' (Succeeded, Id=a028fcaa-ee28-4f42-b377-0fb9a594d771, Duration=20ms)
[2023-02-24T16:51:10.001Z] Executing 'Functions.Orchestrator' (Reason='(null)', Id=1a395ef4-5421-4e2b-9ef9-cd70546f2d37)
[2023-02-24T16:51:10.007Z] ----- Running orchestrator. Replaying? true
[2023-02-24T16:51:10.007Z] [
[2023-02-24T16:51:10.007Z]   { isCompleted: true, isFaulted: false },
[2023-02-24T16:51:10.007Z]   { isCompleted: false, isFaulted: true },
[2023-02-24T16:51:10.007Z]   { isCompleted: true, isFaulted: false },
[2023-02-24T16:51:10.007Z]   { isCompleted: true, isFaulted: false }
[2023-02-24T16:51:10.007Z] ]
[2023-02-24T16:51:10.010Z] Executed 'Functions.Orchestrator' (Succeeded, Id=1a395ef4-5421-4e2b-9ef9-cd70546f2d37, Duration=9ms)
[2023-02-24T16:51:10.047Z] Executing 'Functions.ActivitySuccess' (Reason='(null)', Id=45e5e09e-f2c2-42a6-bb68-3b814524687e)
[2023-02-24T16:51:10.050Z] ----- Running ActivitySuccess. Input: success_2
[2023-02-24T16:51:10.151Z] Executed 'Functions.ActivitySuccess' (Succeeded, Id=45e5e09e-f2c2-42a6-bb68-3b814524687e, Duration=104ms)
[2023-02-24T16:51:10.184Z] Executing 'Functions.Orchestrator' (Reason='(null)', Id=fcc2234c-c9f1-4a06-8f98-d41eb2ad3c7e)
[2023-02-24T16:51:10.188Z] ----- Running orchestrator. Replaying? true
[2023-02-24T16:51:10.190Z] Executed 'Functions.Orchestrator' (Succeeded, Id=fcc2234c-c9f1-4a06-8f98-d41eb2ad3c7e, Duration=5ms)
[2023-02-24T16:51:10.220Z] Executing 'Functions.ActivitySuccess' (Reason='(null)', Id=2f464e3b-4b03-4440-abca-4bfc3cff62c9)
[2023-02-24T16:51:10.225Z] ----- Running ActivitySuccess. Input: success_3
[2023-02-24T16:51:10.324Z] Executed 'Functions.ActivitySuccess' (Succeeded, Id=2f464e3b-4b03-4440-abca-4bfc3cff62c9, Duration=104ms)
[2023-02-24T16:51:10.356Z] Executing 'Functions.Orchestrator' (Reason='(null)', Id=368da1d9-0b1b-4d89-a77e-747f201164c9)
[2023-02-24T16:51:10.362Z] ----- Running orchestrator. Replaying? true
[2023-02-24T16:51:10.368Z] Executed 'Functions.Orchestrator' (Succeeded, Id=368da1d9-0b1b-4d89-a77e-747f201164c9, Duration=12ms)
davidmrdavid commented 1 year ago

Thanks @Danieltskv. @hossam-nasr: could you please take a look at this? I think the issue here is that, on the 2nd yield'ing loop, the activities that were already scheduled are somehow not being flagged as already scheduled, and so we must be creating redundant "action objects".

davidmrdavid commented 1 year ago

Just a friendly ping here, @hossam-nasr

danieltskv commented 1 year ago

Just checking to see if there is any progress? Thanks!

danieltskv commented 1 year ago

Thanks for the fix. Any idea when a new release will be available with the changes?

hossam-nasr commented 1 year ago

Hi @danieltskv I'm planning on releasing a new version of the Durable SDK today, so it should be available later today :)

hossam-nasr commented 1 year ago

Hi @danieltskv versions of Durable Functions SDK with the fix are out now!

v2.x - https://github.com/Azure/azure-functions-durable-js/releases/tag/v2.1.3 v3.x - https://github.com/Azure/azure-functions-durable-js/releases/tag/v3.0.0-alpha.5

Ayzrian commented 6 months ago

Seems that the issues is still reproducable, https://github.com/Azure/azure-functions-durable-js/issues/574