Azure / azure-functions-durable-python

Python library for using the Durable Functions bindings.

Fan-out fan-in task_all exception thrown workaround throwing errors #428

Closed IDrumsey closed 5 months ago

IDrumsey commented 1 year ago

🐛 Describe the bug

@davidmrdavid

Originally asked this question over here, but moved to this repo as requested.

When using the workaround suggested here with Python, one of the following two errors is thrown. The second one seems to always be thrown when running the function without debugging, whereas when debugging it seems semi-random which error gets thrown (both have been thrown with the same code being run).

Orchestrator function 'DurableFunctionsOrchestrator1' failed: 'AtomicTask' object has no attribute 'append'
Non-Deterministic workflow detected: A previous execution of this orchestration scheduled an activity task with sequence ID 0 and name 'Hello1' (version ''), but the current replay execution hasn't (yet?) scheduled this task. Was a change made to the orchestrator code after this instance had already started running?

The purpose of the workaround is to continue waiting for all tasks to complete after task_all even if any of the tasks throw an exception.

The workaround I linked uses Node.js, whereas I'm using Python. I tested it in Node.js and it worked, but it does not work in Python (instead it throws one of the errors shown above).

🤔 Expected behavior

If an exception is thrown from one of the tasks sent to task_all(), the orchestrator function should not fail, but just keep waiting until all the tasks are complete.

In the zipped code I'm attaching, there are 3 activity functions being run. Only the second one throws an exception. The orchestrator should not return status Failed when the 2nd activity function throws an exception, but rather should wait for the rest of the tasks to complete and return the results of the 2 successful activity functions once all tasks have completed (failed or not).

☕ Steps to reproduce

  1. Download the zipped folder.
  2. Recreate the virtual environment.
  3. Run the orchestrator function

What Durable Functions patterns are you using, if any?

Fan-out fan-in

Any minimal reproducer we can use?

Here's the zipped code you can use to view the issue. You'll have to attach a storage account. bug.zip

Are you running this locally or on Azure?

Locally

nytian commented 1 year ago

Hi, I am working on this reproduction and will reach back to you soon.

dhurley-suncor commented 1 year ago

Any update on this? Also running into a scenario where if one task fails during a fan out I want to still yield the results of the successful tasks. Thanks

dhurley-suncor commented 1 year ago

I have tried the following. While task_all() no longer errors out the orchestrator, I am still not seeing the results of the successful tasks.

tasks = []
for file in files:
    tasks.append(context.call_activity("myfunc", file))
try:
    results = yield context.task_all(tasks)
except Exception as e:
    # one or many tasks might fail but keep going
    logging.info(e)

# iterate through TaskSet [List] and get results of succeeded
results = []
for task in tasks:
    output = task.result
    results.append(output)

print(results)

I noticed that the docs describe the result attribute as "Get the result of the task, if completed. Otherwise None.", and I am seeing None for every task.result. I am not sure how this can be, since context.task_all() should wait for all tasks, successful or not. Or does the first failed task move the orchestrator along while the other jobs are still running in the background?
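One possible way to get partial results without inspecting task.result at all is to make sure no task ever fails from the orchestrator's point of view: each activity catches its own exception and returns a plain dictionary describing the outcome. The sketch below is only an illustration under that assumption. The activity name "myfunc_safe", the {"ok", "value", "error"} envelope, FakeContext, and dry_run are all hypothetical; FakeContext is a local stand-in so the generator can be exercised without the Functions host and is not the real DurableOrchestrationContext. In a real app the generator would be registered through the library's usual orchestrator entry point instead.

```python
from typing import Any, Dict, Generator, List


def myfunc_safe(file: str) -> Dict[str, Any]:
    """Hypothetical activity body: never raises, always returns a JSON-friendly dict."""
    try:
        if file.endswith(".bad"):
            raise ValueError(f"cannot parse {file}")
        return {"ok": True, "value": file.upper()}
    except Exception as exc:
        # Report the failure as plain strings instead of letting it propagate.
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}


def orchestrator_function(context) -> Generator[Any, Any, Dict[str, List[Any]]]:
    """Fan out, fan in, then split the outcomes into successes and failures."""
    files = context.get_input()
    tasks = [context.call_activity("myfunc_safe", f) for f in files]
    # Every activity returns an envelope, so task_all has no failure to surface.
    outcomes = yield context.task_all(tasks)
    succeeded = [o["value"] for o in outcomes if o["ok"]]
    failed = [o["error"] for o in outcomes if not o["ok"]]
    return {"succeeded": succeeded, "failed": failed}


class FakeContext:
    """Local stand-in for dry runs only; NOT the real orchestration context."""

    def __init__(self, files: List[str]) -> None:
        self._files = files

    def get_input(self) -> List[str]:
        return self._files

    def call_activity(self, name: str, input_: str) -> Dict[str, Any]:
        # Runs the activity eagerly; the real runtime schedules it instead.
        return myfunc_safe(input_)

    def task_all(self, tasks: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        return list(tasks)


def dry_run(files: List[str]) -> Dict[str, List[Any]]:
    """Drive the generator once, the way a runtime would feed results back in."""
    gen = orchestrator_function(FakeContext(files))
    pending = next(gen)
    try:
        gen.send(pending)
    except StopIteration as stop:
        return stop.value
    raise RuntimeError("orchestrator yielded more than once")


summary = dry_run(["a.txt", "b.bad", "c.txt"])
```

With this shape the orchestrator itself decides what a partial failure means (log it, retry it, or fail the whole run), at the cost of the activities no longer showing up as failed in the runtime's own status.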

dhurley-suncor commented 1 year ago

@nytian any update on this? Thanks!

davidmrdavid commented 1 year ago

@dhurley-suncor: do you have a minimal repro you can .zip for me to try this on my machine?

dhurley-suncor commented 1 year ago

> @dhurley-suncor: do you have a minimal repro you can .zip for me to try this on my machine?

Here you go. Just as a recap: my expectation was that if one task fails during a task_all() fan-out/fan-in, the orchestration could still continue. For example, when we have many documents and pass each to an endpoint to extract text, we don't want to stop the orchestration if one of them fails; we still want the fan-in to complete. Thanks!

task_all_bug.zip

dhurley-suncor commented 1 year ago

@davidmrdavid any update?

nytian commented 8 months ago

Reproduced the issue on my end. All the activity functions got executed. The issue is that an Exception is not JSON serializable, so any return value that includes an Exception object will cause the error. I will open a PR soon to fix this.
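The serialization problem described here can be seen with the standard library alone. The snippet below is a minimal illustration, not the fix that went into the library: it shows json.dumps rejecting an exception object, and a hypothetical helper, exception_to_payload, that converts the exception to plain strings before it is returned.

```python
import json
from typing import Dict


def exception_to_payload(exc: BaseException) -> Dict[str, str]:
    """Hypothetical helper: keep only JSON-friendly details of an exception."""
    return {"type": type(exc).__name__, "message": str(exc)}


# Returning the exception object itself cannot be serialized.
try:
    json.dumps({"result": ValueError("boom")})
    serialization_error = None
except TypeError as exc:
    serialization_error = str(exc)

# Converting it to a dict of strings first serializes without trouble.
payload = json.dumps({"result": exception_to_payload(ValueError("boom"))})
```

Anything an orchestrator returns has to survive this kind of round trip, so a list that mixes successful results with raw exception objects would need a conversion like this before being returned.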

lilyjma commented 7 months ago

Hi @dhurley-suncor, @IDrumsey - thank you for using Durable Functions! I'm a PM working on DF and would love to learn about your experience using the product. You can share your feedback in this quick survey to help influence what the team works on next. If you're building intelligent apps, there's also an opportunity to participate in a compensated UX study. Thanks!

nytian commented 5 months ago

Closing this issue, as the error handling has been updated in PR #493.