PrefectHQ / prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
https://prefect.io
Apache License 2.0

Flow fails if nested task fails, even if it succeeds on retry #14390

Closed thundercat1 closed 2 months ago

thundercat1 commented 3 months ago

Bug summary

In the following flow, I'd expect nested_flaky_task to fail once, the calling top_task to retry, and everything to succeed on the second attempt. However, the overall flow state is marked as Failed even though the tasks eventually succeed.

Reproduction

from prefect import flow, task

failed = False

@task
def nested_flaky_task():
    # This task will fail the first time it is run, but will succeed if called a second time
    global failed
    if not failed:
        failed = True
        raise ValueError("Forced task failure")

@task(
    retries=1,
)
def top_task():
    nested_flaky_task()

@flow
def nested_task_flow():
    top_task()

if __name__ == "__main__":
    nested_task_flow()

Error

15:47:30.527 | INFO    | Task run 'top_task-0' - Received non-final state 'AwaitingRetry' when proposing final state 'Failed' and will attempt to run again...
15:47:30.572 | INFO    | Task run 'top_task-0' - Created task run 'nested_flaky_task-1' for task 'nested_flaky_task'
15:47:30.573 | INFO    | Task run 'top_task-0' - Executing 'nested_flaky_task-1' immediately...
15:47:30.613 | INFO    | Task run 'nested_flaky_task-1' - Finished in state Completed()
15:47:30.627 | INFO    | Task run 'top_task-0' - Finished in state Completed()
15:47:30.645 | ERROR   | Flow run 'berserk-loon' - Finished in state Failed('1/3 states failed.')

Versions (prefect version output)

Version:             2.19.6
API version:         0.8.4
Python version:      3.11.6
Git commit:          9d938fe7
Built:               Mon, Jun 24, 2024 10:23 AM
OS/Arch:             darwin/arm64
Profile:             default
Server type:         ephemeral
Server:
  Database:          sqlite
  SQLite version:    3.43.2

Additional context

The workaround is straightforward: either add retries to the flaky task or remove its task decorator, so this isn't a major blocker to building effective flows. But when it happens inside a large flow, it's confusing and hard to work out after the fact what went wrong. Behavior more in line with expectations would prevent some debugging headaches.

WillRaphaelson commented 3 months ago

Thanks @thundercat1. It does seem that we are not propagating the nested task's success on retry back up the chain. We can fix this.
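
To illustrate: the `1/3 states failed` message in the log is consistent with the flow's final state being derived from every task run it observed, including the first nested_flaky_task run that failed before top_task retried. The sketch below is a toy model of that aggregation in plain Python, not Prefect's actual implementation. `RunState`, `summarize_flow`, and the run name `nested_flaky_task-0` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class RunState:
    name: str    # task run name (hypothetical labels)
    status: str  # "Completed" or "Failed"


def summarize_flow(states):
    """Toy aggregation: any failed run marks the whole flow as failed."""
    failed = [s for s in states if s.status == "Failed"]
    if failed:
        return f"Failed('{len(failed)}/{len(states)} states failed.')"
    return "Completed()"


# Assumed set of runs seen by the flow in the reproduction:
observed = [
    RunState("nested_flaky_task-0", "Failed"),     # first attempt, never retried
    RunState("nested_flaky_task-1", "Completed"),  # created when top_task retried
    RunState("top_task-0", "Completed"),
]

print(summarize_flow(observed))  # Failed('1/3 states failed.')
```

Under this model, the stale failed run is what needs to be discounted once its parent succeeds on retry. Dropping it from the list (`summarize_flow(observed[1:])`) yields `Completed()`.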

serinamarie commented 3 months ago

Hi @thundercat1, I put up a PR that I hope will resolve this issue. Can you let me know if it does?

zhen0 commented 2 months ago

I believe this was closed by https://github.com/PrefectHQ/prefect/pull/14439. Please let us know by opening a new issue if you are still seeing any problems.