dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0

Consider more re-execution strategies in Run Retry #8590

Open yuhan opened 2 years ago

yuhan commented 2 years ago

Use Case

Run Retry supports two strategies: "all steps" and "from failure". Consider adding an option to control exactly where a retry should start from, something like +failure.

Status quo:

from dagster import job

# Retry the whole run up to 3 times, re-executing every step each time.
@job(tags={"dagster/max_retries": 3, "dagster/retry_strategy": "ALL_STEPS"})
def other_sample_sample_job():
    pass

Maybe:

from dagster import job

# Proposed strategy value; "+FAILED_STEP*" is not supported today.
@job(tags={"dagster/max_retries": 3, "dagster/retry_strategy": "+FAILED_STEP*"})
def other_sample_sample_job():
    pass

This could be tricky to design: we need a way to identify which step failed, and there could be multiple failed steps.
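
One possible reading of "+FAILED_STEP*" borrows the op selection syntax, where a leading + adds one layer of upstream steps and a trailing * adds all downstream steps. The sketch below is plain Python, not Dagster code: resolve_retry_selection and the upstream dependency map are hypothetical names made up for illustration. It handles multiple failed steps by taking the union of each step's expansion, which is only one of several reasonable choices.

```python
from collections import deque
from typing import Dict, Iterable, Set


def resolve_retry_selection(
    upstream: Dict[str, Set[str]],
    failed_steps: Iterable[str],
) -> Set[str]:
    """Hypothetical helper: expand "+FAILED_STEP*" for every failed step.

    `upstream` maps each step key to the set of step keys it depends on.
    """
    # Invert the dependency map so we can walk downstream.
    downstream: Dict[str, Set[str]] = {step: set() for step in upstream}
    for step, parents in upstream.items():
        for parent in parents:
            downstream.setdefault(parent, set()).add(step)

    selected: Set[str] = set()
    for failed in failed_steps:
        # "+": one layer of direct upstream steps.
        selected |= upstream.get(failed, set())
        # "*": the failed step plus every transitive downstream step.
        seen = {failed}
        queue = deque([failed])
        while queue:
            current = queue.popleft()
            for child in downstream.get(current, set()):
                if child not in seen:
                    seen.add(child)
                    queue.append(child)
        selected |= seen
    return selected


# Example graph (assumed): extract -> clean -> train -> report, extract -> audit
example_upstream = {
    "extract": set(),
    "clean": {"extract"},
    "train": {"clean"},
    "report": {"train"},
    "audit": {"extract"},
}
retry_steps = resolve_retry_selection(example_upstream, ["train"])
```

With "train" as the only failed step, the selection is "clean", "train", and "report"; "extract" and "audit" are left alone. If "audit" had also failed, the union would pull in "extract" and "audit" as well.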

Ideas of Implementation

thread the reexecution selection into here

Additional Info


Message from the maintainers:

Excited about this feature? Give it a :thumbsup:. We factor engagement into prioritization.

Auric-Manteo commented 12 months ago

A more generic solution could be a dynamic error handler or hook that returns the step(s) to rerun. Its parameters could carry whatever is useful for deciding which step(s) to start with: each step's state (failed/succeeded/not run), as well as the current number of retries.
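
A rough sketch of what such a handler could look like, in plain Python. Nothing here is an existing Dagster API: StepState, RetryContext, RetryHandler, and the example handler are all hypothetical names, and the fallback policy is just one illustration of using the retry count.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class StepState(Enum):
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    NOT_RUN = "not_run"


@dataclass(frozen=True)
class RetryContext:
    """Hypothetical input passed to the handler; not a Dagster class."""

    step_states: Dict[str, StepState]
    retry_number: int  # 1 for the first retry
    max_retries: int


# A handler takes the context and returns the step keys to re-execute.
RetryHandler = Callable[[RetryContext], List[str]]


def rerun_unfinished_then_everything(context: RetryContext) -> List[str]:
    # On the final attempt, fall back to rerunning every step.
    if context.retry_number >= context.max_retries:
        return sorted(context.step_states)
    # Otherwise rerun only the steps that failed or never ran.
    return sorted(
        key
        for key, state in context.step_states.items()
        if state is not StepState.SUCCEEDED
    )
```

Returning a plain list of step keys keeps the hook independent of any selection syntax; the framework would only need to validate that the returned keys exist in the job.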