apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Problem with `retry_from_failure` flag in `DbtCloudRunJobOperator` #43347

Open krzysztof-kubis opened 6 days ago

krzysztof-kubis commented 6 days ago

Apache Airflow Provider(s)

dbt-cloud

Versions of Apache Airflow Providers

Astronomer Runtime 12.1.0 based on Airflow 2.10.1+astro.1 Git Version: .release:7a1ffe6438b5ea8fcf75c4e5a356a6c23ab18404

Apache Airflow version

Airflow 2.10.1+astro.1

Operating System

Debian GNU/Linux 12 (bookworm)

Deployment

Astronomer

Deployment details

Plain setup, following https://github.com/dbt-labs/airflow-dbt-cloud

What happened

With the `retry_from_failure=True` flag set, each run executes only the models that failed in the previous run, which is the intended behavior. However, if one model has an error that cannot be resolved (e.g., due to a data source issue), the flag prevents the remaining models from being refreshed, even in subsequent scheduled runs.
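For context, a minimal sketch of a DAG that exhibits the behavior, assuming the standard provider import path; the `dag_id`, connection id, and `job_id` below are placeholders, not values from this deployment:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.dbt.cloud.operators.dbt import DbtCloudRunJobOperator

with DAG(
    dag_id="dbt_cloud_retry_from_failure_demo",  # placeholder
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
):
    run_dbt_job = DbtCloudRunJobOperator(
        task_id="run_dbt_job",
        dbt_cloud_conn_id="dbt_cloud_default",  # placeholder connection id
        job_id=12345,  # placeholder dbt Cloud job id
        # With this flag set, *every* trigger of the job (including fresh
        # scheduled runs) uses dbt Cloud's "rerun from failure" behavior.
        retry_from_failure=True,
        wait_for_termination=True,
    )
```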

What you think should happen instead

I think `retry_from_failure` should apply only to reruns. Two improvements come to mind (a combined sketch follows the list):

  1. Add a condition so that `{account_id}/jobs/{job_id}/rerun/` is triggered only during task reruns, e.g. by changing line 463 in `providers/dbt/cloud/hooks/dbt.py` from `if retry_from_failure:` to something like `if retry_from_failure and context['task_instance'].try_number != 1:`.

  2. A more general solution: replace the boolean flag with a parameter that accepts several values, e.g. `retry_from_failure = ["Never", "Rerun", "Always"]`.
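A hedged sketch of both proposals, not the provider's actual code: the enum name, its values, and the helper function below are hypothetical. The only Airflow facts assumed are that the task context exposes `context["task_instance"]` and that its `try_number` is 1 on the first attempt and increments on each retry.

```python
from __future__ import annotations

from enum import Enum
from typing import Any


class RetryFromFailure(str, Enum):
    """Hypothetical values for proposal 2."""

    NEVER = "Never"    # always trigger a full job run
    RERUN = "Rerun"    # use the rerun endpoint only on Airflow task retries
    ALWAYS = "Always"  # current behavior: rerun endpoint on every run


def use_rerun_endpoint(mode: RetryFromFailure, context: dict[str, Any]) -> bool:
    """Decide whether to call the {account_id}/jobs/{job_id}/rerun/ endpoint."""
    if mode is RetryFromFailure.ALWAYS:
        return True
    if mode is RetryFromFailure.RERUN:
        # Proposal 1 reduces to this check: try_number > 1 means Airflow
        # itself is retrying the task instance, not starting a fresh run.
        return context["task_instance"].try_number > 1
    return False
```

With such a parameter, the current `retry_from_failure=True` would correspond to `"Always"`, while `"Rerun"` would give the behavior requested in this issue.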

How to reproduce

No special setup is needed to reproduce the issue: the flag takes effect on every run, but it should arguably only affect reruns.

Anything else

No response

Are you willing to submit PR?


potiuk commented 6 days ago

Feel free to propose a PR and run it. Assigned you.