apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Tableau - problem when fetching job status #32799

Open monik-a opened 1 year ago

monik-a commented 1 year ago

Apache Airflow version

Other Airflow 2 version (please specify below)

What happened

We're refreshing Tableau dashboards with Airflow, and the following set of actions creates issues:

What you think should happen instead

There should be a retry strategy on poking (wait_for_status), so that we try to fetch the status of the started job at least 2 more times before failing the task execution. This avoids a resource conflict on Tableau's side, which happens when we initiate a new refresh while the existing one has not finished yet.
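A minimal sketch of the requested behaviour, not the provider's actual code: the status call is retried up to 2 more times before the error is allowed to fail the task. `TransientServerError`, `fetch_with_retries`, and the `fetch_status` callable are hypothetical names used only for illustration.

```python
import time


class TransientServerError(Exception):
    """Hypothetical stand-in for a 5xx response raised by a status call."""


def fetch_with_retries(fetch_status, extra_attempts=2, delay_seconds=0.0):
    """Call fetch_status(), retrying up to extra_attempts more times on a transient error."""
    last_error = None
    for attempt in range(1 + extra_attempts):
        try:
            return fetch_status()
        except TransientServerError as err:
            last_error = err
            # Wait before the next try, but not after the final one.
            if attempt < extra_attempts:
                time.sleep(delay_seconds)
    # Every attempt failed, so surface the last error to the caller.
    raise last_error
```

With a wrapper like this, a single 502 during poking would no longer fail the task while the refresh job is still running on the server.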

How to reproduce

Mock a 502 response from the server while the task is poking for the job status. Attached are logs from 3 attempts of the same DAG run (the task failed although the job on Tableau actually succeeded):
dag_id=daily_refresh_tableau_workbooks_run_id=scheduled__2023-07-06T03_30_00+00_00_task_id=refresh_workbook_main_transactions_dashboard_attempt=3.log
dag_id=daily_refresh_tableau_workbooks_run_id=scheduled__2023-07-06T03_30_00+00_00_task_id=refresh_workbook_main_transactions_dashboard_attempt=3 (2).log
dag_id=daily_refresh_tableau_workbooks_run_id=scheduled__2023-07-06T03_30_00+00_00_task_id=refresh_workbook_main_transactions_dashboard_attempt=3 (1).log
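The reproduction can be approximated without a Tableau server. The sketch below is a simplified stand-in, not the provider's code: `BadGateway`, `wait_for_state`, and `get_job_status` are hypothetical names, and the mocked call raises on the second poll to imitate a 502.

```python
from unittest import mock


class BadGateway(Exception):
    """Hypothetical stand-in for the error raised on a 502 response."""


def wait_for_state(get_job_status, target="Success", max_polls=10):
    """Simplified poller without error handling: any exception aborts the wait."""
    for _ in range(max_polls):
        if get_job_status() == target:
            return True
    return False


# The first poll reports progress, the second hits a mocked 502,
# and the third would have reported success if it had been reached.
get_job_status = mock.Mock(
    side_effect=["InProgress", BadGateway("502 Bad Gateway"), "Success"]
)

try:
    wait_for_state(get_job_status)
    outcome = "succeeded"
except BadGateway:
    outcome = "failed"
```

Here the wait aborts on the second poll, so the final status is never fetched even though the mocked job would have finished successfully.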

Operating System

debian booster (docker)

Versions of Apache Airflow Providers

apache-airflow-providers-tableau 2.1.8

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

Anything else

No response

Are you willing to submit PR?

Code of Conduct

boring-cyborg[bot] commented 1 year ago

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

hussein-awala commented 1 year ago

I believe there isn't a bug in the current behavior. Skipping 5xx errors might lead to the task running indefinitely if there is a persistent issue with the server.

Nevertheless, I think we can enhance the sensor's functionality by introducing a tolerance parameter for handling specific response errors (e.g., 408, 504, etc.). We could consider failing the sensor after a certain number of attempts with the same error, say, five consecutive attempts.
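One way the tolerance idea could look, as a rough sketch under stated assumptions rather than a proposed implementation: the set of tolerated status codes, `StatusRequestError`, `poke_until_done`, and the state names are all hypothetical. The loop tolerates listed errors but fails once the same error has occurred on five consecutive attempts.

```python
import time

# Assumed set of tolerated HTTP status codes; the real list would be a parameter.
TOLERATED_STATUS_CODES = frozenset({408, 502, 504})


class StatusRequestError(Exception):
    """Hypothetical error carrying the HTTP status code of a failed status call."""

    def __init__(self, status_code):
        super().__init__(f"status request failed with HTTP {status_code}")
        self.status_code = status_code


def poke_until_done(
    get_job_status,
    max_consecutive_errors=5,
    poke_interval=0.0,
    finished_states=("Success", "Failed", "Cancelled"),
):
    """Poll until a finished state, tolerating a bounded run of identical errors."""
    consecutive = 0
    last_code = None
    while True:
        try:
            state = get_job_status()
        except StatusRequestError as err:
            if err.status_code not in TOLERATED_STATUS_CODES:
                raise  # errors outside the tolerated set fail immediately
            # Count only uninterrupted repeats of the same status code.
            consecutive = consecutive + 1 if err.status_code == last_code else 1
            last_code = err.status_code
            if consecutive >= max_consecutive_errors:
                raise  # persistent server problem: stop instead of polling forever
        else:
            consecutive, last_code = 0, None  # a successful poll resets the counter
            if state in finished_states:
                return state
        time.sleep(poke_interval)
```

The counter bounds only the error case; a job that never leaves an in-progress state would still need the sensor's usual timeout.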

Would you like to implement it and become a contributor to the Apache Airflow project?

okirialbert commented 1 year ago

Hi @hussein-awala, I'd like to work on this issue.

eladkal commented 1 year ago

assigned