apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

ProcessingJobName is not preserved after execution returns from deferred state in SM processing job #40432

Closed joch0a closed 2 months ago

joch0a commented 3 months ago

Apache Airflow Provider(s)

amazon

Versions of Apache Airflow Providers

We are using version 8.23.0 of the Airflow Amazon provider package and set deferrable to True when using SMProcessingJob.

We define the ProcessingJobName as something like "project-dag_name-date-uuid", where the uuid deduplicates the job name on retry. However, when execution returns from the deferred state, the original uuid is not preserved and a new one is generated, so the operator cannot find a job with the new name and fails.

There is a closed related issue (https://github.com/apache/airflow/issues/39503), but the fix applies only to Transform jobs.

Apache Airflow version

2.8.1

Operating System

Amazon Linux AMI

Deployment

Amazon (AWS) MWAA

Deployment details

No response

What happened

No response

What you think should happen instead

No response

How to reproduce

For a given SMProcessingJob: 1) Set "ProcessingJobName": f"{name}-{str(uuid4())[:8]}". 2) Set deferrable = True. 3) Then run the DAG.
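To illustrate the mismatch without Airflow or AWS, here is a minimal sketch in plain Python. It assumes the job name expression is evaluated once when the job is submitted and again when the task resumes from the deferred state, which is what the behavior described above suggests. The helpers `random_job_name` and `stable_job_name`, and the `run_id` and `try_number` arguments, are hypothetical names used only for this illustration. They are not part of the provider, and whether equivalent values are available and stable in a given Airflow version would need to be verified.

```python
import uuid


def random_job_name(name: str) -> str:
    """Build the name as in the steps above: a fresh random suffix on every call."""
    return f"{name}-{str(uuid.uuid4())[:8]}"


def stable_job_name(name: str, run_id: str, try_number: int) -> str:
    """Hypothetical alternative: derive the suffix from values identifying the attempt.

    uuid5 is deterministic, so the same inputs always give the same suffix,
    while a different try_number gives a different one.
    """
    suffix = uuid.uuid5(uuid.NAMESPACE_URL, f"{run_id}/{try_number}").hex[:8]
    return f"{name}-{suffix}"


if __name__ == "__main__":
    # Two evaluations stand in for "at submit" and "after resuming from deferral".
    print(random_job_name("project-dag"), random_job_name("project-dag"))
    print(
        stable_job_name("project-dag", "manual__2024-06-26", 1),
        stable_job_name("project-dag", "manual__2024-06-26", 1),
    )
```

With the random suffix, the two evaluations produce different names, so a lookup by the second name cannot find the job created under the first. The deterministic variant is only a possible workaround sketch, not the fix referenced later in this thread.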

Anything else

No response

Are you willing to submit PR?

Code of Conduct

boring-cyborg[bot] commented 3 months ago

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

vincbeck commented 2 months ago

Fixed as part of #40706