apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Can't cancel EMR Serverless task #31099

Closed dacort closed 1 year ago

dacort commented 1 year ago

Apache Airflow version

2.6.0

What happened

When marking an EMR Serverless job as failed, the job continues to run.

What you think should happen instead

The job should be cancelled. Looking at the EMR Serverless operator, I don't see an on_kill method, so I assume we just need to add one.

I'm not sure how to handle EmrServerlessCreateApplicationOperator, though. If the workflow has a corresponding EmrServerlessDeleteApplicationOperator, we'd probably want to delete the application if the job is cancelled.

How to reproduce

Operating System

n/a

Versions of Apache Airflow Providers

apache-airflow-providers-amazon==8.0.0

Deployment

Amazon (AWS) MWAA

Deployment details

No response

Anything else

No response

boring-cyborg[bot] commented 1 year ago

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise a PR to address this issue, please do so; there is no need to wait for approval.

phanikumv commented 1 year ago

Thank you for creating the issue, @dacort. Would you like to create a PR for the on_kill method implementation?

dacort commented 1 year ago

@phanikumv Yep, happy to! Just getting reacquainted with my airflow dev environment. 😁

phanikumv commented 1 year ago

Assigned it to you

eladkal commented 1 year ago

> When marking an EMR Serverless job as failed, the job continues to run.

This is expected. Marking a task as failed in the UI just changes the status of the task in the DB. It does not invoke the on_kill function.

Converting to a feature request (add an on_kill() function), as this is not a bug.

dacort commented 1 year ago

> It does not invoke the on_kill function.

I was wondering if that was the case. When does it get invoked?

potiuk commented 1 year ago

> This is expected. Marking a task as failed in the UI just changes the status of the task in the DB. It does not invoke the on_kill function.

> I was wondering if that was the case. When does it get invoked?

@eladkal, I think you mistook it for something else (some callbacks, or maybe race conditions?). When a task is running and you clear it from the UI, it is marked as FAILED in the DB. When the running task next sends a heartbeat, as far as I know, it checks whether the state has been set to "FAILED" and generally runs the on_kill method when exiting. So I think this is indeed a bug.
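To make the heartbeat behaviour described above concrete, here is a toy model. It is only a sketch: `State`, `ToyTask`, and `heartbeat` are hypothetical names made up for illustration, and this is not Airflow's actual implementation. It only shows the idea that a running task notices an external state change on its next heartbeat and calls on_kill before exiting.

```python
from enum import Enum


class State(str, Enum):
    """Simplified task states (hypothetical, for illustration only)."""

    RUNNING = "running"
    FAILED = "failed"


class ToyTask:
    """Hypothetical stand-in for an operator; not an Airflow class."""

    def __init__(self):
        self.killed = False

    def on_kill(self):
        # Cleanup hook: this is where a remote job would be cancelled.
        self.killed = True


def heartbeat(db_state, task):
    """Return True if the task should keep running.

    If the state stored in the (simulated) DB no longer says RUNNING,
    someone changed it externally, so call on_kill and stop.
    """
    if db_state != State.RUNNING:
        task.on_kill()
        return False
    return True


task = ToyTask()
keep_going = heartbeat(State.RUNNING, task)    # state unchanged, keep running
stopped = not heartbeat(State.FAILED, task)    # state changed in the DB
```

In this model, an operator without an on_kill method would exit on the second heartbeat but leave the remote job running, which matches the behaviour reported in this issue.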

phanikumv commented 1 year ago

Yes, we need to add an on_kill method to the operator class to kill stale processes on the target system. For example, we did the same in the Trino operator, where the query wasn't getting killed in Trino when the task was cleared from the UI.
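A minimal sketch of what such an on_kill could look like for an EMR Serverless job, assuming the client behaves like `boto3.client("emr-serverless")` with its `start_job_run` and `cancel_job_run` calls. `EmrServerlessJobSketch` and `FakeClient` are hypothetical names used only for illustration; this is not the provider's actual operator code.

```python
class EmrServerlessJobSketch:
    """Hypothetical, simplified stand-in for an operator that starts a job run."""

    def __init__(self, client, application_id, execution_role_arn, job_driver):
        # `client` is assumed to behave like boto3.client("emr-serverless").
        self.client = client
        self.application_id = application_id
        self.execution_role_arn = execution_role_arn
        self.job_driver = job_driver
        self.job_run_id = None

    def execute(self):
        response = self.client.start_job_run(
            applicationId=self.application_id,
            executionRoleArn=self.execution_role_arn,
            jobDriver=self.job_driver,
        )
        # Remember the run ID so on_kill knows what to cancel.
        self.job_run_id = response["jobRunId"]
        return self.job_run_id

    def on_kill(self):
        # Nothing to cancel if the job was never started.
        if self.job_run_id is None:
            return
        self.client.cancel_job_run(
            applicationId=self.application_id,
            jobRunId=self.job_run_id,
        )


class FakeClient:
    """In-memory fake so the sketch runs without AWS credentials."""

    def __init__(self):
        self.cancelled = []

    def start_job_run(self, **kwargs):
        return {"jobRunId": "job-123"}

    def cancel_job_run(self, applicationId, jobRunId):
        self.cancelled.append((applicationId, jobRunId))
        return {"applicationId": applicationId, "jobRunId": jobRunId}


client = FakeClient()
job = EmrServerlessJobSketch(client, "app-1", "arn:aws:iam::000000000000:role/example", {})
job.execute()
job.on_kill()  # records ("app-1", "job-123") in client.cancelled
```

Storing the job run ID on the instance is the key design point: on_kill receives no arguments, so anything it needs to cancel has to be captured during execute.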