apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0
35.21k stars 13.76k forks source link

deferred tasks get kill during heartbeat callback in some rare cases #40435

Open 0x26res opened 2 days ago

0x26res commented 2 days ago

Apache Airflow version

Other Airflow 2 version (please specify below)

If "Other Airflow 2 version" selected, which one?

2.8.1 (MWAA)

What happened?

My airflow deferred (AWS) BatchOperator occasionally fail. When this happens I can see this:

{{local_task_job_runner.py:302}} WARNING - State of this instance has been externally set to deferred. Terminating instance.

I actually think it's an oversight in the code for local_task_job_runner. This line should check for state being RUNNING or DEFERRED.

What you think should happen instead?

No response

How to reproduce

This is hard to reproduce. The issue is very transient.

Operating System

MWAA

Versions of Apache Airflow Providers

apache-airflow-providers-amazon==8.16.0

Deployment

Official Apache Airflow Helm Chart

Deployment details

MWAA

Anything else?

I would say it happens one out of 100 run.

Are you willing to submit PR?

Code of Conduct