Adds optional support for "adopting orphaned task instances" within the Batch and Fargate executors. Instead of terminating Batch jobs / Fargate tasks when the scheduler / executor shuts down, the executors leave the tasks running. When a new scheduler / executor boots up, it tries to "adopt" the orphaned tasks, using the external_executor_id of the orphaned task instances to resume synchronising their statuses from Batch / Fargate.
To support adoption of orphaned tasks, the BatchExecutor just needs to store the AWS Batch job_id in the TaskInstance.external_executor_id field when it submits a job, and then implement the BaseExecutor.try_adopt_task_instances method. That method simply passes each orphaned task instance's key and external_executor_id to the active_workers.add_job method of the newly booted executor.
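The Batch flow above can be sketched as follows. This is a minimal, self-contained illustration: TaskInstance, BatchJobCollection, and the add_job signature are simplified stand-ins for the real Airflow / airflow-aws-executors classes, not their actual APIs.

```python
# Sketch of Batch task adoption. Names are simplified stand-ins, not the
# real Airflow / airflow-aws-executors interfaces.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaskInstance:
    key: str                                    # (dag_id, task_id, ...) in real Airflow
    external_executor_id: Optional[str] = None  # AWS Batch job_id, stored at submit time


class BatchJobCollection:
    """Tracks in-flight Batch jobs by job_id."""

    def __init__(self):
        self.jobs = {}

    def add_job(self, job_id: str, ti_key: str):
        self.jobs[job_id] = ti_key


class BatchExecutorSketch:
    def __init__(self):
        self.active_workers = BatchJobCollection()

    def try_adopt_task_instances(self, tis: List[TaskInstance]) -> List[TaskInstance]:
        """Adopt orphans by re-registering their job_ids; return the ones we couldn't adopt."""
        not_adopted = []
        for ti in tis:
            if ti.external_executor_id:
                # The stored job_id lets the new executor resume polling Batch.
                self.active_workers.add_job(ti.external_executor_id, ti.key)
            else:
                not_adopted.append(ti)
        return not_adopted
```

Returning the list of task instances that could *not* be adopted matches the BaseExecutor.try_adopt_task_instances contract, so the scheduler can reschedule those as usual.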
The Fargate executor can support task adoption with the exact same flow, storing the Fargate task_arn in external_executor_id. In its try_adopt_task_instances method it additionally calls describe_tasks() with the orphaned task ARNs, to retrieve the full Fargate task attributes required by its active_workers collection.
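The Fargate variant can be sketched similarly, under the same caveats (class and attribute names are illustrative, not the library's confirmed API). The stored task_arn is fed back to ECS describe_tasks() so the new executor can rebuild the full per-task state. Here ecs_client is anything exposing describe_tasks(cluster=..., tasks=[...]), e.g. boto3.client("ecs"); note the real API caps each call at 100 ARNs, which this sketch ignores.

```python
# Sketch of Fargate task adoption; names are illustrative.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaskInstance:
    key: str
    external_executor_id: Optional[str] = None  # Fargate task_arn, stored at submit time


class FargateExecutorSketch:
    def __init__(self, ecs_client, cluster: str):
        self.ecs = ecs_client
        self.cluster = cluster
        self.active_workers = {}  # task_arn -> (ti key, last described task dict)

    def try_adopt_task_instances(self, tis: List[TaskInstance]) -> List[TaskInstance]:
        by_arn = {ti.external_executor_id: ti for ti in tis if ti.external_executor_id}
        not_adopted = [ti for ti in tis if not ti.external_executor_id]
        if by_arn:
            # One describe_tasks call recovers the attributes active_workers needs.
            resp = self.ecs.describe_tasks(cluster=self.cluster, tasks=list(by_arn))
            described = {t["taskArn"]: t for t in resp.get("tasks", [])}
            for arn, ti in by_arn.items():
                if arn in described:
                    self.active_workers[arn] = (ti.key, described[arn])
                else:
                    not_adopted.append(ti)  # task no longer known to ECS
        return not_adopted
```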
Addresses this issue: https://github.com/aelzeiny/airflow-aws-executors/issues/14
Airflow documentation on this behaviour is limited, though there is some basic context in the scheduler "tunables" doc: https://airflow.apache.org/docs/apache-airflow/stable/scheduler.html#scheduler-tuneables
This feature is disabled by default, but can be enabled via a conf option in either executor, or by the equivalent environment variable.
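As a hypothetical illustration only (the actual section and option names in this project may differ), enabling it in airflow.cfg might look like:

```ini
# Hypothetical example; the real section/option names may differ.
[batch]
adopt_task_instances = True
```

or, following Airflow's usual AIRFLOW__SECTION__KEY convention, as the environment variable AIRFLOW__BATCH__ADOPT_TASK_INSTANCES=True (and likewise under the Fargate executor's section).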