aelzeiny / airflow-aws-executors

Airflow Executor for both AWS ECS & AWS Fargate
MIT License
53 stars 9 forks source link

Batch Executor doesn't mark the state of current task instance to running/success in airflow db #16

Closed Deepkawa closed 2 years ago

Deepkawa commented 3 years ago

Was trying this library to run airflow task on AWS Batch having dependencies

When the 1st dependency task executed successfully, the executor sends an event to scheduler to mark this task as complete but fails with the message "Executor reports task instance {} finished ({}) although the task says its {}. Was the task killed externally?". After a lot of debugging, figured it out that the executor is not marking the taskinstance to running state in the DB and when executor reports it as success, there is a mismatch hence failure

The batch was executed successful on AWS and returned success to the Boto API

Can you help to figure out something how to make this work or any workaround

aelzeiny commented 2 years ago

It's not the first time I've seen this. It is almost certainly an AMI permission regarding your Airflow Scheduler. You could check your scheduler logs. What's happening is that Airflow starts the scheduler, and the scheduler picks up a pending task. So the executor launches an AWS Batch Job, gets an AMI error, tries again, gets an AMI error, tries a third time, gets an AMI error, and then the task itself is marked as failed. The log for that task says that it was killed externally, which is true - the job died 3 times.

However, I didn't want to give you such a hand-wavy answer. So I wrote a little helper unittest file that will check your AMI permissions. Run this on your scheduler and report back if you still have an unclear issue.