RetailMeNotSandbox / dart

Self-service data workflow management
MIT License

Dart actions get stuck in 'PENDING' state if task container isn't created #140

Closed maybeiambatman closed 7 years ago

maybeiambatman commented 7 years ago

When Dart actions fail to create containers on AWS Batch with a CannotCreateContainer error, the action stays in the 'PENDING' state because it never gets checked out. Even though these tasks never actually ran, the action is updated with the batch_job_id before the container is even created. Because the engine worker only queues actions that are 'PENDING' and don't have a batch_job_id, these actions get stuck in a 'PENDING' state with a batch_job_id set.

We need to transition these actions to the 'FAILED' state so that the database reflects them accurately.
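
A rough sketch of what such a reconciliation pass could look like, assuming the Batch job status is available through a boto3-style `describe_jobs` call. The `Action` class and `fail_stuck_actions` function are hypothetical names for illustration and are not Dart's actual model or engine code; persisting the change to the database is left out.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Action:
    """Hypothetical stand-in for a Dart action record (not Dart's real model)."""
    id: str
    state: str
    batch_job_id: Optional[str] = None
    error_message: Optional[str] = None


def fail_stuck_actions(actions: Iterable[Action], batch_client) -> List[Action]:
    """Mark PENDING actions as FAILED when their Batch job has already failed.

    `batch_client` is assumed to behave like boto3.client('batch'):
    describe_jobs(jobs=[...]) returns {'jobs': [{'jobId', 'status', ...}]}.
    """
    # Only actions that are PENDING *and* already carry a batch_job_id can be
    # stuck in the way described above; the engine worker skips them.
    stuck = {
        a.batch_job_id: a
        for a in actions
        if a.state == 'PENDING' and a.batch_job_id
    }

    failed = []
    job_ids = list(stuck)
    # describe_jobs accepts at most 100 job IDs per call, so query in chunks.
    for start in range(0, len(job_ids), 100):
        response = batch_client.describe_jobs(jobs=job_ids[start:start + 100])
        for job in response.get('jobs', []):
            if job.get('status') != 'FAILED':
                continue
            action = stuck[job['jobId']]
            action.state = 'FAILED'
            # Prefer the container-level reason (e.g. CannotCreateContainerError)
            # and fall back to the job-level status reason.
            action.error_message = (
                job.get('container', {}).get('reason') or job.get('statusReason')
            )
            failed.append(action)
    return failed
```

Something like this could run periodically from the engine worker, for example `fail_stuck_actions(pending_actions, boto3.client('batch'))`, with the returned actions then saved back to the database.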

ophiradi commented 7 years ago

This problem exists with ECS too. It just surfaced with Batch.