apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

AWS ECS Logging very slow when lots of logging leading to task failure #42442

Open smsm1-ito opened 20 hours ago

smsm1-ito commented 20 hours ago

Apache Airflow Provider(s)

amazon

Versions of Apache Airflow Providers

apache-airflow-providers-amazon==8.28.0 is affected

apache-airflow-providers-amazon==8.27.0 is not affected

Apache Airflow version

2.10.2

Operating System

Debian 12 bookworm

Deployment

Other Docker-based deployment

Deployment details

No response

What happened

After upgrading to Airflow 2.10.2, longer-running ECS tasks that produce a lot of log output started failing. The logs would still be slowly appearing in Airflow even though the ECS task had already completed. If the log output took more than an hour longer than the task itself, the Airflow task failed with an error saying that the ECS task was missing. This is because older tasks disappear from ECS (Fargate) after they finish.

Looking at the changes, I came across https://github.com/apache/airflow/pull/41515/files, which added a 0.1 second sleep if the timestamps were the same. Looking further at the logs of the failing tasks, I saw two log timestamps: one from the application, and another that was falling significantly behind it.

After rolling back the amazon provider to the previous version while staying on Airflow 2.10.2, the issue went away.

Linked tickets #41515 #40875

What you think should happen instead

Logging should be submitted in a timely manner.

Could we go for a much shorter delay such as 0.001 seconds?
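
To put rough numbers on the two delays, here is a back-of-the-envelope sketch. It assumes the worst case, where the sleep is triggered once for every log event; the function name and the figure of 100,000 events are illustrative and not taken from the provider code.

```python
def sleep_overhead_seconds(num_events: int, sleep_per_event: float) -> float:
    """Total time spent sleeping if every log event triggers one sleep.

    Worst-case assumption: the real fetcher may sleep for only some events.
    """
    return num_events * sleep_per_event


if __name__ == "__main__":
    num_events = 100_000  # hypothetical volume for a chatty task
    for delay in (0.1, 0.001):
        overhead = sleep_overhead_seconds(num_events, delay)
        print(f"delay={delay}s -> {overhead / 60:.1f} minutes spent sleeping")
```

Under that assumption, 100,000 events cost roughly 2.8 hours of sleeping at 0.1 seconds each, versus under two minutes at 0.001 seconds each.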

How to reproduce

Run an ECS task that produces far more log output than Airflow can fetch in the time the task takes to run.
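
A minimal sketch of a container entrypoint that could be used for this, assuming the container's stdout is shipped to CloudWatch Logs (for example via the `awslogs` log driver) and the task is launched from Airflow with log fetching enabled. The `emit_logs` helper and the line count are hypothetical; the volume needed to trigger the failure has not been measured.

```python
import sys
import time
from typing import TextIO


def emit_logs(num_lines: int, stream: TextIO = sys.stdout) -> float:
    """Write num_lines log lines as fast as possible; return elapsed seconds."""
    start = time.monotonic()
    for i in range(num_lines):
        stream.write(f"log line {i}\n")
    stream.flush()
    return time.monotonic() - start


if __name__ == "__main__":
    # 200_000 is an arbitrary illustrative volume, not a measured threshold.
    elapsed = emit_logs(200_000)
    print(f"emitted in {elapsed:.2f}s", file=sys.stderr)
```

The task itself finishes quickly, so any long tail of log lines still appearing in Airflow afterwards comes from the log-fetching side rather than from the application.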

Anything else

No response

Are you willing to submit PR?

Code of Conduct

boring-cyborg[bot] commented 20 hours ago

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

smsm1-ito commented 16 hours ago

I've created a merge request for this: https://github.com/apache/airflow/pull/42449