apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Tasks taking too long time after 2.7.0 Airflow update #33688

Closed by potiuk 1 year ago

potiuk commented 1 year ago

Discussed in https://github.com/apache/airflow/discussions/33664

Originally posted by **jaetma** August 23, 2023

Hi community! We have been using Airflow for quite a long time, and right after updating from version 2.6.3 to 2.7.0, task running times increased dramatically. Tasks that used to take 15 seconds to complete are now taking 10 minutes! This is problematic because tasks are being queued faster than they finish. We've detected this issue in 3 projects running Airflow, across 2 instances in Kubernetes and 1 with Docker.

Illustrating image: ![image](https://github.com/apache/airflow/assets/123120332/1459f478-f2ca-42ed-b956-5fd52af52a8a)
BenoCharlo commented 7 months ago

@pankajkoti yes, I have a local setup with mwaa-local-runner. My previous version was AF 2.2.2 (a bit outdated). I do experience this issue with the local setup as well.

pankajkoti commented 7 months ago

Thanks @BenoCharlo. Since you have a local setup, would you be able to help with some more testing? Instead of upgrading directly from 2.2.2 to 2.7.2, can you try upgrading to 2.6.3 and check whether you observe the same slowness there too?

Previously, it was observed that performance decreased between 2.6.3 and 2.7. If you could carry out that experiment, it would help diagnose this better.

Also, would you be able to share some insight into what your DAGs are doing? A small reproducible DAG that causes this performance degradation would be helpful too.

BenoCharlo commented 7 months ago

@pankajkoti I tried version 2.6.3 today. A lot of DAGs are starting to run as expected. It has gotten a bit better, but I am still experiencing the same delay problem for some DAGs. Some tasks are delayed in 1 out of 2 runs. The delays are actually tasks that never finish. I suspect that writing to an S3 bucket using pandas is causing this issue.

Thanks for the workaround.
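
One way to check a suspicion like the one above (that a single step, such as the pandas write to S3, is what never finishes) is to run that step outside Airflow with a timer and a deadline. The following is only a minimal sketch using the Python standard library. `run_with_timing` and `fake_write` are hypothetical names introduced for illustration, and `fake_write` merely sleeps as a stand-in for the real write call, which is not shown anywhere in this thread.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def run_with_timing(step, *args, timeout_s=60.0, **kwargs):
    """Run `step` in a worker thread and report how long we waited for it.

    Returns (status, elapsed_seconds, result), where status is "ok" or
    "timeout". Exceptions raised by `step` propagate to the caller.
    """
    start = time.perf_counter()
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(step, *args, **kwargs)
    try:
        result = future.result(timeout=timeout_s)
        status = "ok"
    except FutureTimeout:
        # The step did not finish before the deadline. A Python thread
        # cannot be killed, so the worker keeps running in the background.
        result = None
        status = "timeout"
    finally:
        # Do not block here waiting for a slow or hung worker.
        pool.shutdown(wait=False)
    return status, time.perf_counter() - start, result


def fake_write(delay_s):
    """Hypothetical stand-in for the suspected step (e.g. the S3 write)."""
    time.sleep(delay_s)
    return "written"


if __name__ == "__main__":
    status, elapsed, _ = run_with_timing(fake_write, 0.01, timeout_s=1.0)
    print(f"{status} after {elapsed:.3f}s")
```

If the real step times out here as well, the hang is probably in the write itself rather than in the scheduler or executor. If it only stalls when run as a task, the Airflow upgrade is the more likely culprit. Note that a step that truly hangs will still keep the Python process alive at exit, because the worker thread cannot be interrupted.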