Closed: awesomescot closed this issue 2 months ago.
I think the standalone DAG processor should behave the same as the integrated one.
Can you please check the following? Is it stable when DAG processing is integrated in the scheduler rather than separated? Does the failure happen on the first run already, so that parsing never completes? Or is it an instability that only sometimes hits your environment? Can you bisect by repeatedly cutting the number of DAG files in half? Is there a specific expensive DAG file that takes a long time to parse? Or can you create an artificial set of files that makes the problem reproducible?
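To act on the bisect and expensive-file suggestions, the two bash helpers below are a minimal sketch, assuming bash 4+, GNU find/sort, and DAG files that end in .py. The function names bisect_dags and time_dag_files are hypothetical and not part of Airflow. Timing a plain interpreter run of each file only approximates parse cost: it does not reproduce the DAG processor's own scheduling, timeouts, or .airflowignore handling.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for narrowing down slow or problematic DAG files.
# Assumes bash 4+ and GNU coreutils/findutils.

# Copy the first or second half (by sorted path) of the *.py files in SRC to DEST,
# keeping the sub-folder layout. Prints the number of files copied.
# Usage: bisect_dags SRC DEST first|second
bisect_dags() {
  local src="${1%/}" dest="$2" half="$3"
  local -a files
  mapfile -t files < <(find "$src" -type f -name '*.py' | sort)
  local total=${#files[@]}
  local mid=$(( (total + 1) / 2 ))
  local start=0 end=$mid
  if [ "$half" = "second" ]; then
    start=$mid
    end=$total
  fi
  mkdir -p "$dest"
  local i rel
  for (( i = start; i < end; i++ )); do
    rel="${files[i]#"$src"/}"           # path relative to SRC
    mkdir -p "$dest/$(dirname "$rel")"  # recreate the sub-folder in DEST
    cp "${files[i]}" "$dest/$rel"
  done
  echo $(( end - start ))
}

# Run each *.py file in DIR through an interpreter (python3 by default) and
# print "<seconds> <path>" lines, slowest first.
# Usage: time_dag_files DIR [INTERPRETER]
time_dag_files() {
  local dir="$1" interp="${2:-python3}"
  local TIMEFORMAT=%R   # make bash's time keyword print only real seconds
  local f t
  find "$dir" -type f -name '*.py' | sort | while IFS= read -r f; do
    # Discard the file's own output; capture only the timing line.
    t=$( { time "$interp" "$f" >/dev/null 2>&1; } 2>&1 )
    printf '%s %s\n' "$t" "$f"
  done | sort -rn
}
```

For example, bisect_dags "$AIRFLOW_HOME/dags" /tmp/dags-half first copies half of the files into a folder the dag-processor can be pointed at with --subdir, and time_dag_files "$AIRFLOW_HOME/dags" | head lists the slowest files to import.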
Thanks for the reply. I have been testing it on the scheduler and it seems to also be struggling, so this should probably be a discussion and not an issue. I'll raise something over there.
Just in case anyone else stumbles on this: we hit a very similar issue and worked around it by wrapping the dag-processor command with timeout. In our case, where we run one dag-processor per subdirectory, that looks like:
timeout --kill-after=10 600 airflow dag-processor --subdir $AIRFLOW_HOME/dags/$SUB_FOLDER -n 1
(And then surrounding code/infrastructure will ensure that another dag-processor is spun up)
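Where nothing like a Kubernetes restart policy brings the process back, a small loop can play the role of the "surrounding code/infrastructure". This is a hypothetical sketch (run_with_restarts is not an Airflow or coreutils command), assuming GNU coreutils timeout, which exits with status 124 when the time limit is hit and 137 (128 + 9) when the follow-up SIGKILL was needed.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command under GNU `timeout` repeatedly so that a
# hung run is killed and a fresh one is started.
# Usage: run_with_restarts SECONDS ATTEMPTS COMMAND [ARGS...]
run_with_restarts() {
  local limit="$1" attempts="$2"
  shift 2
  local i status
  for (( i = 1; i <= attempts; i++ )); do
    # SIGTERM after $limit seconds, then SIGKILL 10 seconds later if still alive.
    timeout --kill-after=10 "$limit" "$@"
    status=$?
    # 124 = timed out; 137 = timed out and had to be killed with SIGKILL.
    if [ "$status" -eq 124 ] || [ "$status" -eq 137 ]; then
      echo "attempt $i timed out after ${limit}s" >&2
    else
      echo "attempt $i exited with status $status" >&2
    fi
  done
}
```

For example: run_with_restarts 600 100000 airflow dag-processor --subdir "$AIRFLOW_HOME/dags/$SUB_FOLDER" -n 1. Note that a command which itself exits with 124 or 137 would be reported as a time-out; on Kubernetes the container's restart behaviour usually makes such a loop unnecessary.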
Apache Airflow version
2.10.0
What happened?
When I have more than about 1,000 DAG files, the standalone DAG processor seems to stop functioning properly. CPU usage drops to almost zero, and the number of active parsing processes is also around 0. The DagBag never fills up, and the logs show nothing useful. I can't figure out what the DAG processor is doing; it seems to be crashing silently.
What do you think should happen instead?
I think the standalone DAG processor should parse the DAG files in the same amount of time as, or less time than, the DAG processor integrated in the scheduler.
How to reproduce
I'm not sure I can share our DAG files, but I will post my values file, and I would love to see whether others can reproduce this.
Operating System
Kubernetes (Helm chart)
Versions of Apache Airflow Providers
The ones included in the Helm chart.
Deployment
Official Apache Airflow Helm Chart
Deployment details
We are connecting to an RDS Postgres instance (which also shows very low CPU usage).
Anything else?
I've been trying to play around with settings to see if I can figure out what is happening, but no luck so far. I'm happy to post any logs that would be helpful.