Airflow webserver is slow

Airflow webserver is slow #531

Closed mtustin-handy closed 9 years ago

mtustin-handy commented 9 years ago

This is probably something weird with my environment setup, but I'm opening this in case anyone has any ideas as to what might cause this.

Recently (like, since yesterday afternoon) the airflow webserver has been very, very slow. This persists no matter how many times I restart. It's particularly slow when visited via an AWS load balancer, moderately slow over an SSH port forward, and also very slow via local curl.

The system is not under particularly high load and can reach the database server, which is responding promptly.

Running on Red Hat 7.

I'd add more details if I knew what was germane.

pascalknapen commented 9 years ago

Hello,

I have a similar experience after upgrading to 1.5.2. I downgraded back to 1.5.1 for the moment to fix the issue. My guess would be it is somehow related to one of the following changes:

MySQL uses mysqlclient lib instead of mysql-python
Using gunicorn instead of tornado as the WSGI web server

mtustin-handy commented 9 years ago

@pascalknapen That's awesome. I didn't realise either of those changes happened. Do you have any plans to try out reverting either of those changes?

artwr commented 9 years ago

mysqlclient is a fork of mysql-python, so I would not necessarily expect a huge regression there. Quick question: are you running a single webserver? We run several of them behind a load balancer, each with 4 workers (= processes) as per the default configuration, which means that a regression on a single server may be harder for us to detect. Note that the debug server runs on a single thread, so it might be slower as well. The usual recommendation is 2N+1 gunicorn workers for N vCPUs. I have a PR out that exposes the ability to use multithreaded workers. Maybe playing with those options would help?
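For reference, the 2N+1 rule of thumb mentioned above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function name `recommended_workers` is made up for this example and is not part of Airflow or gunicorn.

```python
import multiprocessing


def recommended_workers(vcpus: int) -> int:
    """Return the 2N+1 rule-of-thumb worker count for N vCPUs."""
    if vcpus < 1:
        raise ValueError("vcpus must be at least 1")
    return 2 * vcpus + 1


if __name__ == "__main__":
    # cpu_count() reports the logical CPUs visible to this host.
    n = multiprocessing.cpu_count()
    print(f"{n} vCPUs -> {recommended_workers(n)} gunicorn workers")
```

On a 2-vCPU host this gives 5 workers, one more than the default of 4 discussed in this thread. It is a starting point to tune from, not a guarantee.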

mtustin-handy commented 9 years ago

@artwr I just run the server with airflow webserver. This preforks 4 workers and one master process. We do use a single server on a single host. It's very slow even with only one person accessing it. It's much faster when accessed directly over an SSH tunnel than when accessed over an AWS load balancer.

artwr commented 9 years ago

@mtustin-handy May I ask why you would use a load balancer if it runs on a single node? It seems that this setup would just increase latency, significantly so if the load balancer ends up in a different availability zone. If you have Chrome, can you take a look at the Network tab of the developer tools when you load the server URL, and post a screenshot of what you see there?
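If a screenshot is inconvenient, a rough alternative is to time the same page over each access path from a script. The sketch below uses only the Python standard library. The helper names `time_fetch` and `compare` and the URLs in the usage comment are hypothetical placeholders, not values from this thread.

```python
import time
import urllib.request
from typing import Callable, Dict


def time_fetch(url: str, timeout: float = 30.0) -> float:
    """Fetch url once and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()  # include the time to download the whole body
    return time.perf_counter() - start


def compare(urls: Dict[str, str],
            fetch: Callable[[str], float] = time_fetch) -> Dict[str, float]:
    """Time each labelled URL once and return {label: seconds}."""
    return {label: fetch(url) for label, url in urls.items()}


# Example usage (hypothetical addresses, substitute your own):
# results = compare({
#     "local": "http://localhost:8080/",
#     "ssh tunnel": "http://localhost:18080/",
#     "load balancer": "http://airflow.example.com/",
# })
# for label, seconds in results.items():
#     print(f"{label}: {seconds:.2f}s")
```

A single request per path is noisy, so repeating the comparison a few times would give a fairer picture of where the latency is added.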

mtustin-handy commented 9 years ago

@artwr Like Amazon itself, we use load balancers to make webservers available across network security/availability zones.

pascalknapen commented 9 years ago

@mtustin-handy @artwr This seems indeed to be related to the interplay between gunicorn and ELB. This article explains it a bit: https://forums.aws.amazon.com/thread.jspa?messageID=419138. As the reply suggests, using async workers might fix the problem. I'll try to find out tomorrow.
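To illustrate what switching to async workers means at the gunicorn level, here is a minimal sketch of a standalone gunicorn configuration file (gunicorn config files are plain Python). It assumes gunicorn is launched directly with `-c gunicorn.conf.py` and that the `gevent` package is installed. It does not show how Airflow itself passes these options, and the bind address is an assumption.

```python
# gunicorn.conf.py -- hypothetical standalone config, not Airflow's own
import multiprocessing

# Address to listen on (assumed; adjust to your deployment).
bind = "0.0.0.0:8080"

# 2N+1 worker processes for N CPUs, per the rule of thumb in this thread.
workers = 2 * multiprocessing.cpu_count() + 1

# Use an async worker type instead of the default "sync" worker.
# "gevent" requires the gevent package to be installed.
worker_class = "gevent"
```

`workers` and `worker_class` are standard gunicorn settings. Whether gevent or another async worker type suits a given deployment is something to test rather than assume.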

artwr commented 9 years ago

Another short-term solution might be to bump the number of workers, erroneously named threads in our current config.

pascalknapen commented 9 years ago

Made the changes to support both sync and async workers in gunicorn. Just finished testing it and working great now behind ELB. Will send a pull request later today.

On Tue, Nov 10, 2015 at 10:51 PM, Arthur Wiedmer notifications@github.com wrote:

Another short-term solution might be to bump the number of workers, erroneously named threads in our current config.


mtustin-handy commented 9 years ago

That is so awesome. I've been testing out the new scheduling stuff that @mistercrunch merged yesterday.


artwr commented 9 years ago

@pascalknapen : Does your PR look anything like https://github.com/airbnb/airflow/pull/610 ?

artwr commented 9 years ago

Closed by #618. Feel free to reopen if needed.