fanout / django-eventstream

Server-Sent Events for Django
MIT License

Too many open files #33

Closed · maxchannels closed this issue 5 years ago

maxchannels commented 5 years ago

Hi,

We are using django-eventstream for sending out events to clients. You can think of our workflow as a Celery-like use case, but a very simple one. Things were working flawlessly until we hit the 'too many open files' error (Red Hat 7.4). We tracked which processes were opening the files using 'lsof' and found that Python was spawning several threads, each of which loaded the required libraries (mostly .so files). We are using gunicorn as our server, which spawns uvicorn workers. We tried falling back to 'runserver', but faced the same issue.
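For context, this is roughly how we check the descriptor limit the error is hitting (a minimal sketch; the limits on your machine will differ):

```python
import resource

# 'Too many open files' means the process has hit its RLIMIT_NOFILE
# soft limit; this prints the current soft/hard limits for the process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limit: soft={soft} hard={hard}")
```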

On trying out the 'time' and 'chat' examples, we saw the same behavior. On every refresh of the page (same machine, same browser, same tab) a new thread is spawned, and 'lsof' lists roughly 2k additional files per refresh.
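In case it helps to reproduce, this is the kind of check we run after each refresh (Linux-only sketch; it counts only file descriptors, so it will report fewer entries than 'lsof', which also lists memory-mapped .so files):

```python
import os

def open_fd_count(pid=None):
    """Count open file descriptors via /proc (Linux-only)."""
    pid = pid or os.getpid()
    return len(os.listdir(f"/proc/{pid}/fd"))

if __name__ == "__main__":
    print(open_fd_count())
```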

We tried to recreate the issue on two other machines with the same OS and saw the same behavior, except on one machine. That one is a laptop with 4GB of RAM; the rest are servers with 256GB of RAM. Interestingly, everything works absolutely fine on the laptop, but not on the servers. Maybe because of the relative sparsity of resources, the OS is closing the files on the laptop but not on the servers, which is what causes the 'too many open files' error?

Any idea how to resolve this issue? Cheers!

maxchannels commented 5 years ago

I dug around a bit more and found out that for every request, i.e. whenever I refresh the page, channels creates a new session, which then calls asgiref, which calls asyncio to create a new thread. This new thread tries to open the required files, and when this happens in large numbers it causes the 'too many open files' error.
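To illustrate the mechanism as I understand it (this is not asgiref's actual code, just a minimal sketch of the asyncio building block it relies on):

```python
import asyncio

def blocking_view():
    # Stand-in for a synchronous Django view or ORM call.
    return "ok"

async def handle_request():
    loop = asyncio.get_running_loop()
    # Passing executor=None makes asyncio lazily create a shared
    # ThreadPoolExecutor; concurrent calls can each start another
    # worker thread, up to the pool's max_workers.
    return await loop.run_in_executor(None, blocking_view)

print(asyncio.run(handle_request()))
```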

The threads do not release the resources (the files) when I close the browser (Firefox).

maxchannels commented 5 years ago

Going ahead with the threads assumption, I tried to limit the number of threads by setting ASGI_THREADS. The number of threads is now limited, and thus the number of files too. I don't know what will happen if more users than ASGI_THREADS try to connect to the server. I guess now I need to read up on load balancing.
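For anyone else hitting this, here's the kind of thing I did; a sketch assuming the asgiref release in use reads ASGI_THREADS when it is first imported (the cap of 4 and the project name 'myproject' are just placeholders), and exporting the variable in the environment that launches gunicorn/uvicorn works as well:

```python
# asgi.py (Channels 2 style entry point)
import os

# Must be set before channels/asgiref are imported; otherwise export
# ASGI_THREADS in the environment that starts the server instead.
os.environ.setdefault("ASGI_THREADS", "4")
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")

import django
from channels.routing import get_default_application

django.setup()
application = get_default_application()
```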

maxchannels commented 5 years ago

To explain why this was working on the laptop, if I understood it correctly: it's a 4-core machine while the servers are 60-core machines. By default the underlying layer will try to allocate a pool of NUM_CPU * 5 threads, which comes out much smaller on the laptop.
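A quick way to see the default sizing (assuming a Python 3.5-3.7 interpreter; 3.8 later changed the formula):

```python
import os
from concurrent.futures import ThreadPoolExecutor

# On Python 3.5-3.7, ThreadPoolExecutor() defaults to cpu_count() * 5
# workers; Python 3.8 changed this to min(32, cpu_count() + 4).
pool = ThreadPoolExecutor()
print(os.cpu_count(), pool._max_workers)  # e.g. 4 -> 20 on the laptop, 60 -> 300 on a server
```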

jkarneges commented 5 years ago

Interesting! I'm glad you found a way to get it working. And thanks for pointing out how Python 3 futures pool allocations work, in case someone else runs into this.