etianen / aiohttp-wsgi

WSGI adapter for aiohttp.
https://aiohttp-wsgi.readthedocs.io
BSD 3-Clause "New" or "Revised" License

"socket.send() raised exception." #4

Closed jcea closed 8 years ago

jcea commented 8 years ago

I return an iterable for WSGI to stream to the client. The iterable can provide data for hours (radio streaming).

If the client disconnects before the iterable is done, I see this message printed on stderr: "socket.send() raised exception.". Worse, the iterable keeps iterating until it is done, consuming CPU and RAM. By doing this several times I can bring down my server, just leaving broken connections behind me that keep consuming resources.

Expected result: if a client disconnects, the iterable I provided to WSGI to send data to the client should be disposed of, not called any more and, eventually, garbage collected. If my iterables have a "close" method, it should be called before the reference is dropped.

Code example showing this issue:

#!/usr/bin/env python3

import time

import aiohttp_wsgi

def application(environ, start_response):
    def iterable():
        # Simulates a long-lived stream: one chunk per second, forever.
        while True:
            time.sleep(1)
            yield b'1234\r\n'

    start_response('200 OK', [
        ('Content-Length', '9999999999'),
        ('Content-Type', 'text/plain'),
    ])
    return iterable()

aiohttp_wsgi.serve(application, host="127.0.0.1", port=18333)

After making a request to port 18333 and forcing a disconnection, I get a "socket.send() raised exception." message in the main window every second, showing that the iterator is still running instead of being disposed of.
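
For reference, a minimal sketch of the cleanup contract I would expect, assuming a generator-based iterable: PEP 3333 says the server must call close() on the iterable when the response completes or is aborted, and closing a generator raises GeneratorExit inside it, so a try/finally block is enough to release resources (release_stream_resources() below is a hypothetical placeholder):

def application(environ, start_response):
    def iterable():
        try:
            while True:
                yield b'1234\r\n'
        finally:
            # Reached when the server calls close() on the generator
            # (GeneratorExit), e.g. after the client disconnects.
            release_stream_resources()  # hypothetical cleanup hook

    start_response('200 OK', [('Content-Type', 'text/plain')])
    return iterable()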

etianen commented 8 years ago

Thanks for the bug report, I'll check it out first thing next week!

Incidentally, this is not a good way to stream radio. Each listener will consume one thread while listening, meaning that your server will only be able to support 8 users!


jcea commented 8 years ago

The point of returning an iterable is to free the thread for another request. That is my understanding. If I am mistaken, do you have a suggestion?

etianen commented 8 years ago

Iterables are there to avoid buffering a huge response in memory. They still have to be unwound by a thread, as calling next() on an iterable can block for an arbitrary amount of time.

You should use an aiohttp view and an asyncio coroutine for big streaming responses. Check the aiohttp docs for info.
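
A minimal sketch of that approach using aiohttp's StreamResponse (written against the current aiohttp web API, which differs in detail from the version current when this thread was opened; the /stream route and the one-second tick are illustrative):

import asyncio

from aiohttp import web

async def stream_handler(request):
    # The stream is driven entirely on the event loop: no worker thread
    # is tied up for the lifetime of the connection.
    response = web.StreamResponse(headers={"Content-Type": "text/plain"})
    await response.prepare(request)
    try:
        while True:
            await response.write(b"1234\r\n")
            await asyncio.sleep(1)
    except ConnectionResetError:
        pass  # client disconnected; the coroutine simply stops
    return response

app = web.Application()
app.router.add_get("/stream", stream_handler)
web.run_app(app, host="127.0.0.1", port=18333)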


jcea commented 8 years ago

If I understand this correctly, "next()" will be called in a thread, but I can have an unlimited number of (client) iterables in flight. At the moment my "next()" returns 4 MB, so I expect each iterable to be idle for quite a while, freeing the threads to take care of the other iterables. Am I right?

That is, each "next()" is quite costly (around 10 seconds), returning 4 MB of data, enough for a couple of minutes of playback. In other words, each iterable is busy for 10 seconds out of every 120. Am I correct in thinking that a single thread could take care of 12 iterables (10 seconds each per 120 seconds)? Moreover, those 10 seconds are mostly spent waiting for I/O, so 5 threads could handle 60 clients with low CPU usage.

Am I right?
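
Spelling out the arithmetic behind that estimate (the next reply explains why this model does not match how the worker threads are actually scheduled):

# Numbers assumed in the question above.
busy_seconds_per_chunk = 10       # time spent inside one next() call, mostly I/O wait
seconds_covered_per_chunk = 120   # 4 MB of audio lasts roughly two minutes of playback

clients_per_thread = seconds_covered_per_chunk // busy_seconds_per_chunk  # 12
threads = 5
print(threads * clients_per_thread)  # 60 clients, under this model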

etianen commented 8 years ago

Nope, it's one thread per client, held until the response is complete. Response data is buffered in-memory as fast as it can be created, and sent to the client as fast as it can be transmitted.

This is necessary, as it's important that the same thread is used to send the data to the same client (consider thread-local data, for example). This means that each iterable would have to be pinned to a specific worker thread, rather than being given to the next available worker in a pool.

Now, imagine you've got ten iterables, all pinned to a single worker thread, taking it in turns to have their next() method called. Now imagine one of those next() calls takes five minutes to run. You've just blocked nine other responses!

Hence, the worker thread gets its work done as fast as possible for a single iterable, then moves on to a new request. This is identical to how nginx acts as a reverse proxy to a gunicorn worker. aiohttp is effectively acting as a reverse proxy to a WSGI app.
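
A rough sketch of that model (illustrative only, not aiohttp-wsgi's actual code): the whole iterable is unwound synchronously in one executor thread, so that thread stays occupied until the response is finished.

import asyncio

def unwind_wsgi_iterable(iterable, write):
    # Runs in a worker thread: pull chunks as fast as the app yields them.
    try:
        for chunk in iterable:
            write(chunk)  # hand each chunk over for transmission
    finally:
        close = getattr(iterable, "close", None)
        if close is not None:
            close()  # honour the WSGI close() contract

async def handle_wsgi_response(iterable, write):
    # One worker thread is tied up until the iterable is exhausted, which is
    # why a long-lived streaming response occupies a thread for its lifetime.
    loop = asyncio.get_event_loop()
    await loop.run_in_executor(None, unwind_wsgi_iterable, iterable, write)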

WSGI is not designed to be used with streaming responses. Use aiohttp for that. The whole point of aiohttp-wsgi is that it allows sync frameworks (Django, Flask) to be used alongside async functionality (websockets, streaming media). It's not a magic bullet for making a sync protocol like WSGI scale to hundreds of clients.
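
As an illustration of that split, a sketch of serving a sync WSGI app alongside a native async streaming route in one aiohttp application; this mirrors the WSGIHandler usage documented for aiohttp-wsgi, and myproject.wsgi and the /stream route are placeholders:

from aiohttp import web
from aiohttp_wsgi import WSGIHandler

from myproject.wsgi import application  # any sync WSGI app (Django, Flask, ...)

async def stream_handler(request):
    # Long-lived streaming is handled natively by aiohttp on the event loop.
    response = web.StreamResponse(headers={"Content-Type": "text/plain"})
    await response.prepare(request)
    await response.write(b"streaming data goes here\r\n")
    return response

app = web.Application()
app.router.add_get("/stream", stream_handler)                           # async route
app.router.add_route("*", "/{path_info:.*}", WSGIHandler(application))  # sync WSGI app
web.run_app(app, host="127.0.0.1", port=18333)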


jcea commented 8 years ago

This thread is interesting. Too bad it is hidden in a bug tracker instead of a mailing list :).

Thanks for your comments.

etianen commented 8 years ago

I've added a fix to master that should raise an exception on wsgi_response.write() if the socket has broken.
