What do you mean by "When using threads (with the sync worker class), the worker accepts connections even when all threads are busy"? With the --thread option?
Yes (actually the --threads option).
Thanks for your answer. I will work on it during the week. Expect a fix in the coming release this week.
Hrm, I am not sure how to reproduce that issue. When I look at the code of the threaded workers it appears that the worker should crash if the number of active connections is greater than the number of accepted worker connections:
https://github.com/benoitc/gunicorn/blob/master/gunicorn/workers/gthread.py#L174-L178
When that happens and no futures have returned within the timeout, the connections break. Are you sure the connections are not being handled by another worker?
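For reference, the check I mean is roughly the following (an approximate paraphrase of the linked lines, not the exact source):

```python
# Approximate paraphrase of the guard referenced above (not the exact code):
# once the worker holds as many in-flight connections as worker_connections
# allows, wait for some futures to complete; if none complete within the
# timeout, stop the worker.
if len(self.futures) >= self.worker_connections:
    done, _ = futures.wait(self.futures, timeout=self.timeout)
    if not done:
        self.alive = False
```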
Steps to reproduce:

1. Create foo.py. It's a hello-world app which waits 100 seconds when the URL is requested with ?wait:

```python
import time
import os

def application(env, start):
    t = time.time()
    print('---- start in ', os.getpid())
    if env['QUERY_STRING'] == 'wait':
        while time.time() - t < 100:
            time.sleep(1)
    print('--- end in', os.getpid())
    start('200 OK', [])
    return [b'Hello World']
```

2. Run `gunicorn --error-logfile - --workers 2 --threads 2 foo`. The server has 2 workers, each with 2 threads.

3. Open http://127.0.0.1:8000/?wait three times (e.g. in three browser tabs).
Now the log looks like:

```
[2014-10-20 10:26:49 +0330] [5419] [INFO] Starting gunicorn 19.1.1
[2014-10-20 10:26:49 +0330] [5419] [INFO] Listening at: http://127.0.0.1:8000 (5419)
[2014-10-20 10:26:49 +0330] [5419] [INFO] Using worker: threads
[2014-10-20 10:26:49 +0330] [5422] [INFO] Booting worker with pid: 5422
[2014-10-20 10:26:49 +0330] [5423] [INFO] Booting worker with pid: 5423
---- start in 5423
---- start in 5423
---- start in 5422
```

3 of 4 threads are now busy, and one is idle.
4. Open http://127.0.0.1:8000/ (without cache, use Ctrl+F5). Some of the requests will succeed, but some of them won't generate any output (and get stuck in the "Waiting for 127.0.0.1" state), even though there is always one idle thread.

We're having the same problem; setting workers to 2+ threads causes the exact same issue as explained above.
on which OS? kernel?
Description: Ubuntu 12.04.4 LTS
Linux code 3.8.0-34-generic 49~precise1-Ubuntu SMP Wed Nov 13 18:05:00 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
I've run the example provided above and I think I understood the problem.
So the issue is that the threaded worker is not designed to accept long-lived connections right now. It mostly works like the sync worker. Using 2 workers with 2 threads each means that only 4 simultaneous connections can be handled. Waiting like this will rapidly exhaust the number of threads. However, since the accepting loop is async and runs in its own loop, more connections may be accepted, and they will all wait for a free thread.
I need to think more about that design. I am considering removing the async accepting loop and handling the pool synchronously instead.
:+1: better to handle the pool synchronously. That is better for upstream load balancers because they will see a slow connect and might fail over.
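To make the "handle the pool synchronously" idea concrete, here is a minimal standalone sketch (not gunicorn code; names such as MAX_THREADS and handle are purely illustrative) that only accepts a new connection when a thread is actually free, by gating accept() with a semaphore:

```python
# Minimal sketch of a synchronous accept loop gated by free worker threads.
# Not gunicorn's implementation; purely illustrative.
import socket
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_THREADS = 2
free_threads = threading.BoundedSemaphore(MAX_THREADS)
pool = ThreadPoolExecutor(MAX_THREADS)

def handle(conn):
    try:
        conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
    finally:
        conn.close()
        free_threads.release()   # a thread is free again, accept() may proceed

server = socket.socket()
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("127.0.0.1", 8000))
server.listen(64)
while True:
    free_threads.acquire()       # block here instead of over-accepting
    conn, _ = server.accept()
    pool.submit(handle, conn)
```

With this shape, a client that cannot be served yet waits in the kernel backlog instead of being accepted and left hanging, which is what lets an upstream load balancer notice the slow connect and fail over.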
Could it be that this error is also present even if you have 1 threaded worker? It seems we have an odd situation where, even after commenting out threads=2 in our config, we're able to reproduce a state in which a deadlocked worker accepts connections.
EDIT:
The interesting thing is that the deadlock is during the request initialization phase.
user -> nginx -> gunicorn (webapp) -> rpcserver. The deadlock happens while connecting to the rpcserver, and when that happens gunicorn accepts connections and they just wait...
The listening socket is always open, and connecting clients will be queued in the TCP backlog (even if gunicorn does not accept new connections). That's not related to this issue.
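As a side note, that backlog behaviour is easy to see in isolation (a standalone snippet, nothing to do with gunicorn's code): connect() succeeds even though the server never calls accept(), because the kernel completes the handshake and queues the connection.

```python
# Standalone illustration of the TCP backlog: the client connects successfully
# even though the server never calls accept().
import socket

server = socket.socket()
server.bind(("127.0.0.1", 0))   # any free port
server.listen(5)                # kernel queues up to ~5 pending connections

client = socket.socket()
client.connect(server.getsockname())  # handshake completed by the kernel
print("connected to", client.getpeername())
```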
But there is another issue: after receiving a stop signal, while it waits for current requests to complete, gunicorn does not close the listening socket.
@tahajahangir this is actually by design. But why would you want it? Though this is not related to this ticket, can you open a new ticket about it?
I have raised this issue before and tried to solve it. I forget why I failed. I could dig up the comments maybe. But we should stop listening as soon as we start shutting down.
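Sketched very roughly (this is not gunicorn's actual shutdown path, and drain_inflight_requests is a hypothetical callback), the idea is to close the listening sockets first and only then drain what was already accepted:

```python
# Rough sketch of "stop listening as soon as we start shutting down":
# close listeners first so no new connections are queued, then drain.
def graceful_shutdown(listeners, drain_inflight_requests):
    for sock in listeners:
        sock.close()             # new clients now get connection refused
    drain_inflight_requests()    # finish requests that were already accepted
```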
:+1: opening a new issue for it.
@tilgovi If I recall correctly, the reason we are not doing it is the case where we do a USR2 and stop the old master. In that case we don't want to close the socket, so the master can still listen on it. I am actually wondering if the system doesn't take care of that for us... (also whether it's portable)
That was indeed the reason. I'll follow up on the other issue.
When the max_requests option is in use and there are pending requests queued up for the worker, but the worker is shutting down, the queued-up requests are forcibly closed with a 5xx error.
I am slightly revisiting the gthread worker. The new workflow is quite a bit simpler.
I will have a patch available tomorrow.
please review #962 which should solve the issue.
Hi, we met a very similar-looking issue with the current version (20.1). Using gthread workers, gunicorn sometimes sent requests to workers already handling a request, despite other ones being available. The specificity of our case (which maybe generates the issue) is that those workers were handling a persistent connection (websocket). Thus they were occupied, but not necessarily "busy" handling data at the moment the request arrived. This happened with gthread workers, even if using only 1 thread.
Switching to gevent workers solved the issue for us. Anyway, thank you for the great tool! Sorry that I can't create a prototype reproducing the bug right now.
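For anyone wanting to try the same workaround, the switch boils down to the worker class; a hypothetical config file (the gevent package must be installed; the setting names are standard gunicorn ones, the module/app names are placeholders):

```python
# Hypothetical gunicorn config file (run with: gunicorn -c this_file.py app:application)
worker_class = "gevent"   # instead of the default sync / gthread workers
workers = 2
```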
When using threads (with the sync worker class), the worker accepts connections even when all threads are busy. In our environment, sometimes (because of a deadlock in the cairocffi library) all threads of a worker are locked, but the worker continues accepting connections until the open-files limit is reached.
I looked at the code in gthread.py and it does respect the worker_connections setting (despite what the documentation says, namely that it only affects the Eventlet and Gevent workers). I think for threaded workers, worker_connections should always be set to the number of threads.
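If one wanted to apply that suggestion by hand, it would look something like this in a gunicorn config file (illustrative only; whether this fully avoids the over-accepting described above is exactly what this ticket is about):

```python
# Hypothetical gunicorn config: cap worker_connections at the thread count so
# the gthread accept loop cannot take on more connections than it has threads.
workers = 2
threads = 2
worker_connections = threads
```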