benoitc / gunicorn

gunicorn 'Green Unicorn' is a WSGI HTTP Server for UNIX, fast clients and sleepy applications.
http://www.gunicorn.org

AssertionError: This event is already used by another greenlet #235

Closed desti closed 12 years ago

desti commented 12 years ago

I get the following error message when I use gevent as the worker class. The message occurs under heavy load.

```
2011-07-25 23:32:45 [8864] [ERROR] Error handling request
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/gunicorn-0.12.2-py2.5.egg/gunicorn/workers/async.py", line 62, in handle_request
    respiter = self.wsgi(environ, resp.start_response)
  File "/srv/gcms_django/lib/django/contrib/staticfiles/handlers.py", line 68, in __call__
    return self.application(environ, start_response)
  File "/srv/gcms_django/lib/django/core/handlers/wsgi.py", line 273, in __call__
    response = self.get_response(request)
  File "/srv/gcms_django/lib/django/core/handlers/base.py", line 169, in get_response
    response = self.handle_uncaught_exception(request, resolver, sys.exc_info())
  File "/srv/gcms_django/lib/django/core/handlers/base.py", line 218, in handle_uncaught_exception
    return callback(request, **param_dict)
  File "/srv/gcms_django/gcms/cms/views.py", line 76, in error500
    c = RequestContext(request, locals())
  File "/srv/gcms_django/lib/django/template/context.py", line 177, in __init__
    self.update(processor(request))
  File "/srv/gcms_django/gcms/cms/context_processors.py", line 26, in main_company
    request.MAIN_COMPANY = company = Company.objects.select_related(depth=1).get(id=settings.MAIN_COMPANY)
  File "/srv/gcms_django/lib/django/db/models/query.py", line 344, in get
    num = len(clone)
  File "/srv/gcms_django/lib/django/db/models/query.py", line 82, in __len__
    self._result_cache = list(self.iterator())
  File "/srv/gcms_django/lib/django/db/models/query.py", line 273, in iterator
    for row in compiler.results_iter():
  File "/srv/gcms_django/lib/django/db/models/sql/compiler.py", line 680, in results_iter
    for rows in self.execute_sql(MULTI):
  File "/srv/gcms_django/lib/johnny/cache.py", line 311, in newfun
    gen_key = self.keyhandler.get_generation(*tables, **{'db': db})
  File "/srv/gcms_django/lib/johnny/cache.py", line 202, in get_generation
    return self.get_multi_generation(tables, db)
  File "/srv/gcms_django/lib/johnny/cache.py", line 220, in get_multi_generation
    generations.append(self.get_single_generation(table, db))
  File "/srv/gcms_django/lib/johnny/cache.py", line 208, in get_single_generation
    val = self.cache_backend.get(key, None, db)
  File "/srv/gcms_django/lib/johnny/transaction.py", line 82, in get
    return self.cache_backend.get(key, default)
  File "/srv/gcms_django/lib/django/core/cache/backends/memcached.py", line 58, in get
    val = self._cache.get(key)
  File "build/bdist.linux-x86_64/egg/memcache.py", line 793, in get
    return self._get('get', key)
  File "build/bdist.linux-x86_64/egg/memcache.py", line 776, in _get
    rkey, flags, rlen, = self._expectvalue(server)
  File "build/bdist.linux-x86_64/egg/memcache.py", line 886, in _expectvalue
    line = server.readline()
  File "build/bdist.linux-x86_64/egg/memcache.py", line 1055, in readline
    data = recv(4096)
  File "build/bdist.linux-x86_64/egg/gevent/socket.py", line 427, in recv
    wait_read(sock.fileno(), timeout=self.timeout, event=self._read_event)
  File "build/bdist.linux-x86_64/egg/gevent/socket.py", line 165, in wait_read
    assert event.arg is None, 'This event is already used by another greenlet: %r' % (event.arg, )
AssertionError: This event is already used by another greenlet: (<Greenlet at 0x466f958: <bound method GGeventServer.handle of <GGeventServer at 0x26f12d0 fileno=10 address=10.0.0.140:8307>>(<socket at 0x4861390 fileno=14 sock=10.0.0.140:830, ('10.0.0.150', 55333))>, timeout('timed out',))
```

tilgovi commented 12 years ago

There are too many moving parts in that traceback for me to really understand, but it sounds to me like gcms_django/lib/johnny/cache might be making an unsafe assumption about concurrent access to its cache backend. Where does that code come from? Can it be modified to use a pool of connections to the cache backend, or to use a mutual exclusion lock around access to it?

desti commented 12 years ago

Johnny-cache is a caching layer: http://packages.python.org/johnny-cache/. I think the problem may be the combination of memcache running on another server and a faulty network switch. In the meantime I've updated the Linux on the server (it was an Ubuntu 8.04 machine) and replaced the switch. gunicorn is running stably in sync mode now. I'll retest async mode in the next few days.

tilgovi commented 12 years ago

Thanks, @desti. Please re-open the issue if it happens again or open another issue if it's appropriate.

notedit commented 12 years ago

I ran into a similar problem when doing a memcache set call with gevent.

tilgovi commented 12 years ago

@notedit: Can you give some more information? What memcache client do you use? Are you sure it's safe to use with gevent?

notedit commented 12 years ago

The detailed traceback looks like this:

```
    cache_client.set(key, resp, cache_time + random.randint(0, 300))
  File "/home/chika/guokr/guokr/trunk/cms/libshare/memcache.py", line 515, in set
    return self._set("set", key, val, time, min_compress_len)
  File "/home/chika/guokr/guokr/trunk/cms/libshare/memcache.py", line 740, in _set
    return(server.expect("STORED") == "STORED")
  File "/home/chika/guokr/guokr/trunk/cms/libshare/memcache.py", line 1052, in expect
    line = self.readline()
  File "/home/chika/guokr/guokr/trunk/cms/libshare/memcache.py", line 1041, in readline
    data = recv(4096)
  File "build/bdist.linux-i686/egg/gevent/socket.py", line 427, in recv
    wait_read(sock.fileno(), timeout=self.timeout, event=self._read_event)
  File "build/bdist.linux-i686/egg/gevent/socket.py", line 165, in wait_read
    assert event.arg is None, 'This event is already used by another greenlet: %r' % (event.arg, )
AssertionError: This event is already used by another greenlet: (<Greenlet at 0x8c7ef6c: <bound method Worker.handle of
```

I am using the memcache client from https://github.com/samuel/python-memcached/blob/master/memcache.py

I am not sure it is safe to use with gevent. The error occurs under heavy load too.

tilgovi commented 12 years ago

Thank you for the report.

Looking at that client, it appears to be 100% Python and to subclass threading.local for the Client class. Since the gevent worker patches threading.local to support greenlets, I don't see any reason this should not work with gunicorn+gevent. I will look into this further.
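For context on why a threading.local subclass should be safe to share: each thread that touches the instance gets its own attribute namespace (and __init__ re-runs per thread), so each thread would open its own connections. Under gevent's monkey patching, "thread" becomes "greenlet". A stdlib-only sketch (the class and attribute names are illustrative, not from the memcache client):

```python
# Stdlib-only sketch: a threading.local subclass gives each thread its
# own attribute namespace, and __init__ re-runs in every new thread
# that first touches the instance.
import threading

class PerThread(threading.local):
    def __init__(self):
        self.conns = []  # fresh list for every thread

box = PerThread()
box.conns.append("main-thread-conn")

seen_in_worker = []

def worker():
    # This thread gets its own, empty `conns`; the main thread's entry
    # is invisible here.
    seen_in_worker.append(list(box.conns))

t = threading.Thread(target=worker)
t.start()
t.join()
```

This is the property tilgovi is relying on: if the patching works, a shared Client never shares its sockets across greenlets.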

notedit commented 12 years ago

I asked Denis about this, and he told me:

"""This error means that a particular gevent.socket.socket instance was used by more than one greenlet concurrently.

It's OK to send in one greenlet and read in another greenlet at the same time, but trying to do multiple concurrent recv() calls or multiple concurrent send() calls will cause this error. If you do need multiple greenlet to write to the same socket, consider protecting sendall() call with a Lock """
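Denis's suggestion can be sketched with a minimal wrapper. The class and names below are illustrative, not from any library; note that under gevent's monkey patching, threading.Lock is greenlet-safe, so the same pattern serializes greenlets as well as threads:

```python
import threading

class LockedSendSocket(object):
    """Illustrative wrapper: serialize sendall() so two greenlets (or
    threads) can never be inside a send on the same socket at once."""

    def __init__(self, sock):
        self._sock = sock
        self._send_lock = threading.Lock()

    def sendall(self, data):
        with self._send_lock:  # only one sender at a time
            return self._sock.sendall(data)

    def __getattr__(self, name):
        # Delegate everything else (recv, close, ...) to the real socket.
        return getattr(self._sock, name)
```

This only protects the send side; concurrent recv() calls would still need their own lock or, better, a connection pool so each greenlet gets its own socket.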

tilgovi commented 12 years ago

Yes, I know. Looking at the code for that module I don't yet see how it's possible for greenlets to share sockets. The Client class is greenlet local (from gevent.monkey patching threading.local) and it's the client that has a list of servers which have sockets.

benoitc commented 12 years ago

Why not use pylibmc?

tilgovi commented 12 years ago

I can't actually see any way this is possible. I assume cache_client at the top of your traceback is an instance of memcache.Client? If so, it should be greenlet local so I don't see any way for this to happen.

Can you show any more of your code so I can help you debug? How/where do you instantiate cache_client?

notedit commented 12 years ago

OK, cache_client is an instance of memcache.Client.

here is some code:

```python
cache_client = memcache.Client(['127.0.0.1:11211'])

def cache_decorator(cache_time):
    def wrap(func):
        def cached_func(req, *args, **kwargs):
            # cache_client = memcache.Client(['127.0.0.1:11211'])  # if I instantiate the cache_client here, the error does not happen
            key = ...
            resp = func(req, *args, **kwargs)
            cache_client.set(key, resp, cache_time + random.randint(0, 300))
            return resp
        return cached_func
    return wrap
```

cache_decorator caches the decorated function's result.

If cache_client is instantiated outside cache_decorator, the error happens (at random). If I instantiate cache_client inside the decorated function, the error does not happen.

It is really strange.

tilgovi commented 12 years ago

I suspect that threading.local is not properly being monkey patched by gevent and so the client's socket is being shared across greenlets. What version of gevent are you using?

notedit commented 12 years ago

I am using gevent-0.13.6.

tilgovi commented 12 years ago

I'm definitely baffled. I tested with gevent-0.13.6 and threading.local is successfully patched, so I think memcache.Client can be safely shared. If you can print the value of `memcache.Client.__mro__`, we can check whether or not this is working properly.

notedit commented 12 years ago

The memcache.Client.__mro__ is:

```
(<class 'libshare.memcache.Client'>, <type 'thread._local'>, <type 'object'>)
```

notedit commented 12 years ago

What confuses me is that if I instantiate cache_client inside cache_decorator, the error does not happen.

tilgovi commented 12 years ago

Instantiated inside the decorator, there is never a chance for the object to be shared, so a new socket is opened on every single call. For some reason the monkey patching is not working, because the MRO should read `(<class 'libshare.memcache.Client'>, <class 'gevent.local.local'>, <type 'object'>)` or something like that.

If you can investigate, maybe by adding some logging output (you can use `logging.getLogger('gunicorn.error').info()`) in `gunicorn.workers.ggevent#GeventWorker.setup()` and at the memcache library's `from threading import local`, you may start to track down when/where/why threading.local isn't being patched.

Things I'd look for include:

notedit commented 12 years ago

I got it. I was importing memcache before calling patch_all().

Thanks, tilgovi.

jbranc commented 11 years ago

Wait, what was the fix here? I'm seeing something similar with memcache and gevent with Django.

tilgovi commented 11 years ago

The fix was evidently to use gevent's monkey patching before importing the memcache module.

It only makes sense to me that this is necessary if you're using the preload option. I'm not sure if OP was. Are you? What error are you seeing exactly? The same one?
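The import-order pitfall can be demonstrated without gevent at all: a class that subclasses threading.local at import time keeps whatever threading.local was at that moment, so patching afterwards has no effect on it. In this sketch, FakeGreenletLocal is a stand-in for gevent.local.local, and the two client classes simulate importing memcache before versus after monkey patching:

```python
import threading

original_local = threading.local

class FakeGreenletLocal(object):
    """Stand-in for gevent.local.local, for illustration only."""

# Simulates `import memcache` happening BEFORE monkey patching:
class EarlyClient(threading.local):
    pass

# Simulates monkey.patch_all() running too late for EarlyClient:
threading.local = FakeGreenletLocal

# Simulates `import memcache` happening AFTER patching:
class LateClient(threading.local):
    pass

threading.local = original_local  # undo the fake patch
```

EarlyClient keeps the stdlib thread-local base forever, which is exactly the `<type 'thread._local'>` that showed up in notedit's MRO; only LateClient picks up the replacement.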

hatcherfang commented 6 years ago

@tilgovi I got the same issue. This is my error traceback:

```
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/memcache.py", line 700, in set
    return self._set("set", key, val, time, min_compress_len, noreply)
  File "/usr/lib/python2.7/site-packages/memcache.py", line 1020, in _set
    return _unsafe_set()
  File "/usr/lib/python2.7/site-packages/memcache.py", line 1008, in _unsafe_set
    server.send_cmd(fullcmd)
  File "/usr/lib/python2.7/site-packages/memcache.py", line 1395, in send_cmd
    self.socket.sendall(cmd + b'\r\n')
  File "/usr/lib64/python2.7/site-packages/gevent/socket.py", line 464, in sendall
    data_sent += self.send(_get_memory(data, data_sent), flags, timeout=timeleft)
  File "/usr/lib64/python2.7/site-packages/gevent/socket.py", line 441, in send
    self._wait(self._write_event)
  File "/usr/lib64/python2.7/site-packages/gevent/socket.py", line 292, in _wait
    assert watcher.callback is None, 'This socket is already used by another greenlet: %r' % (watcher.callback, )
AssertionError: This socket is already used by another greenlet: <bound method Waiter.switch of <gevent.hub.Waiter object at 0xcdd89b0>>
```

Do you mean the problem is that I did not apply the gevent monkey patch before importing memcached?

tilgovi commented 6 years ago

@hatcherfang gevent needs to apply its patch before memcache is imported.

hatcherfang commented 6 years ago

@tilgovi You mean the code should look like this:

```python
import gevent
from gevent import monkey
monkey.patch_all()
import pylibmc
```

. . .

But I did not use gevent, I did not even import it, and the traceback error still happened, which is confusing to me. By the way, I use uwsgi to run the server, and this is my uwsgi.ini config file:


```
socket = 127.0.0.1:8001
chdir = /usr/myuwsgi/src
processes = 6
gevent = 1024
listen = 1024
module = webService
master = true
max-requests = 10240
pidfile = /var/myuwsgi/run/uwsgi.pid
```

Did the `gevent = 1024` setting lead to the traceback error?

tilgovi commented 6 years ago

I'm sorry, @hatcherfang, I do not know what the gevent config for uwsgi does as far as calling monkey-patching functions. Perhaps you should ask on the uwsgi repo or forums.

benoitc commented 6 years ago

@hatcherfang how do you launch gunicorn? Where do you init memcache? Also, I believe the client you're using is blocking the gevent loop (use pylibmc). If you have a simple way to reproduce the issue that we can run on our own, that would help.

Other than that simply using gunicorn + nginx proxy should work.

hatcherfang commented 6 years ago

@tilgovi Never mind, thank you a lot. @benoitc I used supervisord to start memcached. The config is:

```
[program:memcached]
command=/usr/bin/memcached -p 11211 -m 512 -c 1024 -P /var/run/recv-memcached.pid
autostart=true
startsecs=5
redirect_stderr=true
stdout_logfile=/var/log/recv-memcached.log
```

memcached runs as a detached process:

```
[root@logreceiver-uw2-30-1021-b-10-202-241-93 ~]# ps -ef | grep memcache
user     997   986  0 Nov02 ?        00:00:00 /usr/bin/memcached -p 11211 -m 512 -c 1024 -P /var/user/run/xxx-memcached.pid
```

You are right. I should first reproduce the issue; otherwise it is difficult to solve.

jamadden commented 6 years ago

According to the traceback:

```
  File "/usr/lib64/python2.7/site-packages/gevent/socket.py", line 292, in _wait
    assert watcher.callback is None, 'This socket is already used by another greenlet: %r' % (watcher.callback, )
```

This is gevent 1.0.1 or older. I would suggest upgrading; 1.2.2 is the current version.

hatcherfang commented 6 years ago

@jamadden This is our production environment, and it is not easy to upgrade versions. Thank you for your advice.