gunicorn 19.x causing problems with django-1.7.1 and gevent

daltonmatos commented 10 years ago

Hello,

I'm seeing this error with gunicorn 19.x (tested with 19.0 and 19.1.1) using the gevent worker. I'm using django-1.7.1.

this is the exception:

Traceback:
File "/home/sieve/sitroom.sieve.com.br/shared/env/local/lib/python2.7/site-packages/django/core/handlers/base.py" in get_response
  87.                 response = middleware_method(request)
File "/home/sieve/sitroom.sieve.com.br/shared/env/local/lib/python2.7/site-packages/django/contrib/auth/middleware.py" in process_request
  34.         if user and hasattr(user, 'get_session_auth_hash'):
File "/home/sieve/sitroom.sieve.com.br/shared/env/local/lib/python2.7/site-packages/django/utils/functional.py" in inner
  224.             self._setup()
File "/home/sieve/sitroom.sieve.com.br/shared/env/local/lib/python2.7/site-packages/django/utils/functional.py" in _setup
  357.         self._wrapped = self._setupfunc()
File "/home/sieve/sitroom.sieve.com.br/shared/env/local/lib/python2.7/site-packages/django/contrib/auth/middleware.py" in <lambda>
  23.         request.user = SimpleLazyObject(lambda: get_user(request))
File "/home/sieve/sitroom.sieve.com.br/shared/env/local/lib/python2.7/site-packages/django/contrib/auth/middleware.py" in get_user
  11.         request._cached_user = auth.get_user(request)
File "/home/sieve/sitroom.sieve.com.br/shared/env/local/lib/python2.7/site-packages/django/contrib/auth/__init__.py" in get_user
  151.         user_id = request.session[SESSION_KEY]
File "/home/sieve/sitroom.sieve.com.br/shared/env/local/lib/python2.7/site-packages/django/contrib/sessions/backends/base.py" in __getitem__
  49.         return self._session[key]
File "/home/sieve/sitroom.sieve.com.br/shared/env/local/lib/python2.7/site-packages/django/contrib/sessions/backends/base.py" in _get_session
  175.                 self._session_cache = self.load()
File "/home/sieve/sitroom.sieve.com.br/shared/env/local/lib/python2.7/site-packages/django/contrib/sessions/backends/db.py" in load
  21.                 expire_date__gt=timezone.now()
File "/home/sieve/sitroom.sieve.com.br/shared/env/local/lib/python2.7/site-packages/django/db/models/manager.py" in manager_method
  92.                 return getattr(self.get_queryset(), name)(*args, **kwargs)
File "/home/sieve/sitroom.sieve.com.br/shared/env/local/lib/python2.7/site-packages/django/db/models/query.py" in get
  351.         num = len(clone)
File "/home/sieve/sitroom.sieve.com.br/shared/env/local/lib/python2.7/site-packages/django/db/models/query.py" in __len__
  122.         self._fetch_all()
File "/home/sieve/sitroom.sieve.com.br/shared/env/local/lib/python2.7/site-packages/django/db/models/query.py" in _fetch_all
  966.             self._result_cache = list(self.iterator())
File "/home/sieve/sitroom.sieve.com.br/shared/env/local/lib/python2.7/site-packages/django/db/models/query.py" in iterator
  265.         for row in compiler.results_iter():
File "/home/sieve/sitroom.sieve.com.br/shared/env/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py" in results_iter
  700.         for rows in self.execute_sql(MULTI):
File "/home/sieve/sitroom.sieve.com.br/shared/env/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py" in execute_sql
  784.         cursor = self.connection.cursor()
File "/home/sieve/sitroom.sieve.com.br/shared/env/local/lib/python2.7/site-packages/django/db/backends/__init__.py" in cursor
  163.         self.validate_thread_sharing()
File "/home/sieve/sitroom.sieve.com.br/shared/env/local/lib/python2.7/site-packages/django/db/backends/__init__.py" in validate_thread_sharing
  515.                 % (self.alias, self._thread_ident, thread.get_ident()))

Exception Type: DatabaseError at /
Exception Value: DatabaseWrapper objects created in a thread can only be used in that same thread. The object with alias 'default' was created in thread id 140049505195808 and this is thread id 61130224.

Here is my gunicorn.conf.py:

##
# Gunicorn config at 
# Managed by Chef - Local Changes will be Nuked from Orbit (just to be sure)
##

# What ports/sockets to listen on, and what options for them.
bind = "127.0.0.1:9300"

# The maximum number of pending connections
backlog = 2048

# What the timeout for killing busy workers is, in seconds
timeout = 300

# How long to wait for requests on a Keep-Alive connection, in seconds
keepalive = 2

# The maxium number of requests a worker will process before restarting
max_requests = 1024

# Whether the app should be pre-loaded
preload_app = True

# How many worker processes
workers = 16

# Type of worker to use
worker_class = "gevent"

# The granularity of error log outputs.
loglevel = "error"

The error occurs every time, just accessing any URL of the django app (that depends on any model, of course) triggers this error. Any ideas? Since changing to gunicorn-18.0 solves the problem, I'm assuming that gunicorn-19.x is doing something different.

If you need any additional information, let me know and I will post it here.

For now I'm using 18.0.

Thanks a lot.

tilgovi commented 10 years ago

Just a question first: does taking out the preload option help? I think I remember another similar issue being reported. If changing that option helps, then we must have changed something about how Django applications are loaded and it will be easier for us to track it down.

collinanderson commented 10 years ago

I think we import the django wsgi application in the master process, which at least on django 1.7, initializes all of the django models on import.

daltonmatos commented 10 years ago

Hello @tilgovi, thank you for your time. I cant confirm that using preload_app = False I'm able to use gunicorn-19.1.1 and django-1.7.1.

Do you have any idea what gunicorn cloud be doing differently about this in versions 19.x?

I have now a second problem. I use Raven[1] in my django app. And accorging to the Raven docs, when using a WSGU middleware[1] it is possible to use Raven as a WSGI object (wrapping the original django WSGI app), and when using gunicorn (as an external command, not the run_gunicorn django command) we must add a hook to gunicorn[2]. When this hook is enabled, gunicorn only boots if preload_app = True, but then gevent does not work. When I use preload_app = True with the hook enabled, gunicorn does not even boot, giving me and ImportError saying it cannot import my "DJANGO_SETTINGS_MODULE".

Raven is 4.2.3

For now I will get back to gunicorn 18.x, since it works with both configurations (gevent and raven as a WSGI middleware) Should I open another Issue about this?

[1] http://raven.readthedocs.org/en/latest/config/django.html#wsgi-middleware [2] http://raven.readthedocs.org/en/latest/config/django.html#gunicorn

benoitc commented 9 years ago

@daltonmatos the way databases are setup in django 1.7.1 has probably changed are apparently not threadsafe. I am not sure what to do at that point.

genxstylez commented 9 years ago

One more thing, just found out today, session handling in django 1.7 with gunicorn 19.3.0 expires within seconds.

Had to switch back to 18 to make it work.

FYI.

fletom commented 9 years ago

This seems to be the same issue as https://github.com/benoitc/gunicorn/issues/879, which was closed but not fixed. It makes gunicorn 19.x. unusable with Django. Perhaps only for the gevent worker though? I haven't tested for other worker types.

benoitc commented 9 years ago

@tilgovi i wonder if it's not due to the change done in the environ. in 19.x we set wsgi.multithread to true where it was false in 18.x :

https://github.com/benoitc/gunicorn/blob/master/gunicorn/workers/ggevent.py#L99

This change may be handledd differently in django. And a greenlet is not really a thread. Thoughts? At least maybe we could try to revert it and check if it solves the issue. Do we have a test or a reproducible way for this issue?

tilgovi commented 9 years ago

Oh, I bet that is the issue. Good memory, @benoitc.

I still don't know why yet. As I said at the time to justify the change, I can't imagine that it would make anything break because it would mean that the framework would only take more precautions about sharing data. Evidently, I may be wrong.

benoitc commented 9 years ago

I pushed the branch fix/927 to check if the fix above works. Please test!

brockhaywood commented 9 years ago

I'm still able to reproduce this error using fix/927 with django 1.8.1 (and 1.7.1) and gevent 1.0.1

tilgovi commented 9 years ago

That's reassuring to me, even if we still don't know the cause.

benoitc commented 9 years ago

So I tried using the example application and didn't reproduce the issue running the following command line:

gunicorn --env DJANGO_SETTINGS_MODULE=testing.settings -k gevent -w3 wsgi:application

Can anyone provides me a reproducible example?

brockhaywood commented 9 years ago

I can still reproduce this issue invoking gunicorn like so:

    gunicorn --env DJANGO_SETTINGS_MODULE=settings -w 3 --max-requests=1000 -b 0.0.0.0:8000 -k gevent --config ogs/gunicorn.py  --pythonpath ogs wsgi:application

Is there any relevant details you need?

benoitc commented 9 years ago

@brockhaywood should be project.settings, but yes i need to have a way to reproduce it, do you have any minimal code to share.

Looking at the trace above, it could also be the monkey patching. Maybe threads shouldn't be monkey patched. Can you try the following patch?

--- a/gunicorn/workers/ggevent.py
+++ b/gunicorn/workers/ggevent.py
@@ -60,9 +60,9 @@ class GeventWorker(AsyncWorker):

         # if the new version is used make sure to patch subprocess
         if gevent.version_info[0] == 0:
-            monkey.patch_all()
+            monkey.patch_all(thread=False)
         else:
-            monkey.patch_all(subprocess=True)
+            monkey.patch_all(subprocess=True, thread=False)

         # monkey patch sendfile to make it none blocking
         patch_sendfile()

brockhaywood commented 9 years ago

Using the above patch with the following wsgi no longer produces the specified error:

import os

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "settings")

from gevent import monkey
monkey.patch_all(thread=False)

from django.core.wsgi import get_wsgi_application
application = get_wsgi_application()

from django.core.cache.backends.memcached import BaseMemcachedCache
BaseMemcachedCache.close = lambda self, **kwargs: None

I'm now seeing a different and new error but have no idea if it's related:

ProgrammingError: close cannot be used while an asynchronous query is underway

This is occurring, 1in 10/15 requests is doing a count on a model. I suspect this is an error in my application which is now uncovered.

benoitc commented 9 years ago

@brockhaywood cool! btw ahat's your version of gevent? For your latest error it dies not seem related to gunicorn indeed.

brockhaywood commented 9 years ago

Yah, good news! On version 1.0.1 of gevent.

benoitc commented 9 years ago

@tilgovi so I wonder if we shouldn't patch gunicorn removing the thread monkey patching. Maybe we should also set wsgi.multithread to False? Thoughts?

brockhaywood commented 9 years ago

@benoitc I'm able to reproduce the new error that I referenced above with a very simple Django application that has just one view and does a simple model query and count if I run multiple concurrent requests.

Do you have any suggestions as to which component is the most likely culprit? Is this likely a gevent problem or psycogreen?

benoitc commented 9 years ago

More likely psycogreen. Are you using such hook to setup it?

def def_post_fork(server, worker):
    from psycogreen.gevent import psyco_gevent
    psyco_gevent.make_psycopg_green()
    worker.log.info("Made Psycopg Green")

brockhaywood commented 9 years ago

Oh, I'm actually using

def post_fork(server, worker):    
    from psycogreen.gevent import patch_psycopg
    patch_psycopg()

I'll try your suggestion.

benoitc commented 9 years ago

Your code is more accurate if I read the code of psycogreen:

https://bitbucket.org/dvarrazzo/psycogreen/src/115d0627da1ac9ff48c0cb9287257cd35868cdf9/psycogreen/gevent.py?at=master#cl-17

brockhaywood commented 9 years ago

Ok, I'll try opening a ticket with psycogreen

benoitc commented 9 years ago

can you try to use the hook post_worker_init instead of post_fork?

brockhaywood commented 9 years ago

Still occurs for me when patching psycogreen in post_worker_init.

I've submitted a ticket to psycogreen as well: https://bitbucket.org/dvarrazzo/psycogreen/issue/2/databaseerror-used-with-asynchronous-query

benoitc commented 9 years ago

@brockhaywood OK :/ Also found this link: https://bitbucket.org/dvarrazzo/psycogreen-hg/issue/1/databaseerror-execute-used-with . I am not sure if it could help you though.

brockhaywood commented 9 years ago

I did take a look at that and it seems like the same problem but I didn't see a resolution, unfortunately.

benoitc commented 9 years ago

@brockhaywood can you share any simple example of code that could help me to reproduce the issue? I will have a look on it tomorrow.

brockhaywood commented 9 years ago

Here the simple project that I'm using to reproduce: https://github.com/brockhaywood/gunicorn_gevent_issue

benoitc commented 9 years ago

thanks!

On Sun, May 17, 2015 at 10:34 PM brockhaywood notifications@github.com wrote:

Here the simple project that I'm using to reproduce: https://github.com/brockhaywood/gunicorn_gevent_issue

— Reply to this email directly or view it on GitHub https://github.com/benoitc/gunicorn/issues/927#issuecomment-102852497.

brockhaywood commented 9 years ago

@benoitc any thoughts on this?

benoitc commented 9 years ago

@brockhaywood sorry i didn't have time to test it. Since I'm locked in a flight tomorrow, I will anyway :)

brockhaywood commented 9 years ago

Thanks On 26 May 2015 1:17 pm, "Benoit Chesneau" notifications@github.com wrote:

@brockhaywood https://github.com/brockhaywood sorry i didn't have time to test it. Since I'm locked in a flight tomorrow, I will anyway :)

— Reply to this email directly or view it on GitHub https://github.com/benoitc/gunicorn/issues/927#issuecomment-105652947.

ben-simmons commented 9 years ago

Was just experiencing the same DatabaseWrapper issue when I saw this thread.

I can confirm that removing the preload_app option from my gunicorn config file fixes the problem. I'm using:

Django==1.7.7
gevent==1.0.2
gunicorn==19.3.0
psycopg2==2.5.2
psycogreen==1.0
gunicorn wrapped by PgBouncer

ghost commented 9 years ago

(jumping in since we've been seeing the same issues as described in this thread)

It looks like these issues are being caused by the fix for issue #478 (don't monkey patch the master).

By not monkey patching the master, app preload fundamentally creates a mismatch if you're using gevent workers. Any app code which runs as part of the preload runs in an environment that is not monkey patched. Any app code which runs in the worker runs in a monkey patched environment.

For us, just restoring the setup method to the gevent worker class as it's defined in 18.x, so that the master is monkey patched and it is monkey patched prior to app preload, fixed all our issues. This includes the DatabaseWrapper bug, as well as another issue we were seeing where gevent workers would hang and timeout. No other changes were necessary (we do not have to disable monkey patching threads, for example).

tilgovi commented 9 years ago

I would still be very hesitant to monkey-patch the master process, but if you want to do it you may be able to use the server hooks instead of patching gunicorn itself. If you can do that, I would welcome any way we can document that somewhere that's better than here.

ghost commented 9 years ago

I think the short term plan for us is to create an (internal) gunicorn fork with this patch applied.

I'll investigate moving a monkey patch step into our wsgi code, but this still seems like something which should be done inside the gunicorn code (even as a configurable option). Patching in preloaded app code will affect the master one way or another, and the whole point would be to keep the monkey patched state of the workers in sync with the master. Applying the master monkey patch in app code means that app developers who depend on this behavior will be always playing catchup to whatever changes gunicorn makes to when/how it monkey patches the worker.

tilgovi commented 9 years ago

If it works to patch early, maybe that should be considered as a change.

tilgovi commented 9 years ago

I can't remember if that was ever the case and if that caused problems, but it's always tricky with monkey patching.

jamadden commented 9 years ago

Our own production gunicorn application requires preloading and so also requires monkey-patching in the master (we use Pyramid's/paste's pserve script to launch the app and load gunicorn from an ini file; we have a wrapper around it to be sure that monkey-patching happens ASAP).

I'm a current gevent maintainer. I'm happy to look at any issues in gevent required to make this scenario work better. We're expecting to drop 1.1 soon-ish and it would be great if this was a supported option.

ghost commented 9 years ago

I tried applying the same monkey patch logic at the top of our wsgi implementation, and we started seeing the DatabaseWrapper issue. Not entirely sure why though - looking at the wsgi invocation in arbiter.py, that happens a few lines after the point where GeventWorker.setup will be called (as a side effect of the worker_class assignment. If I'm reading the code right.

I'll followup if I have a eureka or get lucky with regards to the app code, but so far for us, patching earlier in the gunicorn master code is the only fix that solves all our issues.

tilgovi commented 9 years ago

@jamadden thanks so much.

@jzupko it's usually important that the patching happens before any imports that might be affected. That's why I suggested to do it in a server hook, like by putting a post_fork function in a gunicorn.conf.py file. However, I just realized that the application preloading happens before any of the server hooks and it's not clear where one would do that.

tilgovi commented 9 years ago

@benoitc do we have any mechanism for worker-specific config?

tilgovi commented 8 years ago

I'm adding this to the release 20 milestone and suggesting that we consider monkey patching the master, maybe, if it can be done without re-introducing problems like #478.

benoitc commented 8 years ago

@tilgovi you have the setup method of the worker that can be used for that I think.

fletom commented 7 years ago

Does anyone have a temporary fix for this until 20.0 is out? Or is the only solution to choose between disabling preload_app or downgrading to 18.0?

benoitc commented 7 years ago

@fletom temporary fix for? Did you try the latest master? Did it works for you?

fletom commented 7 years ago

@benoitc I mean a fix for the "DatabaseWrapper objects" error.

Current gunicorn master doesn't work at all for me actually. On regular 19.6.0 I get this (normal):

[2017-01-05 15:46:32 -0500] [3222] [INFO] Starting gunicorn 19.6.0
[2017-01-05 15:46:32 -0500] [3222] [INFO] Listening at: http://127.0.0.1:8000 (3222)
[2017-01-05 15:46:32 -0500] [3222] [INFO] Using worker: gevent
[2017-01-05 15:46:32 -0500] [3226] [INFO] Booting worker with pid: 3226
[2017-01-05 15:46:32 -0500] [3227] [INFO] Booting worker with pid: 3227
[2017-01-05 15:46:32 -0500] [3228] [INFO] Booting worker with pid: 3228
[2017-01-05 15:46:32 -0500] [3229] [INFO] Booting worker with pid: 3229

And on latest master with the 100% exact same configuration I get this, and nothing listening on any port as far as I can tell:

[2017-01-05 15:47:19 -0500] [3308] [INFO] Starting gunicorn 19.6.0
[2017-01-05 15:47:19 -0500] [3308] [INFO] Listening at:  (3308)
[2017-01-05 15:47:19 -0500] [3308] [INFO] Using worker: gevent
[2017-01-05 15:47:19 -0500] [3312] [INFO] Booting worker with pid: 3312
[2017-01-05 15:47:19 -0500] [3313] [INFO] Booting worker with pid: 3313
[2017-01-05 15:47:19 -0500] [3314] [INFO] Booting worker with pid: 3314
[2017-01-05 15:47:19 -0500] [3315] [INFO] Booting worker with pid: 3315

Seems bugged?

benoitc commented 7 years ago

@fletom how do you launch gunicorn?

fletom commented 7 years ago

@benoitc In this case, it's gunicorn -c gunicorn_conf.py <appname>.wsgi .

My gunicorn_conf.py is:

workers = 4

worker_class = 'gevent'

preload_app = False

benoitc / gunicorn

gunicorn 19.x causing problems with django-1.7.1 and gevent #927