etianen / django-watson

Full-text multi-table search application for Django. Easy to install and use, with good performance.
BSD 3-Clause "New" or "Revised" License

Constant `SearchContextError` on updating items #92

Closed jdotjdot closed 9 years ago

jdotjdot commented 9 years ago

I'm getting some odd errors in production from the Watson SearchContextMiddleware with Django 1.6. I just switched from using gunicorn to uwsgi and gevent, though I'm not sure why that would make a difference.

This is the entirety of the traceback:

/app/.heroku/python/lib/python2.7/site-packages/django/core/handlers/base.py in get_response
                response = middleware_method(request, response)
/app/.heroku/python/lib/python2.7/site-packages/watson/middleware.py in process_response
        self._close_search_context(request)
/app/.heroku/python/lib/python2.7/site-packages/watson/middleware.py in process_response
        self._close_search_context(request)
/app/.heroku/python/lib/python2.7/site-packages/watson/middleware.py in _close_search_context
            search_context_manager.end()
/app/.heroku/python/lib/python2.7/site-packages/watson/registration.py in end
        self._assert_active()

Here are the details I'm getting from the Django error email:

Exception Type: SearchContextError
Exception Value: The search context is not active.

I believe (though I'm not 100% sure) that I'm only seeing these on updating models, rather than creating.

etianen commented 9 years ago

That is really odd. It implies that uwsgi is somehow not properly closing the request, or something.

What version of django-watson are you using?

jdotjdot commented 9 years ago

1.1.5

etianen commented 9 years ago

Okay, that’s the latest release.

Theoretically, every call to process_request should be balanced by a call to process_response or process_exception in the SearchContextMiddleware. Either this isn’t happening, or the search context is being messed up elsewhere.

You mentioned that you were using gunicorn async workers. Are you also monkey-patching the stdlib, particularly the threading part of the stdlib? If not, then the green threads in gunicorn will be messing up synchronisation on the thread-local data that watson uses.
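
To make the failure mode concrete, here is a standalone sketch (not watson's actual code) of what happens to thread-local state under gevent without monkey-patching: every greenlet runs on the same OS thread, so they all share a single threading.local() and trample each other's per-request context.

import threading
import gevent

local_data = threading.local()

def handle_request(request_id):
    # "Open" a per-request context on what we assume is thread-local storage.
    local_data.context = request_id
    gevent.sleep(0)  # yield to another greenlet, as any IO call would
    # By now another greenlet may have overwritten the shared "local" state,
    # so closing the context finds someone else's data (or none at all).
    print(request_id, getattr(local_data, "context", None))

gevent.joinall([gevent.spawn(handle_request, i) for i in range(3)])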

jdotjdot commented 9 years ago

Well, here's the thing—it works fine with gunicorn, and I'm not doing any monkey patching. It's with gevent (which I believe does monkey-patch) where there's the issue.

etianen commented 9 years ago

I mean this:

http://www.gevent.org/gevent.monkey.html

It looks like the gunicorn async worker does monkey patch everything, however.

Are you using the preload_app setting in gunicorn? That might mess up the monkey patching.

Incidentally, I’d not recommend using async workers with Django unless your app is really set up for it. Django will open a database connection for each green thread. So, if your app is processing 1000 simultaneous connections, your database will melt.

More simply put, gunicorn async workers allow your webserver to handle thousands of concurrent requests. Most databases, however, cannot handle thousands of concurrent requests, so the bottleneck is just shifted elsewhere. A multithreaded/multiprocess model for Django works really well, so long as a buffering reverse proxy is placed in front of it to handle network IO. It limits the number of concurrent workers to a sane level, while still allowing thousands of client connections to be served in a timely manner.

etianen commented 9 years ago

Aha, you said you're using uwsgi with gevent. I don't know whether uwsgi calls monkey.patch_all() for you!

Try putting a call to gevent.monkey.patch_all() at the top of your wsgi file.
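
For example, a minimal wsgi.py sketch (the settings module name is just a placeholder) - the patch must run before Django or anything else that touches the stdlib is imported:

from gevent import monkey
monkey.patch_all()  # patch threading etc. before any other imports

import os
from django.core.wsgi import get_wsgi_application

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")  # placeholder
application = get_wsgi_application()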

(I’d still think carefully about using gevent with Django, however)

mtford90 commented 9 years ago

@etianen - hey Dave, I've been working with JJ on this.

As you say, I imagine the lack of monkey patching is causing the issue here - we'll likely try this out and get back to you.

You make interesting points on the use of async with Django. Oddly enough, I was just reading your blog post on using gunicorn with Django on Heroku ;)

So at the moment we are using gunicorn on Heroku (contrary to the advice in that post) and have been looking into uwsgi/gevent as an alternative, having experienced various issues around high memory usage and timeouts.

Coming from Node, using async seemed appropriate given that the app is heavy on I/O - lots of HTTP requests and database queries - and it seems daft stopping and starting all those threads. I did not think about whether passing the management of concurrent requests onto the database was a good idea - and probably should have - however, with most of these operations being reads, I would imagine that Postgres can handle it (at least at our current scale...). Also, without a buffering reverse proxy we're likely already passing that burden onto Postgres with the current gunicorn setup - just with threads instead of greenlets.

Anyway - load testing with uwsgi/gevent instead of threaded gunicorn tripled throughput and halved memory and response time at our current peak requests/min. As you say, this could end up as a problem once the site has scaled enough to pass the burden to Postgres.

Do you still recommend waitress for serving Python on Heroku? I noticed your post is just over a year old.

Cheers!

etianen commented 9 years ago

Hi! :D

I’d still recommend using waitress for Heroku. It’s a really under-used and very competent web server. (I’m actually working on my own reverse-proxying alternative using Python 3’s new asyncio libraries, but that’ll be strictly Python 3 only. Watch this space!)

Anyway, with regard to overwhelming Postgres, this is actually a real and massive problem on Heroku. Look at the connection limit on your database plan. On the cheaper plans it's only 5 or 20. This means that, if that many or more people connect to your site simultaneously, anyone else will start getting server errors as the database connections are refused.

This is actually quite a constraint. If your database plan only allows 5 connections, you can guarantee never to max it out by setting the max threads for waitress to 4 (saving one for admin usage). Likewise with 20 connections, just set max threads to 19 or so. More than 10 threads is a bit silly though. Just spin up another dyno until your dyno count x thread count is slightly less than your database connection count.
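
As a rough sketch, assuming a 5-connection plan and a placeholder project name, the waitress entry point could look like this:

import os
from waitress import serve
from myproject.wsgi import application  # placeholder project name

serve(
    application,
    host="0.0.0.0",
    port=int(os.environ.get("PORT", 8000)),
    threads=4,  # keep dyno count x threads just below the connection limit
)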

Better still, because waitress buffers the request and response, each worker should only hold a database connection open for a short time. Compare this to async workers, where a typical pattern is: connect to the database, read POST data (SLOW), process the request logic, disconnect. Database connections can be held open for 20 seconds or so, and when you've only got 5 connections in total, that's really bad!

So this isn’t just a theoretical concern. It can bite you very fast under very low load. It means that users start seeing server errors as your connections are refused.

Even better, when using waitress you guarantee at most N open database connections at any given time, so you can use Django persistent database connections to further improve performance! Using persistent connections with green threads can lead to database connection starvation even faster.
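
Roughly, in settings.py (credentials are placeholders; CONN_MAX_AGE is Django 1.6's persistent connection setting):

import os

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": os.environ.get("DATABASE_NAME", "app"),  # placeholder credentials
        "CONN_MAX_AGE": 600,  # reuse each worker thread's connection for 10 minutes
    }
}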

mtford90 commented 9 years ago

On the alternative you're working on - very cool, do keep me informed on that! Is there a repo or something that I can watch?

On Postgres etc., you're right… that's worrying. Unfortunately Heroku only provides current stats, and no historical stats, on whether or not that max gets reached… When we first deployed uwsgi/gevent, not only did we experience the Watson errors but we also began to see connection drops once we reached peak requests/min. So in that case it's entirely possible that Postgres became the bottleneck here…

It looks like I was wrong in thinking it was a good idea to bring the async model over from the Node/Mongo/NoSQL world - Mongo etc. don't seem to suffer the same issues with max connections, likely because they follow the async model themselves?

Really appreciate the advice, Dave - we definitely got beyond the scope of the original issue here! :D I strongly suspect we'll look into using waitress or something similar, given that we won't be moving from Heroku anytime soon - at least until an easier way of applying async to the current architecture becomes apparent.

etianen commented 9 years ago

It's currently here: https://github.com/etianen/aiohttp-wsgi - the functionality is all present, but a nice wrapper script to "just" serve a WSGI app still needs to be written.

I think that the node driver for mongo uses a connection pool that’s shared between client browser connections, avoiding connection overload. That’s the approach that pure async postgres libraries for Python also take, and it works great! The problem is that Django is inherently a threaded framework, which is part of the reason it’s so damn easy to use! What could be easier than one db connection per thread?
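
As a rough illustration of that shared-pool approach (using aiopg here purely as an example of an async postgres library; the DSN and pool size are placeholders):

import asyncio
import aiopg

DSN = "dbname=app user=app password=secret host=127.0.0.1"  # placeholder DSN

async def fetch_one(pool, query):
    async with pool.acquire() as conn:  # borrow one of a bounded set of connections
        async with conn.cursor() as cur:
            await cur.execute(query)
            return await cur.fetchone()

async def main():
    # maxsize caps total database connections, however many clients connect.
    async with aiopg.create_pool(DSN, maxsize=10) as pool:
        print(await fetch_one(pool, "SELECT 1"))

asyncio.get_event_loop().run_until_complete(main())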

In my current project I'm using my new asyncio WSGI handler to run Django requests in a thread pool, while still supporting async websockets in the same process and on the same port. Keeping as much of the app as possible threaded makes sense to me, as it keeps the behaviour very predictable. The bits that have to be written using async paradigms get very careful attention and are kept as small as possible!
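
The general shape of that (not the actual aiohttp-wsgi code, just the underlying asyncio pattern of pushing blocking work onto a bounded thread pool):

import asyncio
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)  # small, bounded, predictable

def blocking_django_view(path):
    # Placeholder for the normal threaded Django request/response cycle.
    return "response for %s" % path

async def handle(path):
    loop = asyncio.get_event_loop()
    # Blocking ORM/view work runs in a worker thread, while the event loop
    # stays free to service websockets and other async IO on the same port.
    return await loop.run_in_executor(executor, blocking_django_view, path)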

Happy coding! :D

mtford90 commented 9 years ago

Right you are - will certainly keep all that in mind going forward.

Thanks again and same to you!