bookwyrm-social / bookwyrm

Social reading and reviewing, decentralized with ActivityPub
http://joinbookwyrm.com/

Investigate using gevent gunicorn workers #2731

Open WesleyAC opened 1 year ago

WesleyAC commented 1 year ago

By default, we use synchronous gunicorn workers, which, given #2717, cause performance problems under load. Switching to gevent workers seemed to fix this (tested on bookwyrm.social with gunicorn bookwyrm.wsgi:application --worker-class gevent --worker-connections 2048 --workers 18 --preload --max-requests 300 --max-requests-jitter 300 --bind 0.0.0.0:8000), but it led to the following problem:

/usr/local/lib/python3.9/site-packages/gunicorn/workers/ggevent.py:53: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['aiohttp.web (/usr/local/lib/python3.9/site-packages/aiohttp/web.py)', 'urllib3.util.ssl_ (/usr/local/lib/python3.9/site-packages/urllib3/util/ssl_.py)', 'aiohttp.client_reqrep (/usr/local/lib/python3.9/site-packages/aiohttp/client_reqrep.py)', 'botocore.httpsession (/usr/local/lib/python3.9/site-packages/botocore/httpsession.py)', 'aiohttp.connector (/usr/local/lib/python3.9/site-packages/aiohttp/connector.py)', 'aiohttp.worker (/usr/local/lib/python3.9/site-packages/aiohttp/worker.py)', 'aiohttp.web_runner (/usr/local/lib/python3.9/site-packages/aiohttp/web_runner.py)', 'aiohttp.client_exceptions (/usr/local/lib/python3.9/site-packages/aiohttp/client_exceptions.py)', 'aiohttp.client (/usr/local/lib/python3.9/site-packages/aiohttp/client.py)', 'urllib3.util (/usr/local/lib/python3.9/site-packages/urllib3/util/__init__.py)'].
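The "Modules that had direct imports (NOT patched)" list comes down to ordinary Python name binding. A statement like from ssl import SSLContext copies the class object into the importing module's namespace, so rebinding ssl.SSLContext later does not reach that copy. A minimal stdlib-only model of this, where fake_ssl, BlockingSSLContext and CooperativeSSLContext are hypothetical stand-ins rather than anything from gevent or urllib3:

```python
import sys
import types


class BlockingSSLContext:
    """Hypothetical stand-in for the stdlib ssl.SSLContext."""
    kind = "blocking"


class CooperativeSSLContext:
    """Hypothetical stand-in for gevent's replacement SSLContext."""
    kind = "cooperative"


# Register a fake module under a made-up name so the real ssl module is untouched.
fake_ssl = types.ModuleType("fake_ssl")
fake_ssl.SSLContext = BlockingSSLContext
sys.modules["fake_ssl"] = fake_ssl

# Like urllib3.util.ssl_ doing `from ssl import SSLContext` BEFORE patching:
# the current class object is copied into a new name in this namespace.
from fake_ssl import SSLContext as imported_before_patch  # noqa: E402

# What monkey-patching does: rebind the attribute on the module object only.
fake_ssl.SSLContext = CooperativeSSLContext

# The same import run AFTER patching picks up the replacement.
from fake_ssl import SSLContext as imported_after_patch  # noqa: E402

print(imported_before_patch.kind)  # blocking: the early binding is never updated
print(fake_ssl.SSLContext.kind)    # cooperative
print(imported_after_patch.kind)   # cooperative
```

This is why patching has to happen before urllib3, botocore, or aiohttp are imported: once they hold their own reference, patching the ssl module cannot fix it.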

And indeed, this caused many RecursionErrors in practice:

2023-03-11T01:56:32.462805091Z Internal Server Error: /inbox
2023-03-11T01:56:32.462854037Z Traceback (most recent call last):
2023-03-11T01:56:32.462860283Z   File "/usr/local/lib/python3.9/site-packages/django/core/handlers/exception.py", line 47, in inner
2023-03-11T01:56:32.462866222Z     response = get_response(request)
2023-03-11T01:56:32.462871362Z   File "/usr/local/lib/python3.9/site-packages/django/core/handlers/base.py", line 181, in _get_response
2023-03-11T01:56:32.462921429Z     response = wrapped_callback(request, *callback_args, **callback_kwargs)
2023-03-11T01:56:32.462927079Z   File "/usr/local/lib/python3.9/site-packages/sentry_sdk/integrations/django/views.py", line 67, in sentry_wrapped_callback
2023-03-11T01:56:32.462932540Z     return callback(request, *args, **kwargs)
2023-03-11T01:56:32.462937391Z   File "/usr/local/lib/python3.9/site-packages/django/views/generic/base.py", line 70, in view
2023-03-11T01:56:32.462942795Z     return self.dispatch(request, *args, **kwargs)
2023-03-11T01:56:32.462947737Z   File "/usr/local/lib/python3.9/site-packages/django/utils/decorators.py", line 43, in _wrapper
2023-03-11T01:56:32.462952878Z     return bound_method(*args, **kwargs)
2023-03-11T01:56:32.462957602Z   File "/usr/local/lib/python3.9/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
2023-03-11T01:56:32.462962793Z     return view_func(*args, **kwargs)
2023-03-11T01:56:32.462981400Z   File "/usr/local/lib/python3.9/site-packages/django/views/generic/base.py", line 98, in dispatch
2023-03-11T01:56:32.462986780Z     return handler(request, *args, **kwargs)
2023-03-11T01:56:32.462991646Z   File "/app/bookwyrm/views/inbox.py", line 67, in post
2023-03-11T01:56:32.462996674Z     sometimes_async_activity_task(activity_json, queue=priority)
2023-03-11T01:56:32.463001896Z   File "/app/bookwyrm/views/inbox.py", line 112, in sometimes_async_activity_task
2023-03-11T01:56:32.463007259Z     activity.action(allow_external_connections=False)
2023-03-11T01:56:32.463012077Z   File "/app/bookwyrm/activitypub/verbs.py", line 233, in action
2023-03-11T01:56:32.463017062Z     self.to_model(allow_external_connections=allow_external_connections)
2023-03-11T01:56:32.463021865Z   File "/app/bookwyrm/activitypub/base_activity.py", line 130, in to_model
2023-03-11T01:56:32.463026904Z     and model.ignore_activity(self)
2023-03-11T01:56:32.463031927Z   File "/app/bookwyrm/models/status.py", line 121, in ignore_activity
2023-03-11T01:56:32.463036937Z     boosted = activitypub.resolve_remote_id(activity.object, get_activity=True)
2023-03-11T01:56:32.463042118Z   File "/app/bookwyrm/activitypub/base_activity.py", line 330, in resolve_remote_id
2023-03-11T01:56:32.463047197Z     data = get_data(remote_id)
2023-03-11T01:56:32.463052010Z   File "/app/bookwyrm/connectors/abstract_connector.py", line 232, in get_data
2023-03-11T01:56:32.463057370Z     resp = requests.get(
2023-03-11T01:56:32.463063416Z   File "/usr/local/lib/python3.9/site-packages/requests/api.py", line 73, in get
2023-03-11T01:56:32.463068592Z     return request("get", url, params=params, **kwargs)
2023-03-11T01:56:32.463073648Z   File "/usr/local/lib/python3.9/site-packages/requests/api.py", line 59, in request
2023-03-11T01:56:32.463079005Z     return session.request(method=method, url=url, **kwargs)
2023-03-11T01:56:32.463091280Z   File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 587, in request
2023-03-11T01:56:32.463096884Z     resp = self.send(prep, **send_kwargs)
2023-03-11T01:56:32.463101817Z   File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 701, in send
2023-03-11T01:56:32.463106826Z     r = adapter.send(request, **kwargs)
2023-03-11T01:56:32.463111530Z   File "/usr/local/lib/python3.9/site-packages/requests/adapters.py", line 489, in send
2023-03-11T01:56:32.463116522Z     resp = conn.urlopen(
2023-03-11T01:56:32.463121303Z   File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 703, in urlopen
2023-03-11T01:56:32.463126328Z     httplib_response = self._make_request(
2023-03-11T01:56:32.463131310Z   File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 386, in _make_request
2023-03-11T01:56:32.463136675Z     self._validate_conn(conn)
2023-03-11T01:56:32.463141601Z   File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 1042, in _validate_conn
2023-03-11T01:56:32.463146707Z     conn.connect()
2023-03-11T01:56:32.463151372Z   File "/usr/local/lib/python3.9/site-packages/urllib3/connection.py", line 395, in connect
2023-03-11T01:56:32.463156425Z     self.ssl_context = create_urllib3_context(
2023-03-11T01:56:32.463161247Z   File "/usr/local/lib/python3.9/site-packages/urllib3/util/ssl_.py", line 312, in create_urllib3_context
2023-03-11T01:56:32.463166789Z     context.options |= options
2023-03-11T01:56:32.463171610Z   File "/usr/local/lib/python3.9/ssl.py", line 602, in options
2023-03-11T01:56:32.463176949Z     super(SSLContext, SSLContext).options.__set__(self, value)
2023-03-11T01:56:32.463181857Z   File "/usr/local/lib/python3.9/ssl.py", line 602, in options
2023-03-11T01:56:32.463187057Z     super(SSLContext, SSLContext).options.__set__(self, value)
2023-03-11T01:56:32.463191676Z   File "/usr/local/lib/python3.9/ssl.py", line 602, in options
2023-03-11T01:56:32.463196603Z     super(SSLContext, SSLContext).options.__set__(self, value)
2023-03-11T01:56:32.463201665Z   [Previous line repeated 448 more times]
2023-03-11T01:56:32.463206463Z RecursionError: maximum recursion depth exceeded while calling a Python object
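The repeated frame at ssl.py line 602 appears to be the mechanism described in the gevent issue linked from the warning (1016). The stdlib setter calls super(SSLContext, SSLContext).options.__set__(self, value), and SSLContext is looked up in the ssl module's globals at call time. After a late patch, that global names gevent's subclass of the original class, so super() resolves back to the original setter, which then calls itself until the recursion limit. A simplified pure-Python model, assuming the real C base class (_ssl._SSLContext) can be replaced by a plain Python class; _CContext, set_options and the flag value 4 are hypothetical:

```python
class _CContext:
    """Hypothetical stand-in for the C-level _ssl._SSLContext that stores options."""

    def __init__(self) -> None:
        self._opts = 0

    @property
    def options(self) -> int:
        return self._opts

    @options.setter
    def options(self, value: int) -> None:
        self._opts = value


class SSLContext(_CContext):
    """Mimics the shape of the options property in Python 3.9's ssl.SSLContext."""

    @property
    def options(self) -> int:
        return super().options

    @options.setter
    def options(self, value: int) -> None:
        # `SSLContext` is a module-global lookup resolved at call time.
        super(SSLContext, SSLContext).options.__set__(self, value)


def set_options(patched: bool) -> str:
    """Create a context from the original class and OR a flag into its options."""
    global SSLContext
    original = SSLContext  # like urllib3's direct `from ssl import SSLContext`
    if patched:
        # Simulated late monkey-patch: the global now names a subclass.
        SSLContext = type("PatchedSSLContext", (original,), {})
    try:
        ctx = original()
        ctx.options |= 4  # same shape as urllib3's `context.options |= options`
        return f"options={ctx.options}"
    except RecursionError:
        return "RecursionError"
    finally:
        SSLContext = original  # undo the simulated patch


print(set_options(patched=False))  # options=4
print(set_options(patched=True))   # RecursionError
```

Without the patch, super() skips past SSLContext to the base class and the value is stored; with it, super() lands on the original setter again, matching the endlessly repeated frame above.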

It seems that any request making an outgoing HTTPS connection (most of which are described in #2717) triggered this issue and returned a 500 error. I think #2730 and #2729 may have been instances of this problem (though I'm less sure about #2729, since it's unclear to me whether uploading an image would cause the same problem).

We should figure out how to get the import/monkey-patching order right to avoid this issue, so that we can switch to gevent workers, which are more efficient for our use case.

Apologies for the disruption caused by this. I've been pretty aggressive about making changes in prod to try to get bookwyrm.social running more stably, but I should've tested this change more carefully.

WesleyAC commented 1 year ago

Some discussion in https://github.com/benoitc/gunicorn/issues/1056 suggests that --preload may be part of the problem here, which would make sense: with --preload, the app (and with it ssl, via requests, urllib3 and botocore) is imported in the master process before the gevent worker gets a chance to monkey-patch. I was using --preload to cut down on memory, since we're generally flying pretty close to the sun on memory usage, but I could potentially reduce the number of workers and still get better performance with gevent than with sync workers.

WesleyAC commented 1 year ago

Another relevant gunicorn issue: https://github.com/benoitc/gunicorn/issues/1566. It seems highly likely that this is triggered by using --preload, and that gevent without --preload would work fine.
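One way to confirm the --preload theory is to check which SSL-related modules are already loaded at the point where patching happens, since gevent's guidance is to patch before any of them are imported. A rough sketch, assuming the watched list below (taken from the warning and traceback above) and a hypothetical helper name imported_too_early; it only inspects sys.modules and does not patch anything:

```python
import sys

# Modules named in the MonkeyPatchWarning and traceback above (assumed list).
DEFAULT_WATCHED = ("ssl", "urllib3", "requests", "botocore", "aiohttp")


def imported_too_early(watched=DEFAULT_WATCHED) -> list:
    """Return the watched modules that are already present in sys.modules.

    Meant to run immediately before monkey-patching: a non-empty result
    means those modules were imported first and may hold unpatched names.
    """
    return sorted(name for name in watched if name in sys.modules)


if __name__ == "__main__":
    early = imported_too_early()
    if early:
        print("imported before patching:", ", ".join(early))
    else:
        print("nothing watched has been imported yet")
```

In a hypothetical setup it would be called at the very top of the WSGI entry point, right before gevent.monkey.patch_all(); with --preload, a non-empty result there would support the theory that the master imported the app first.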

WesleyAC commented 1 year ago

In case it's useful, here's the current (gthread) gunicorn config that I'm using on bookwyrm.social: gunicorn bookwyrm.wsgi:application --workers 6 --threads 5 --preload --max-requests 100 --max-requests-jitter 100 --bind 0.0.0.0:8000

It's tuned to the particular load, memory, and CPU constraints that we have, but it seems to be fairly stable.
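To compare the two configurations in this thread, a rough upper bound on in-flight requests is workers × threads for gthread and workers × worker-connections for gevent, and --max-requests-jitter makes each worker restart after somewhere between max-requests and max-requests + jitter requests. A small calculator sketch (GunicornConfig and its methods are hypothetical names; real throughput will hit memory, CPU, and database limits well before these numbers):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GunicornConfig:
    """The subset of gunicorn flags from this thread that bound concurrency."""
    worker_class: str            # "gthread" or "gevent"
    workers: int                 # --workers
    threads: int = 1             # --threads (used by gthread)
    worker_connections: int = 0  # --worker-connections (used here for gevent)
    max_requests: int = 0        # --max-requests
    max_requests_jitter: int = 0 # --max-requests-jitter

    def max_concurrent_requests(self) -> int:
        if self.worker_class == "gevent":
            # Each gevent worker serves up to worker_connections clients at once.
            return self.workers * self.worker_connections
        # gthread: one request per thread in each worker.
        return self.workers * self.threads

    def restart_window(self) -> tuple:
        # gunicorn adds a random 0..jitter to max_requests for each worker.
        return (self.max_requests, self.max_requests + self.max_requests_jitter)


gthread_cfg = GunicornConfig("gthread", workers=6, threads=5,
                             max_requests=100, max_requests_jitter=100)
gevent_cfg = GunicornConfig("gevent", workers=18, worker_connections=2048,
                            max_requests=300, max_requests_jitter=300)

for cfg in (gthread_cfg, gevent_cfg):
    print(cfg.worker_class, cfg.max_concurrent_requests(), cfg.restart_window())
```

The gevent figure is much larger, but it only helps while requests are blocked on I/O, such as the outgoing HTTPS fetches from #2717; CPU-bound work in a gevent worker still runs one greenlet at a time.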