denshoproject / encyc-psms

back-end data for the encyclopedia
Other
1 stars 0 forks source link

Intermittent 502 errors #19

Open gjost opened 4 years ago

gjost commented 4 years ago

Index (https://psms.densho.org/) works fine as does login screen, but you get a 502 after clicking login. It is sometimes possible to view the Django admin after this but there are intermittent 502 errors. Something is wrong on the back end.

gjost commented 4 years ago

Too many open files. Maybe the upload handler is not closing after uploads are finished?

2020-11-01 22:41:30,589 ERROR    [log.log_response]  Internal Server Error: /api/2.0/sources/en-denshopd-i226-00018-1
Traceback (most recent call last):
  File "/opt/encyc-psms/venv/psms/lib/python3.7/site-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/opt/encyc-psms/venv/psms/lib/python3.7/site-packages/django/core/handlers/base.py", line 179, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/opt/encyc-psms/venv/psms/lib/python3.7/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
    return view_func(*args, **kwargs)
  File "/opt/encyc-psms/venv/psms/lib/python3.7/site-packages/django/views/generic/base.py", line 70, in view
    return self.dispatch(request, *args, **kwargs)
  File "/opt/encyc-psms/venv/psms/lib/python3.7/site-packages/rest_framework/views.py", line 505, in dispatch
    response = self.handle_exception(exc)
  File "/opt/encyc-psms/venv/psms/lib/python3.7/site-packages/rest_framework/views.py", line 465, in handle_exception
    self.raise_uncaught_exception(exc)
  File "/opt/encyc-psms/venv/psms/lib/python3.7/site-packages/rest_framework/views.py", line 476, in raise_uncaught_exception
    raise exc
  File "/opt/encyc-psms/venv/psms/lib/python3.7/site-packages/rest_framework/views.py", line 502, in dispatch
    response = handler(request, *args, **kwargs)
  File "/opt/encyc-psms/venv/psms/lib/python3.7/site-packages/rest_framework/decorators.py", line 50, in handler
    return func(*args, **kwargs)
  File "/opt/encyc-psms/venv/psms/lib/python3.7/site-packages/django/utils/decorators.py", line 130, in _wrapped_view
    response = view_func(request, *args, **kwargs)
  File "/opt/encyc-psms/psms/psms/api.py", line 45, in sources
    Source.sources(encyclopedia_ids)
  File "/opt/encyc-psms/psms/sources/models.py", line 494, in sources
    encyclopedia_id__in=encyclopedia_ids
  File "/opt/encyc-psms/psms/sources/models.py", line 493, in <listcomp>
    for source in Source.objects.filter(
  File "/opt/encyc-psms/psms/sources/models.py", line 181, in dict
    o['original_size'] = self.original.file.size
  File "/opt/encyc-psms/venv/psms/lib/python3.7/site-packages/django/db/models/fields/files.py", line 43, in _get_file
    self._file = self.storage.open(self.name, 'rb')
  File "/opt/encyc-psms/venv/psms/lib/python3.7/site-packages/django/core/files/storage.py", line 36, in open
    return self._open(name, mode)
  File "/opt/encyc-psms/venv/psms/lib/python3.7/site-packages/django/core/files/storage.py", line 231, in _open
    return File(open(self.path(name), mode))
OSError: [Errno 24] Too many open files: '/var/www/encycpsms/media/sources/1/1365/en-denshopd-i226-00018-1.jpg'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/encyc-psms/venv/psms/lib/python3.7/site-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/opt/encyc-psms/venv/psms/lib/python3.7/site-packages/django/utils/deprecation.py", line 114, in __call__
    response = response or self.get_response(request)
  File "/opt/encyc-psms/venv/psms/lib/python3.7/site-packages/django/core/handlers/exception.py", line 49, in inner
    response = response_for_exception(request, exc)
  File "/opt/encyc-psms/venv/psms/lib/python3.7/site-packages/django/core/handlers/exception.py", line 103, in response_for_exception
    response = handle_uncaught_exception(request, get_resolver(get_urlconf()), sys.exc_info())
  File "/opt/encyc-psms/venv/psms/lib/python3.7/site-packages/django/core/handlers/exception.py", line 138, in handle_uncaught_exception
    return debug.technical_500_response(request, *exc_info)
  File "/opt/encyc-psms/venv/psms/lib/python3.7/site-packages/django/views/debug.py", line 52, in technical_500_response
    html = reporter.get_traceback_html()
  File "/opt/encyc-psms/venv/psms/lib/python3.7/site-packages/django/views/debug.py", line 328, in get_traceback_html
    with Path(CURRENT_DIR, 'templates', 'technical_500.html').open(encoding='utf-8') as fh:
  File "/usr/lib/python3.7/pathlib.py", line 1186, in open
    opener=self._opener)
  File "/usr/lib/python3.7/pathlib.py", line 1039, in _opener
    return self._accessor.open(self, flags, mode)
OSError: [Errno 24] Too many open files: '/opt/encyc-psms/venv/psms/lib/python3.7/site-packages/django/views/templates/technical_500.html'

During handling of the above exception, another exception occurred:
...
gjost commented 4 years ago

When I ran sudo supervisorctl restart encycpsms it restarted and now seems fine.

Looks like an opened file is not getting closed somewhere in mwclient. The file is opened in sources.wiki.upload_file[1], which calls mwclient.client.upload. It is supposed to be closed at the end of that method[2] but there are a number of ways the method can exit without calling file.close(). [1] https://github.com/denshoproject/encyc-psms/blob/master/psms/sources/wiki.py#L56 [2] https://github.com/mwclient/mwclient/blob/master/mwclient/client.py#L607

Maybe just increase the limit on open files (ulimit?) and have a cron job that restarts the app every night?

gjost commented 4 years ago

Addressed (not really fixed) in ansible-colo commit 630b124 which adds a cron task that restarts the app every day.

gjost commented 3 years ago

Issue is not fixed. PSMS API consistently returns 502: https://psms.densho.org/api/2.0/sources/

GeoffFroh commented 3 years ago

Let's fix temporarily with a cron job that restarts the PSMS guest VM once per day at 2am.

gjost commented 3 years ago

I no longer think that mwclient.upload called in sources.wiki.upload_file is the problem. mwclient.upload is called within a with open() block; the file pointer is passed to mwclient.upload. This means that the file is automatically closed however that function exits.

I now think it's caused by some code in https://psms.densho.org/api/2.0/sources/. The API's timeline and facility views return 200 when the app is started, but 502 after the /sources/ view is run. Supporting evidence: running sudo lsof -u encyc|less after requesting that view reveals lots of open binary files (jpgs, etc). It looks like that view is opening all those files.

gjost commented 3 years ago

Fixed as of encyc-psms commit da6d766, merged in to master and pulled in production.

Problem turned out to be a line in the Source.dict() method where the original file was opened to read the file size but not closed. Fix was to write size to the database on initial import. Files sizes were added for existing files.