etianen / django-s3-storage

Django Amazon S3 file storage.
BSD 3-Clause "New" or "Revised" License

Frequent time-outs / failures in `storage.save` used with gevent #162

Closed: scottgigante closed this issue 1 week ago.

scottgigante commented 1 month ago

I'm using django-health-check with django-s3-storage and getting frequent failures (roughly once every few hours). This may be related to my concurrency setup (I'm using gevent), but it's causing my container to fail health checks frequently. The exceptions started when I set AWS_S3_CONNECT_TIMEOUT in my settings.py; before that, the health check timed out with no explanation instead of raising an exception, so I suspect it's the same issue with and without the connect timeout.

I know this is hard to repro, but any ideas?

Log output (each line timestamped 2024-07-23T09:34:18.523-04:00):

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/health_check/storage/backends.py", line 63, in check_status
    file_name = self.check_save(file_name, file_content)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/health_check/storage/backends.py", line 42, in check_save
    file_name = storage.save(file_name, ContentFile(content=file_content))
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/core/files/storage/base.py", line 38, in save
    name = self._save(name, content)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django_s3_storage/storage.py", line 37, in _do_wrap_errors
    return func(self, name, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django_s3_storage/storage.py", line 386, in _save
    self.s3_connection.upload_fileobj(
  File "/usr/local/lib/python3.11/site-packages/boto3/s3/inject.py", line 642, in upload_fileobj
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/s3transfer/futures.py", line 103, in result
    return self._coordinator.result()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/s3transfer/futures.py", line 266, in result
    raise self._exception
  File "/usr/local/lib/python3.11/site-packages/s3transfer/tasks.py", line 269, in _main
    self._submit(transfer_future=transfer_future, **kwargs)
  File "/usr/local/lib/python3.11/site-packages/s3transfer/upload.py", line 597, in _submit
    self._submit_upload_request(
  File "/usr/local/lib/python3.11/site-packages/s3transfer/upload.py", line 632, in _submit_upload_request
    self._transfer_coordinator.submit(
  File "/usr/local/lib/python3.11/site-packages/s3transfer/futures.py", line 323, in submit
    future = executor.submit(task, tag=tag)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/s3transfer/futures.py", line 474, in submit
    future = ExecutorFuture(self._executor.submit(task))
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 162, in submit
    with self._shutdown_lock, _global_shutdown_lock:
  File "src/gevent/_semaphore.py", line 282, in gevent._gevent_c_semaphore.Semaphore.__enter__
  File "src/gevent/_semaphore.py", line 283, in gevent._gevent_c_semaphore.Semaphore.__enter__
  File "src/gevent/_semaphore.py", line 184, in gevent._gevent_c_semaphore.Semaphore.acquire
  File "/usr/local/lib/python3.11/site-packages/gevent/thread.py", line 112, in acquire
    acquired = BoundedSemaphore.acquire(self, blocking, timeout)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "src/gevent/_semaphore.py", line 184, in gevent._gevent_c_semaphore.Semaphore.acquire
  File "src/gevent/_semaphore.py", line 253, in gevent._gevent_c_semaphore.Semaphore.acquire
  File "src/gevent/_abstract_linkable.py", line 521, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait
  File "src/gevent/_abstract_linkable.py", line 487, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait_core
  File "src/gevent/_abstract_linkable.py", line 490, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait_core
  File "src/gevent/_abstract_linkable.py", line 442, in gevent._gevent_c_abstract_linkable.AbstractLinkable._AbstractLinkable__wait_to_be_notified
  File "src/gevent/_abstract_linkable.py", line 451, in gevent._gevent_c_abstract_linkable.AbstractLinkable._switch_to_hub
AttributeError: 'NoneType' object has no attribute 'switch'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/health_check/backends.py", line 30, in run_check
    self.check_status()
  File "/usr/local/lib/python3.11/site-packages/health_check/storage/backends.py", line 67, in check_status
    raise ServiceUnavailable("Unknown exception") from e
health_check.exceptions.ServiceUnavailable: unavailable: Unknown exception

etianen commented 1 month ago

I'm afraid that this looks like a nasty interaction between gevent and boto3.

https://github.com/boto/boto3/issues/3141

I don't think there's much that can be done here.

You could experiment with running the transfer with use_threads=False instead of use_threads=True. If that works, I'd be willing to take a PR that exposes this as a config setting.
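A rough, stdlib-only illustration of what the use_threads switch changes. In the traceback above, the failure is raised inside `ThreadPoolExecutor.submit`, while acquiring a lock that gevent's monkey-patching has replaced with a gevent semaphore. As I understand it, with use_threads=False boto3 runs the transfer tasks in the calling thread rather than handing them to a thread pool, so that code path is skipped. `InlineExecutor`, `_CompletedFuture` and `upload_parts` below are hypothetical names for this sketch, not s3transfer's actual classes, and no real upload happens.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class _CompletedFuture:
    """Minimal future-like object for a task that has already run (hypothetical)."""

    def __init__(self, value=None, exc=None):
        self._value = value
        self._exc = exc

    def result(self):
        if self._exc is not None:
            raise self._exc
        return self._value


class InlineExecutor:
    """Hypothetical non-threaded executor: runs each task immediately in the
    calling thread, so no worker threads or pool locks are involved."""

    def submit(self, fn, *args, **kwargs):
        try:
            return _CompletedFuture(value=fn(*args, **kwargs))
        except Exception as exc:  # surface task errors through result()
            return _CompletedFuture(exc=exc)

    def shutdown(self, wait=True):
        pass  # nothing to clean up: no threads were started


def upload_parts(parts, use_threads=True):
    """Hypothetical upload loop that records which thread handled each part."""

    def upload_one(part):
        # Stand-in for a real S3 request: just report the current thread id.
        return part, threading.get_ident()

    executor = ThreadPoolExecutor(max_workers=4) if use_threads else InlineExecutor()
    try:
        futures = [executor.submit(upload_one, part) for part in parts]
        return [future.result() for future in futures]
    finally:
        executor.shutdown(wait=True)
```

Calling `upload_parts(["a", "b"], use_threads=False)` handles every part on the caller's thread (the current greenlet, when gevent has patched `threading`), while `use_threads=True` hands them to pool worker threads. The trade-off is that parts are no longer uploaded in parallel, which matters little for a small object such as a health-check file.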

scottgigante commented 1 month ago (9 Aug 2024)

Ah, you're a lifesaver!! Thank you, this solved my issue. It turns out you already expose it as AWS_S3_USE_THREADS, and setting that to False in settings.py resolved the timeouts. Would it be worth adding a note about this to the README, or is the existence of this issue sufficient?

etianen commented 1 month ago

A short comment on the README next to the setting would be a good idea. Happy to take that PR.
