aptiko / irma

Project management for IRMA—no code, mainly issues

Do not use Django transport for Celery #23

Closed aptiko closed 5 years ago

aptiko commented 5 years ago

Celery sometimes stops processing and needs restarting:

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/celery/worker/__init__.py", line 206, in start
    self.blueprint.start(self)
  File "/usr/lib/python2.7/dist-packages/celery/bootsteps.py", line 123, in start
    step.start(parent)
  File "/usr/lib/python2.7/dist-packages/celery/bootsteps.py", line 374, in start
    return self.obj.start()
  File "/usr/lib/python2.7/dist-packages/celery/worker/consumer.py", line 279, in start
    blueprint.start(self)
  File "/usr/lib/python2.7/dist-packages/celery/bootsteps.py", line 123, in start
    step.start(parent)
  File "/usr/lib/python2.7/dist-packages/celery/worker/consumer.py", line 838, in start
    c.loop(*c.loop_args())
  File "/usr/lib/python2.7/dist-packages/celery/worker/loops.py", line 103, in synloop
    connection.drain_events(timeout=2.0)
  File "/usr/lib/python2.7/dist-packages/kombu/connection.py", line 275, in drain_events
    return self.transport.drain_events(self.connection, **kwargs)
  File "/usr/lib/python2.7/dist-packages/kombu/transport/virtual/__init__.py", line 830, in drain_events
    item, channel = get(timeout=timeout)
  File "/usr/lib/python2.7/dist-packages/kombu/transport/virtual/scheduling.py", line 39, in get
    return self.fun(resource, **kwargs), resource
  File "/usr/lib/python2.7/dist-packages/kombu/transport/virtual/__init__.py", line 850, in _drain_channel
    return channel.drain_events(timeout=timeout)
  File "/usr/lib/python2.7/dist-packages/kombu/transport/virtual/__init__.py", line 642, in drain_events
    return self._poll(self.cycle, timeout=timeout)
  File "/usr/lib/python2.7/dist-packages/kombu/transport/virtual/__init__.py", line 328, in _poll
    return cycle.get()
  File "/usr/lib/python2.7/dist-packages/kombu/transport/virtual/scheduling.py", line 39, in get
    return self.fun(resource, **kwargs), resource
  File "/usr/lib/python2.7/dist-packages/kombu/transport/django/__init__.py", line 49, in _get
    m = self.Queue.objects.fetch(queue)
  File "/usr/lib/python2.7/dist-packages/kombu/transport/django/managers.py", line 33, in fetch
    queue = self.get(name=queue_name)
  File "/usr/local/aira-virtualenv/local/lib/python2.7/site-packages/django/db/models/manager.py", line 127, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/usr/local/aira-virtualenv/local/lib/python2.7/site-packages/django/db/models/query.py", line 328, in get
    num = len(clone)
  File "/usr/local/aira-virtualenv/local/lib/python2.7/site-packages/django/db/models/query.py", line 144, in __len__
    self._fetch_all()
  File "/usr/local/aira-virtualenv/local/lib/python2.7/site-packages/django/db/models/query.py", line 965, in _fetch_all
    self._result_cache = list(self.iterator())
  File "/usr/local/aira-virtualenv/local/lib/python2.7/site-packages/django/db/models/query.py", line 238, in iterator
    results = compiler.execute_sql()
  File "/usr/local/aira-virtualenv/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 840, in execute_sql
    cursor.execute(sql, params)
  File "/usr/local/aira-virtualenv/local/lib/python2.7/site-packages/django/db/backends/utils.py", line 64, in execute
    return self.cursor.execute(sql, params)
  File "/usr/local/aira-virtualenv/local/lib/python2.7/site-packages/django/db/utils.py", line 98, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "/usr/local/aira-virtualenv/local/lib/python2.7/site-packages/django/db/backends/utils.py", line 64, in execute
    return self.cursor.execute(sql, params)
OperationalError: terminating connection due to administrator command
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
aptiko commented 5 years ago

Apparently, at the time the problem occurred I was upgrading the system to Debian 9.9. PostgreSQL was restarted as part of this upgrade, and Celery did not automatically reconnect and resume once PostgreSQL was up again. This is probably a Celery issue.
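For illustration, the reconnect-and-resume behaviour that was missing could look roughly like the sketch below. This is not what Celery or kombu do internally; `ConnectionLost`, `call_with_reconnect`, `operation` and `reconnect` are hypothetical names. In a Django setting the exception would presumably be `django.db.OperationalError`, and `reconnect` could be something like `django.db.connection.close()`, so that the next query opens a fresh connection.

```python
import time


class ConnectionLost(Exception):
    """Hypothetical stand-in for a driver error such as OperationalError."""


def call_with_reconnect(operation, reconnect, attempts=3, delay=0.0):
    """Run operation(); on ConnectionLost, call reconnect() and try again.

    Re-raises the last ConnectionLost if all attempts fail.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except ConnectionLost:
            if attempt == attempts:
                raise  # give up after the final attempt
            time.sleep(delay)  # wait for the database to come back
            reconnect()  # discard the dead connection before retrying
```

With a loop like this around the queue polling, a short PostgreSQL restart would cause a pause rather than a stopped worker.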

Unfortunately, the Django database transport for Celery (kombu.transport.django, as seen in the traceback) is not well supported, so to solve the problem we probably need to move to RabbitMQ or a similar message broker.
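A minimal sketch of what such a move might look like in the Django settings, assuming Celery 3.1-style setting names and a RabbitMQ server on the same machine. The host, port, credentials and virtual host are placeholders, and this is not necessarily what the eventual fix did.

```python
# settings.py (sketch; assumes Celery 3.1-style setting names)

# Before: the Django database transport, which uses database tables as the
# message queue and therefore depends on the PostgreSQL connection.
# BROKER_URL = "django://"
# INSTALLED_APPS += ("kombu.transport.django",)

# After: RabbitMQ over AMQP. The credentials, host, port and vhost below
# are placeholders for a default local installation.
BROKER_URL = "amqp://guest:guest@localhost:5672//"
```

With this change, "kombu.transport.django" would also be dropped from INSTALLED_APPS, since its queue tables are no longer used.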

aptiko commented 5 years ago

Old versions of kombu do not support Django 2, and newer ones have removed the Django transport entirely. We need to fix this before upgrading to Django 2.

aptiko commented 5 years ago

Fixed in openmeteo/aira@e40ad30c1facdbd140e6313fcd9c2b609213a792.