WeblateOrg / weblate

Web based localization tool with tight version control integration.
https://weblate.org/
GNU General Public License v3.0
4.63k stars 1.02k forks source link

Failure while executing task: OperationalError: deadlock detected #5279

Closed Kingdutch closed 3 years ago

Kingdutch commented 3 years ago

Describe the issue

We had a 4.3.1 instance that occasionally crashed on PO import causing the redis queue to go down with it and be lost. This problem has resolved itself after upgrading to 4.4.2. To update all our components we had to re-import PO files for quite a lot of components using weblate loadpo --all --lang <code>. This produced about ~270 celery tasks in the queue. We kept an eye on the log to see whether our earlier problem was resolved. In the log we saw the following deadlock error multiple times.

The error does not actually cause any problems we can find within Weblate, but since it's what looks like a fatal error that was not yet reported on GitHub I wanted to report it for your information.

I already tried

Describe the steps you tried to solve the problem yourself.

To Reproduce the issue

Steps to reproduce the behavior:

  1. Run weblate loadpo --all --lang <langcode> form multiple languages
  2. See the error in

Expected behavior

No errors in the logs

Exception traceback

celery-main stderr | [2021-01-27 10:56:50,426: ERROR/1848] Failure while executing task: OperationalError: deadlock detected
celery-main stderr | DETAIL:  Process 9532 waits for ShareLock on transaction 2185973; blocked by process 13265.
celery-main stderr | Process 13265 waits for ShareLock on transaction 2183699; blocked by process 9532.
celery-main stderr | HINT:  See server log for query details.
celery-main stderr | CONTEXT:  while locking tuple (7881,1) in relation "trans_unit"
celery-main stderr | SQL statement "SELECT 1 FROM ONLY "public"."trans_unit" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"
celery-main stderr |
celery-main stderr | [2021-01-27 10:56:50,426: ERROR/ForkPoolWorker-5] Failure while executing task: OperationalError: deadlock detected
celery-main stderr | DETAIL:  Process 9532 waits for ShareLock on transaction 2185973; blocked by process 13265.
celery-main stderr | Process 13265 waits for ShareLock on transaction 2183699; blocked by process 9532.
celery-main stderr | HINT:  See server log for query details.
celery-main stderr | CONTEXT:  while locking tuple (7881,1) in relation "trans_unit"
celery-main stderr | SQL statement "SELECT 1 FROM ONLY "public"."trans_unit" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"
celery-main stderr |
celery-main stderr | [2021-01-27 10:56:50,426: ERROR/1848] Failure while executing task: OperationalError: {'signal': <Signal: task_failure providing_args={'kwargs', 'traceback', 'exception', 'einfo', 'args', 'task_id'}>, 'sender': <@task: weblate.trans.tasks.perform_load of weblate at 0x7ff99ba17860>, 'task_id': '81b245f9-cb65-4c25-ae7d-b9206a07c307', 'args': [213], 'kwargs': {'force': False, 'langs': ['bg']}, 'traceback': <traceback object at 0x7ff98bd7ff08>, 'einfo': <ExceptionInfo: OperationalError('deadlock detected\nDETAIL:  Process 9532 waits for ShareLock on transaction 2185973; blocked by process 13265.\nProcess 13265 waits for ShareLock on transaction 2183699; blocked by process 9532.\nHINT:  See server log for query details.\nCONTEXT:  while locking tuple (7881,1) in relation "trans_unit"\nSQL statement "SELECT 1 FROM ONLY "public"."trans_unit" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"\n')>}
celery-main stderr | [2021-01-27 10:56:50,426: ERROR/ForkPoolWorker-5] Failure while executing task: OperationalError: {'signal': <Signal: task_failure providing_args={'kwargs', 'traceback', 'exception', 'einfo', 'args', 'task_id'}>, 'sender': <@task: weblate.trans.tasks.perform_load of weblate at 0x7ff99ba17860>, 'task_id': '81b245f9-cb65-4c25-ae7d-b9206a07c307', 'args': [213], 'kwargs': {'force': False, 'langs': ['bg']}, 'traceback': <traceback object at 0x7ff98bd7ff08>, 'einfo': <ExceptionInfo: OperationalError('deadlock detected\nDETAIL:  Process 9532 waits for ShareLock on transaction 2185973; blocked by process 13265.\nProcess 13265 waits for ShareLock on transaction 2183699; blocked by process 9532.\nHINT:  See server log for query details.\nCONTEXT:  while locking tuple (7881,1) in relation "trans_unit"\nSQL statement "SELECT 1 FROM ONLY "public"."trans_unit" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"\n')>}
celery-main stderr | [2021-01-27 10:56:50,426: ERROR/1848] Failure while executing task
celery-main stderr | Traceback (most recent call last):
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/django/db/backends/base/base.py", line 242, in _commit
celery-main stderr |     return self.connection.commit()
celery-main stderr | psycopg2.errors.DeadlockDetected: deadlock detected
celery-main stderr | DETAIL:  Process 9532 waits for ShareLock on transaction 2185973; blocked by process 13265.
celery-main stderr | Process 13265 waits for ShareLock on transaction 2183699; blocked by process 9532.
celery-main stderr | HINT:  See server log for query details.
celery-main stderr | CONTEXT:  while locking tuple (7881,1) in relation "trans_unit"
celery-main stderr | SQL statement "SELECT 1 FROM ONLY "public"."trans_unit" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"
celery-main stderr |
celery-main stderr |
celery-main stderr | The above exception was the direct cause of the following exception:
celery-main stderr |
celery-main stderr | Traceback (most recent call last):
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/celery/app/trace.py", line 405, in trace_task
celery-main stderr |     R = retval = fun(*args, **kwargs)
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/celery/app/trace.py", line 697, in __protected_call__
celery-main stderr |     return self.run(*args, **kwargs)
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/celery/app/autoretry.py", line 34, in run
celery-main stderr |     return task._orig_run(*args, **kwargs)
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/weblate/trans/tasks.py", line 86, in perform_load
celery-main stderr |     force=force, langs=langs, changed_template=changed_template, from_link=from_link
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/weblate/trans/models/component.py", line 1751, in create_translations
celery-main stderr |     force, langs, request, changed_template, from_link
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/weblate/trans/models/component.py", line 1875, in _create_translations
celery-main stderr |     self.progress_step()
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/django/db/transaction.py", line 232, in __exit__
celery-main stderr |     connection.commit()
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/django/utils/asyncio.py", line 26, in inner
celery-main stderr |     return func(*args, **kwargs)
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/django/db/backends/base/base.py", line 266, in commit
celery-main stderr |     self._commit()
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/django/db/backends/base/base.py", line 242, in _commit
celery-main stderr |     return self.connection.commit()
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/django/db/utils.py", line 90, in __exit__
celery-main stderr |     raise dj_exc_value.with_traceback(traceback) from exc_value
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/django/db/backends/base/base.py", line 242, in _commit
celery-main stderr |     return self.connection.commit()
celery-main stderr | django.db.utils.OperationalError: deadlock detected
celery-main stderr | DETAIL:  Process 9532 waits for ShareLock on transaction 2185973; blocked by process 13265.
celery-main stderr | Process 13265 waits for ShareLock on transaction 2183699; blocked by process 9532.
celery-main stderr | HINT:  See server log for query details.
celery-main stderr | CONTEXT:  while locking tuple (7881,1) in relation "trans_unit"
celery-main stderr | SQL statement "SELECT 1 FROM ONLY "public"."trans_unit" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"
celery-main stderr |
celery-main stderr | [2021-01-27 10:56:50,426: ERROR/ForkPoolWorker-5] Failure while executing task
celery-main stderr | Traceback (most recent call last):
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/django/db/backends/base/base.py", line 242, in _commit
celery-main stderr |     return self.connection.commit()
celery-main stderr | psycopg2.errors.DeadlockDetected: deadlock detected
celery-main stderr | DETAIL:  Process 9532 waits for ShareLock on transaction 2185973; blocked by process 13265.
celery-main stderr | Process 13265 waits for ShareLock on transaction 2183699; blocked by process 9532.
celery-main stderr | HINT:  See server log for query details.
celery-main stderr | CONTEXT:  while locking tuple (7881,1) in relation "trans_unit"
celery-main stderr | SQL statement "SELECT 1 FROM ONLY "public"."trans_unit" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"
celery-main stderr |
celery-main stderr |
celery-main stderr | The above exception was the direct cause of the following exception:
celery-main stderr |
celery-main stderr | Traceback (most recent call last):
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/celery/app/trace.py", line 405, in trace_task
celery-main stderr |     R = retval = fun(*args, **kwargs)
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/celery/app/trace.py", line 697, in __protected_call__
celery-main stderr |     return self.run(*args, **kwargs)
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/celery/app/autoretry.py", line 34, in run
celery-main stderr |     return task._orig_run(*args, **kwargs)
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/weblate/trans/tasks.py", line 86, in perform_load
celery-main stderr |     force=force, langs=langs, changed_template=changed_template, from_link=from_link
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/weblate/trans/models/component.py", line 1751, in create_translations
celery-main stderr |     force, langs, request, changed_template, from_link
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/weblate/trans/models/component.py", line 1875, in _create_translations
celery-main stderr |     self.progress_step()
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/django/db/transaction.py", line 232, in __exit__
celery-main stderr |     connection.commit()
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/django/utils/asyncio.py", line 26, in inner
celery-main stderr |     return func(*args, **kwargs)
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/django/db/backends/base/base.py", line 266, in commit
celery-main stderr |     self._commit()
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/django/db/backends/base/base.py", line 242, in _commit
celery-main stderr |     return self.connection.commit()
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/django/db/utils.py", line 90, in __exit__
celery-main stderr |     raise dj_exc_value.with_traceback(traceback) from exc_value
celery-main stderr |   File "/usr/local/lib/python3.7/dist-packages/django/db/backends/base/base.py", line 242, in _commit
celery-main stderr |     return self.connection.commit()
celery-main stderr | django.db.utils.OperationalError: deadlock detected
celery-main stderr | DETAIL:  Process 9532 waits for ShareLock on transaction 2185973; blocked by process 13265.
celery-main stderr | Process 13265 waits for ShareLock on transaction 2183699; blocked by process 9532.
celery-main stderr | HINT:  See server log for query details.
celery-main stderr | CONTEXT:  while locking tuple (7881,1) in relation "trans_unit"
celery-main stderr | SQL statement "SELECT 1 FROM ONLY "public"."trans_unit" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"
celery-main stderr |

Other instances of this error showed different process numbers, forkpoolworker number and tuple (x,y) values but were otherwise identical.

Server configuration and status

Weblate installation: Docker

 * Weblate: 4.4.2
 * Django: 3.1.5
 * siphashc: 2.1
 * Whoosh: 2.7.4
 * translate-toolkit: 3.3.0
 * lxml: 4.6.2
 * Pillow: 8.1.0
 * bleach: 3.2.1
 * python-dateutil: 2.8.1
 * social-auth-core: 4.0.3
 * social-auth-app-django: 4.0.0
 * django-crispy-forms: 1.10.0
 * oauthlib: 3.1.0
 * django-compressor: 2.4
 * djangorestframework: 3.12.2
 * django-filter: 2.4.0
 * django-appconf: 1.0.4
 * user-agents: 2.2.0
 * filelock: 3.0.12
 * setuptools: 40.8.0
 * jellyfish: 0.8.2
 * openpyxl: 3.0.5
 * celery: 5.0.5
 * kombu: 5.0.2
 * translation-finder: 2.7
 * weblate-language-data: 2020.13
 * html2text: 2020.1.16
 * pycairo: 1.16.2
 * pygobject: 3.30.4
 * diff-match-patch: 20200713
 * requests: 2.25.1
 * django-redis: 4.12.1
 * hiredis: 1.1.0
 * sentry_sdk: 0.19.5
 * Cython: 0.29.21
 * misaka: 2.1.1
 * GitPython: 3.1.12
 * borgbackup: 1.1.15
 * pyparsing: 2.4.7
 * Python: 3.7.3
 * Git: 2.20.1
 * psycopg2: 2.8.6
 * psycopg2-binary: 2.8.6
 * phply: 1.2.5
 * chardet: 4.0.0
 * ruamel.yaml: 0.16.12
 * tesserocr: 2.5.1
 * akismet: 1.1
 * boto3: 1.16.54
 * zeep: 4.0.0
 * aeidon: 1.9
 * iniparse: 0.5
 * mysqlclient: 2.0.3
 * Mercurial: 5.6.1
 * git-svn: 2.20.1
 * git-review: 1.28.0
 * Redis server: 4.0.14
 * PostgreSQL server: 11.10
 * Database backends: django.db.backends.postgresql
 * Cache backends: default:RedisCache, avatar:FileBasedCache
 * Email setup: django.core.mail.backends.smtp.EmailBackend: smtp.sendgrid.net
 * OS encoding: filesystem=utf-8, default=utf-8
 * Celery: redis://cache:6379/1, redis://cache:6379/1, regular
 * Platform: Linux 4.14.158-129.185.amzn2.x86_64 (x86_64)

Weblate deploy checks

System check identified some issues:

INFOS:
?: (weblate.I021) Error collection is not set up, it is highly recommended for production use
    HINT: https://docs.weblate.org/en/weblate-4.4.2/admin/install.html#collecting-errors
?: (weblate.I028) Backups are not configured, it is highly recommended for production use
    HINT: https://docs.weblate.org/en/weblate-4.4.2/admin/backup.html

System check identified 2 issues (1 silenced).
nijel commented 3 years ago

I've already seen this with one customer, but unfortunately, we didn't find the cause there yet...

This can probably sometimes happen when multiple updates are running in parallel, and your description seems to match that. I still wonder why we've never seen this on our production servers...

The proper long-term solution for this is to avoid holding key lock for updates, but that will be possible after Django 3.2 is released, see https://github.com/WeblateOrg/weblate/pull/5227

Kingdutch commented 3 years ago

Glad to hear it's a known issue :) I'm not entirely sure what it's locking on. Could it have something to do with components that share a repository (maybe they also share other things)? We have quite a few of those.

nijel commented 3 years ago

No, it's lock on the units table. Weblate holds it to avoid concurrent updates of the units, but in current Django implementation it doesn't hold only row level lock, but also primary key lock. And that blocks insertion into that table for that time.

nijel commented 3 years ago

Should be addressed in Weblate 4.6 with Django 3.2.

github-actions[bot] commented 3 years ago

Thank you for your report; the issue you have reported has just been fixed.