bihealth / sodar-core

SODAR Core: A Django-based framework for building scientific data management web apps
MIT License

Database deadlock/flush errors in UI tests with parallel testing #1428

Closed mikkonie closed 2 months ago

mikkonie commented 3 months ago

Parallel and UI testing, the good old sources of problems, are at it again. When running the tests locally, I've been getting random deadlock/sqlflush errors from UI tests. A dump of one such error is in the comments.

In short, we get psycopg2.errors.DeadlockDetected: deadlock detected during test teardown, which in turn results in django.core.management.base.CommandError: Database test_sodar_core_6 couldn't be flushed.

I first suspected this to be a problem with TestProjectSidebar tests, but this problem can appear with different tests depending on which part of the unit test suite we run.

It only happens with parallel testing and never in CI. Locally, I'm unable to get these errors to appear if I remove --parallel from the test command.

My guesses for possible sources for the problem:

The latter of the two has had some changes recently, so my first idea would be to look into the DB settings.

It's also questionable whether we want to use the notoriously shoddy parallel testing to begin with. It doesn't really matter how long a large test run takes: it's something one does in the background (or in CI) anyway while working on something else. At least that's how it works for me.

In any case, this needs to be looked into in case it's e.g. a serious database configuration issue.

mikkonie commented 3 months ago

Example:

======================================================================
ERROR: test_create_link (projectroles.tests.test_ui.TestProjectSidebar.test_create_link)
Test visibility of create link
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/mikkopen/.virtualenvs/sodar_core_django4/lib/python3.11/site-packages/django/db/backends/utils.py", line 87, in _execute
    return self.cursor.execute(sql)
    ^^^^^^^^^^^^^^^^^
psycopg2.errors.DeadlockDetected: deadlock detected
DETAIL:  Process 1222 waits for AccessExclusiveLock on relation 2012526 of database 2012657; blocked by process 2017.
Process 2017 waits for AccessShareLock on relation 2012010 of database 2012657; blocked by process 1222.
HINT:  See server log for query details.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/mikkopen/.virtualenvs/sodar_core_django4/lib/python3.11/site-packages/django/core/management/commands/flush.py", line 73, in handle
    connection.ops.execute_sql_flush(sql_list)
    ^^^^^^^^^^^^^^^^^
  File "/home/mikkopen/.virtualenvs/sodar_core_django4/lib/python3.11/site-packages/django/db/backends/base/operations.py", line 451, in execute_sql_flush
    cursor.execute(sql)
    ^^^^^^^^^^^^^^^^^
  File "/home/mikkopen/.virtualenvs/sodar_core_django4/lib/python3.11/site-packages/django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(
    ^^^^^^^^^^^^^^^^^
  File "/home/mikkopen/.virtualenvs/sodar_core_django4/lib/python3.11/site-packages/django/db/backends/utils.py", line 80, in _execute_with_wrappers
    return executor(sql, params, many, context)
    ^^^^^^^^^^^^^^^^^
  File "/home/mikkopen/.virtualenvs/sodar_core_django4/lib/python3.11/site-packages/django/db/backends/utils.py", line 84, in _execute
    with self.db.wrap_database_errors:
    ^^^^^^^^^^^^^^^^^
  File "/home/mikkopen/.virtualenvs/sodar_core_django4/lib/python3.11/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
    ^^^^^^^^^^^^^^^^^
  File "/home/mikkopen/.virtualenvs/sodar_core_django4/lib/python3.11/site-packages/django/db/backends/utils.py", line 87, in _execute
    return self.cursor.execute(sql)
    ^^^^^^^^^^^^^^^^^
django.db.utils.OperationalError: deadlock detected
DETAIL:  Process 1222 waits for AccessExclusiveLock on relation 2012526 of database 2012657; blocked by process 2017.
Process 2017 waits for AccessShareLock on relation 2012010 of database 2012657; blocked by process 1222.
HINT:  See server log for query details.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/mikkopen/.virtualenvs/sodar_core_django4/lib/python3.11/site-packages/django/test/testcases.py", line 419, in _setup_and_call
    self._post_teardown()
    ^^^^^^^^^^^^^^^^^
  File "/home/mikkopen/.virtualenvs/sodar_core_django4/lib/python3.11/site-packages/django/test/testcases.py", line 1279, in _post_teardown
    self._fixture_teardown()
    ^^^^^^^^^^^^^^^^^
  File "/home/mikkopen/.virtualenvs/sodar_core_django4/lib/python3.11/site-packages/django/test/testcases.py", line 1313, in _fixture_teardown
    call_command(
  File "/home/mikkopen/.virtualenvs/sodar_core_django4/lib/python3.11/site-packages/django/core/management/__init__.py", line 194, in call_command
    return command.execute(*args, **defaults)
      ^^^^^^^^^^^^^^^^^
  File "/home/mikkopen/.virtualenvs/sodar_core_django4/lib/python3.11/site-packages/django/core/management/base.py", line 458, in execute
    output = self.handle(*args, **options)
    ^^^^^^^^^^^^^^^^^
  File "/home/mikkopen/.virtualenvs/sodar_core_django4/lib/python3.11/site-packages/django/core/management/commands/flush.py", line 75, in handle
    raise CommandError(
    ^^^^^^^^^^^^^^^^^
django.core.management.base.CommandError: Database test_sodar_core_6 couldn't be flushed. Possible reasons:
  * The database isn't running or isn't configured correctly.
  * At least one of the expected database tables doesn't exist.
  * The SQL was invalid.
Hint: Look at the output of 'django-admin sqlflush'. That's the SQL this command wasn't able to run.

mikkonie commented 2 months ago

It seems setting DATABASES['default']['ATOMIC_REQUESTS'] = False in test settings fixes this. Not sure if it's the best course to take, but I'm considering this fixed for now. If it turns out this causes other issues, we can revisit this problem later.
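
For reference, a minimal sketch of what such an override could look like. This is not the actual SODAR Core settings code: BASE_DATABASES, its ENGINE and NAME values, the assumption that ATOMIC_REQUESTS was enabled in the base settings, and the make_test_databases() helper are all hypothetical and only stand in for whatever the test settings module inherits.

```python
import copy

# Hypothetical stand-in for the DATABASES dict inherited from base settings.
# The values here are assumptions for illustration only.
BASE_DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'sodar_core',
        'ATOMIC_REQUESTS': True,  # Assumed to be enabled outside of tests
    }
}


def make_test_databases(base):
    """Return a copy of the database settings with ATOMIC_REQUESTS disabled."""
    databases = copy.deepcopy(base)  # Leave the base settings untouched
    databases['default']['ATOMIC_REQUESTS'] = False
    return databases


# What the test settings module would end up exposing to Django
DATABASES = make_test_databases(BASE_DATABASES)
```

In a real settings module the same effect is the one-line assignment quoted above. The copy is only there so the sketch does not mutate the base dict.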