Better-HPC / keystone-api

Allocation management API for HPC systems.
0 stars 0 forks source link

Database will randomly lock when running tests #400

Closed djperrefort closed 1 week ago

djperrefort commented 2 weeks ago

The function tests will randomly fail with the error django.db.utils.OperationalError: database table is locked. This problem started showing up a few weeks ago. I've only seen it in CI and have been unable to reproduce it locally. I'm unsure if that is due to the sporadic nature of the error, or the CI environment.

Traceback:

Loaded image: keystone-api:latest
Creating test database for alias 'default'...
Found 104 test(s).
System check identified no issues (0 silenced).
.................................................................................................E......
======================================================================
ERROR: test_staff_user_permissions (tests.health.test.EndpointPermissions.test_staff_user_permissions)
Test staff users have read-only permissions.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/redis/connection.py", line 277, in connect
    sock = self.retry.call_with_retry(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/redis/retry.py", line 62, in call_with_retry
    return do()
           ^^^^
  File "/usr/local/lib/python3.11/site-packages/redis/connection.py", line 278, in <lambda>
    lambda: self._connect(), lambda error: self.disconnect(error)
            ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/redis/connection.py", line 639, in _connect
    raise err
  File "/usr/local/lib/python3.11/site-packages/redis/connection.py", line 627, in _connect
    sock.connect(socket_address)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/health_check/contrib/redis/backends.py", line 25, in check_status
    conn.ping()  # exceptions may be raised upon ping
    ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/redis/commands/core.py", line 1208, in ping
    return self.execute_command("PING", **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/redis/client.py", line 545, in execute_command
    conn = self.connection or pool.get_connection(command_name, **options)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/redis/connection.py", line 1074, in get_connection
    connection.connect()
  File "/usr/local/lib/python3.11/site-packages/redis/connection.py", line 283, in connect
    raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 111 connecting to 127.0.0.1:6379. Connection refused.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/django/db/backends/utils.py", line 105, in _execute
    return self.cursor.execute(sql, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/backends/sqlite3/base.py", line 329, in execute
    return super().execute(query, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
sqlite3.OperationalError: database table is locked

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/keystone_api/tests/health/test.py", line 55, in test_staff_user_permissions
    self.assert_read_only_responses()
  File "/usr/local/lib/python3.11/site-packages/keystone_api/tests/health/test.py", line 29, in assert_read_only_responses
    self.assertIn(self.client.head(self.endpoint).status_code, self.valid_responses)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/test/client.py", line 1097, in head
    response = super().head(
               ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/test/client.py", line 503, in head
    return self.generic(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/rest_framework/test.py", line 233, in generic
    return super().generic(
           ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/test/client.py", line 617, in generic
    return self.request(**r)
           ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/rest_framework/test.py", line 285, in request
    return super().request(**kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/rest_framework/test.py", line 237, in request
    request = super().request(**kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/test/client.py", line 1013, in request
    self.check_exception(response)
  File "/usr/local/lib/python3.11/site-packages/django/test/client.py", line 743, in check_exception
    raise exc_value
  File "/usr/local/lib/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/core/handlers/base.py", line 197, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/views/decorators/csrf.py", line 65, in _view_wrapper
    return view_func(request, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/views/generic/base.py", line 104, in view
    return self.dispatch(request, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/rest_framework/views.py", line 509, in dispatch
    response = self.handle_exception(exc)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/rest_framework/views.py", line 469, in handle_exception
    self.raise_uncaught_exception(exc)
  File "/usr/local/lib/python3.11/site-packages/rest_framework/views.py", line 480, in raise_uncaught_exception
    raise exc
  File "/usr/local/lib/python3.11/site-packages/rest_framework/views.py", line 506, in dispatch
    response = handler(request, *args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/keystone_api/apps/health/views.py", line 65, in get
    return super().get(request, *args, **kwargs)  # pragma: nocover
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/utils/decorators.py", line 48, in _wrapper
    return bound_method(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/utils/decorators.py", line 188, in _view_wrapper
    result = _process_exception(request, e)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/utils/decorators.py", line 186, in _view_wrapper
    response = view_func(request, *args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/keystone_api/apps/health/views.py", line 32, in get
    self.check()
  File "/usr/local/lib/python3.11/site-packages/health_check/mixins.py", line 23, in check
    return self.run_check(subset=subset)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/health_check/mixins.py", line 88, in run_check
    for plugin in executor.map(_run, plugin_instances):
  File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 619, in result_iterator
    yield _result_or_cancel(fs.pop())
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 317, in _result_or_cancel
    return fut.result(timeout)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/health_check/mixins.py", line 62, in _run
    plugin.run_check()
  File "/usr/local/lib/python3.11/site-packages/health_check/backends.py", line 30, in run_check
    self.check_status()
  File "/usr/local/lib/python3.11/site-packages/health_check/contrib/redis/backends.py", line 38, in check_status
    self.add_error(
  File "/usr/local/lib/python3.11/site-packages/health_check/backends.py", line 49, in add_error
    logger.exception(str(error))
  File "/usr/local/lib/python3.11/logging/__init__.py", line 1524, in exception
    self.error(msg, *args, exc_info=exc_info, **kwargs)
  File "/usr/local/lib/python3.11/logging/__init__.py", line 1518, in error
    self._log(ERROR, msg, args, **kwargs)
  File "/usr/local/lib/python3.11/logging/__init__.py", line 1634, in _log
    self.handle(record)
  File "/usr/local/lib/python3.11/logging/__init__.py", line 1644, in handle
    self.callHandlers(record)
  File "/usr/local/lib/python3.11/logging/__init__.py", line 1706, in callHandlers
    hdlr.handle(record)
  File "/usr/local/lib/python3.11/logging/__init__.py", line 978, in handle
    self.emit(record)
  File "/usr/local/lib/python3.11/site-packages/keystone_api/apps/logging/handlers.py", line 36, in emit
    ).save()
      ^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/models/base.py", line 822, in save
    self.save_base(
  File "/usr/local/lib/python3.11/site-packages/django/db/models/base.py", line 909, in save_base
    updated = self._save_table(
              ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/models/base.py", line 1071, in _save_table
    results = self._do_insert(
              ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/models/base.py", line 1112, in _do_insert
    return manager._insert(
           ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/models/manager.py", line 87, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/models/query.py", line 1847, in _insert
    return query.get_compiler(using=using).execute_sql(returning_fields)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/models/sql/compiler.py", line 1823, in execute_sql
    cursor.execute(sql, params)
  File "/usr/local/lib/python3.11/site-packages/django/db/backends/utils.py", line 79, in execute
    return self._execute_with_wrappers(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/backends/utils.py", line 92, in _execute_with_wrappers
    return executor(sql, params, many, context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/backends/utils.py", line 100, in _execute
    with self.db.wrap_database_errors:
  File "/usr/local/lib/python3.11/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/usr/local/lib/python3.11/site-packages/django/db/backends/utils.py", line 105, in _execute
    return self.cursor.execute(sql, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/backends/sqlite3/base.py", line 329, in execute
    return super().execute(query, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
django.db.utils.OperationalError: database table is locked

----------------------------------------------------------------------
Ran 104 tests in 55.105s

FAILED (errors=1)
Destroying test database for alias 'default'...
djperrefort commented 2 weeks ago

Looks like this is answered in the django docs

djperrefort commented 1 week ago

The database timeout was increased significantly to help ensure tests passed in PR #409. This should have resolved the issue. I'm closing the issue for now, but will circle back if the problem persists.