AlexsLemonade / refinebio

Refine.bio harmonizes petabytes of publicly available biological data into ready-to-use datasets for cancer researchers and AI/ML scientists.
https://www.refine.bio/
Other
126 stars 19 forks source link

psycopg2.OperationalError: server closed the connection unexpectedly #3347

Open arkid15r opened 11 months ago

arkid15r commented 11 months ago

Context

During recent e2e tests run the following exception was thrown:

...
Polling SurveyJobs. Currently waiting for job id: 14
E
======================================================================
ERROR: test_all_the_things (tests.foreman.test_end_to_end.FullFlowEndToEndTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
psycopg2.OperationalError: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.8/unittest/case.py", line 60, in testPartExecutor
    yield
  File "/usr/lib/python3.8/unittest/case.py", line 676, in run
    self._callTestMethod(testMethod)
  File "/usr/lib/python3.8/unittest/case.py", line 633, in _callTestMethod
    method()
  File "/home/user/tests/foreman/test_end_to_end.py", line 235, in test_all_the_things
    self.process_experiments()
  File "/home/user/tests/foreman/test_end_to_end.py", line 279, in process_experiments
    self.assertTrue(wait_for_job(survey_job))
  File "/home/user/tests/foreman/test_end_to_end.py", line 72, in wait_for_job
    job.refresh_from_db()
  File "/usr/local/lib/python3.8/dist-packages/django/db/models/base.py", line 650, in refresh_from_db
    db_instance = db_instance_qs.get()
  File "/usr/local/lib/python3.8/dist-packages/django/db/models/query.py", line 431, in get
    num = len(clone)
  File "/usr/local/lib/python3.8/dist-packages/django/db/models/query.py", line 262, in __len__
    self._fetch_all()
  File "/usr/local/lib/python3.8/dist-packages/django/db/models/query.py", line 1324, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/usr/local/lib/python3.8/dist-packages/django/db/models/query.py", line 51, in __iter__
    results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
  File "/usr/local/lib/python3.8/dist-packages/django/db/models/sql/compiler.py", line 1175, in execute_sql
    cursor.execute(sql, params)
  File "/usr/local/lib/python3.8/dist-packages/raven/contrib/django/client.py", line 127, in execute
    return real_execute(self, sql, params)
  File "/usr/local/lib/python3.8/dist-packages/django/db/backends/utils.py", line 66, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/usr/local/lib/python3.8/dist-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/usr/local/lib/python3.8/dist-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/usr/local/lib/python3.8/dist-packages/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/usr/local/lib/python3.8/dist-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
django.db.utils.OperationalError: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

Problem or idea

The pgbouncer logs contained the following records:

Aug  3 23:54:22 ip-10-0-80-233 pgbouncer[4253]: S-0x55e85d65c600: data_refinery_1/drpostgresuser@10.0.154.66:5430 closing because: server lifetime over (age=3600)
Aug  3 23:54:22 ip-10-0-80-233 pgbouncer[4253]: dropping database 'data_refinery_1' as it does not exist anymore or inactive auto-database
Aug  3 23:54:22 ip-10-0-80-233 pgbouncer[4253]: C-0x55e85d658660: data_refinery_1/drpostgresuser@66.42.67.88:10927 closing because: database removed (age=3600)
Aug  3 23:54:22 ip-10-0-80-233 pgbouncer[4253]: C-0x55e85d658660: data_refinery_1/drpostgresuser@66.42.67.88:10927 Pooler Error: database removed
Aug  3 23:54:22 ip-10-0-80-233 pgbouncer[4253]: dropping database 'data_refinery_2' as it does not exist anymore or inactive auto-database
Aug  3 23:54:22 ip-10-0-80-233 pgbouncer[4253]: C-0x55e85d65af00: data_refinery_2/drpostgresuser@66.42.67.88:42924 closing because: database removed (age=3600)
Aug  3 23:54:22 ip-10-0-80-233 pgbouncer[4253]: C-0x55e85d65af00: data_refinery_2/drpostgresuser@66.42.67.88:42924 Pooler Error: database removed

Solution or next step

FIgure out nature of the exception, possibly play with autodb_idle_timeout and/or server_lifetime pgbouncer configuration params.