b4mad / racing

Community-driven SimRacing data collection and analysis
https://b4mad.racing
GNU General Public License v3.0

pgsql is unavailable #432

Status: Closed (closed by goern 1 year ago)

goern commented 1 year ago

Describe the bug: https://paddock.b4mad.racing/ returns an HTTP 500. The log says:

 self._fetch_all()
File "/opt/app-root/lib64/python3.10/site-packages/django/db/models/query.py", line 1881, in _fetch_all
self._result_cache = list(self._iterable_class(self))
File "/opt/app-root/lib64/python3.10/site-packages/django/db/models/query.py", line 91, in __iter__
results = compiler.execute_sql(
File "/opt/app-root/lib64/python3.10/site-packages/django/db/models/sql/compiler.py", line 1560, in execute_sql
cursor = self.connection.cursor()
File "/opt/app-root/lib64/python3.10/site-packages/django/utils/asyncio.py", line 26, in inner
return func(*args, **kwargs)
File "/opt/app-root/lib64/python3.10/site-packages/django/db/backends/base/base.py", line 330, in cursor
return self._cursor()
File "/opt/app-root/lib64/python3.10/site-packages/django/db/backends/base/base.py", line 306, in _cursor
self.ensure_connection()
File "/opt/app-root/lib64/python3.10/site-packages/django/utils/asyncio.py", line 26, in inner
return func(*args, **kwargs)
File "/opt/app-root/lib64/python3.10/site-packages/django/db/backends/base/base.py", line 288, in ensure_connection
with self.wrap_database_errors:
File "/opt/app-root/lib64/python3.10/site-packages/django/db/utils.py", line 91, in __exit__
raise dj_exc_value.with_traceback(traceback) from exc_value
File "/opt/app-root/lib64/python3.10/site-packages/django/db/backends/base/base.py", line 289, in ensure_connection
self.connect()
File "/opt/app-root/lib64/python3.10/site-packages/django/utils/asyncio.py", line 26, in inner
return func(*args, **kwargs)
File "/opt/app-root/lib64/python3.10/site-packages/django/db/backends/base/base.py", line 270, in connect
self.connection = self.get_new_connection(conn_params)
File "/opt/app-root/lib64/python3.10/site-packages/django_prometheus/db/backends/postgresql/base.py", line 9, in get_new_connection
conn = super().get_new_connection(*args, **kwargs)
File "/opt/app-root/lib64/python3.10/site-packages/django_prometheus/db/common.py", line 45, in get_new_connection
return super().get_new_connection(*args, **kwargs)
File "/opt/app-root/lib64/python3.10/site-packages/django/utils/asyncio.py", line 26, in inner
return func(*args, **kwargs)
File "/opt/app-root/lib64/python3.10/site-packages/django/db/backends/postgresql/base.py", line 275, in get_new_connection
connection = self.Database.connect(**conn_params)
File "/opt/app-root/lib64/python3.10/site-packages/psycopg2/__init__.py", line 122, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
django.db.utils.OperationalError: connection to server at "db-primary.b4mad-racing.svc" (172.30.33.72), port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
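
The last frame is a plain TCP refusal, so it can be checked without going through Django. A minimal sketch, assuming it is run from a pod in the same namespace; the helper name `probe_tcp` is hypothetical, while the host and port come from the traceback above:

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> tuple[bool, str]:
    """Try to open a TCP connection and return (ok, detail)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:
        # Refused connections, DNS failures and timeouts are all OSError subclasses
        return False, f"{type(exc).__name__}: {exc}"
```

Calling `probe_tcp("db-primary.b4mad-racing.svc", 5432)` should report a refused connection for as long as PostgreSQL is not listening, and `(True, "connected")` once it is back.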

Looking at the log of the db-instance pod:

2023-09-07 11:05:42.614 P00 DEBUG: common/io/socket/client::sckClientOpen: retry HostConnectError: unable to get address for 'b4mad-racing-psql.192.169.178.22': [-2] Name or service not known
2023-09-07 11:05:42.723 P00 DEBUG: common/io/http/request::httpRequestProcess: retry HostConnectError: unable to get address for 'b4mad-racing-psql.192.169.178.22': [-2] Name or service not known
2023-09-07 11:05:42.827 P00 DEBUG: common/io/socket/client::sckClientOpen: retry HostConnectError: unable to get address for 'b4mad-racing-psql.192.169.178.22': [-2] Name or service not known
2023-09-07 11:05:42.930 P00 DEBUG: common/io/socket/client::sckClientOpen: retry HostConnectError: unable to get address for 'b4mad-racing-psql.192.169.178.22': [-2] Name or service not known
2023-09-07 11:05:43.133 P00 DEBUG: common/io/socket/client::sckClientOpen: retry HostConnectError: unable to get address for 'b4mad-racing-psql.192.169.178.22': [-2] Name or service not known
2023-09-07 11:05:43.436 P00 DEBUG: common/io/socket/client::sckClientOpen: retry HostConnectError: unable to get address for 'b4mad-racing-psql.192.169.178.22': [-2] Name or service not known
2023-09-07 11:05:43.939 P00 DEBUG: common/io/socket/client::sckClientOpen: retry HostConnectError: unable to get address for 'b4mad-racing-psql.192.169.178.22': [-2] Name or service not known
2023-09-07 11:05:44.742 P00 DEBUG: common/io/socket/client::sckClientOpen: retry HostConnectError: unable to get address for 'b4mad-racing-psql.192.169.178.22': [-2] Name or service not known
2023-09-07 11:05:46.045 P00 DEBUG: common/io/socket/client::sckClientOpen: retry HostConnectError: unable to get address for 'b4mad-racing-psql.192.169.178.22': [-2] Name or service not known
2023-09-07 11:05:46,122 INFO: Lock owner: db-instance-hnkw-0; I am db-instance-hnkw-0
2023-09-07 11:05:46,128 WARNING: manual failover: members list is empty
2023-09-07 11:05:46,128 INFO: updated leader lock during doing crash recovery in a single user mode
2023-09-07 11:05:48.150 P00 DEBUG: common/io/socket/client::sckClientOpen: retry HostConnectError: unable to get address for 'b4mad-racing-psql.192.169.178.22': [-2] Name or service not known

https://github.com/pgbackrest/pgbackrest/issues/1778 might be related.
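
The pgBackRest lines above all fail at name resolution ("Name or service not known"). A small sketch for checking whether a given name resolves from inside the pod; the helper name `resolve` is hypothetical and this is not how pgBackRest itself does the lookup:

```python
import socket


def resolve(hostname: str) -> list[str]:
    """Return the addresses a hostname resolves to, or an empty list."""
    try:
        infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Same class of failure as "[-2] Name or service not known" in the log
        return []
    # The last element of each entry is the sockaddr; its first item is the address
    return sorted({info[4][0] for info in infos})
```

An empty list for the host name shown in the log would confirm that the failure is in resolution rather than in the connection itself.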

To Reproduce:

  1. Go to https://paddock.b4mad.racing/ and see the HTTP 500.
  2. Read the paddock pod log: https://console-openshift-console.apps.phobos.b4mad.emea.operate-first.cloud/k8s/ns/b4mad-racing/pods/paddock-377-r66vk/logs
  3. Read the db-instance pod log: https://console-openshift-console.apps.phobos.b4mad.emea.operate-first.cloud/k8s/ns/b4mad-racing/pods/db-instance-hnkw-0/logs

Expected behavior: HTTP 200

Screenshots: n/a

Additional context: n/a

/priority critical-urgent
/assign durandom

goern commented 1 year ago

The PV holding the pgsql data is 100% full: https://console-openshift-console.apps.phobos.b4mad.emea.operate-first.cloud/k8s/ns/b4mad-racing/persistentvolumeclaims/db-instance-hnkw-pgdata

Related: https://github.com/b4mad/racing/issues/245
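
For a quick check from inside the database pod, a sketch along these lines reports how full the volume is. The function names are hypothetical, and the mount path used in the note below (`/pgdata`) is an assumption, not taken from this issue:

```python
import shutil


def percent_used(total: int, used: int) -> float:
    """Share of a filesystem in use, in percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * used / total


def volume_usage(path: str) -> float:
    """Percent used of the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)
    return percent_used(usage.total, usage.used)
```

`volume_usage("/pgdata")` close to 100 would match what the console shows for the PVC. The figure can differ slightly from `df`, which accounts for reserved blocks.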

durandom commented 1 year ago

Yes, same issue. I'll post the steps to fix this in #245.