building-envelope-data / database

Reference implementation of a product data server as part of the product data network buildingenvelopedata.org
MIT License
0 stars 0 forks source link

Debug abortion of automatic backups #25

Open simon-wacker opened 2 years ago

simon-wacker commented 2 years ago

After running make backup-database multiple times in the directory /app/machine aborting at least once during the process, the database service became unhealthy, exited and attempted various restarts. The logs read

database_1  | PostgreSQL Database directory appears to contain a database; Skipping initialization
database_1  | 2022-01-04 16:19:08.898 UTC [1] LOG:  starting PostgreSQL 13.4 (Debian 13.4-4.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
database_1  | 2022-01-04 16:19:08.898 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
database_1  | 2022-01-04 16:19:08.898 UTC [1] LOG:  listening on IPv6 address "::", port 5432
database_1  | 2022-01-04 16:19:08.902 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
database_1  | 2022-01-04 16:19:08.907 UTC [25] LOG:  database system was shut down at 2022-01-04 15:57:02 UTC
database_1  | 2022-01-04 16:19:08.907 UTC [25] LOG:  invalid record length at 0/16CC8D8: wanted 24, got 0
database_1  | 2022-01-04 16:19:08.907 UTC [25] LOG:  invalid primary checkpoint record
database_1  | 2022-01-04 16:19:08.907 UTC [25] PANIC:  could not locate a valid checkpoint record
database_1  | 2022-01-04 16:19:09.263 UTC [1] LOG:  startup process (PID 25) was terminated by signal 6: Aborted
database_1  | 2022-01-04 16:19:09.267 UTC [1] LOG:  aborting startup due to startup process failure
database_1  | 2022-01-04 16:19:09.283 UTC [1] LOG:  database system is shut down

and

database_1  | PostgreSQL Database directory appears to contain a database; Skipping initialization
database_1  | 2022-01-04 16:33:28.642 UTC [1] LOG:  starting PostgreSQL 13.5 (Debian 13.5-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
database_1  | 2022-01-04 16:33:28.643 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
database_1  | 2022-01-04 16:33:28.643 UTC [1] LOG:  listening on IPv6 address "::", port 5432
database_1  | 2022-01-04 16:33:28.663 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
database_1  | 2022-01-04 16:33:28.670 UTC [25] LOG:  database system was shut down at 2022-01-04 16:32:47 UTC
database_1  | 2022-01-04 16:33:28.744 UTC [1] LOG:  database system is ready to accept connections
database_1  | 2022-01-04 16:33:40.877 UTC [1] LOG:  received fast shutdown request
database_1  | 2022-01-04 16:33:40.879 UTC [1] LOG:  aborting any active transactions
database_1  | 2022-01-04 16:33:40.883 UTC [1] LOG:  background worker "logical replication launcher" (PID 31) exited with exit code 1
database_1  | 2022-01-04 16:33:40.885 UTC [26] LOG:  shutting down
database_1  | 2022-01-04 16:33:40.905 UTC [1] LOG:  database system is shut down
...
database_1  | 2022-01-04 15:53:51.648 UTC [1] LOG:  database system is shut down
database_1  | PostgreSQL Database directory appears to contain a database; Skipping initialization
database_1  | 2022-01-04 15:53:52.464 UTC [1] LOG:  starting PostgreSQL 13.4 (Debian 13.4-4.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
database_1  | 2022-01-04 15:53:52.465 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
database_1  | 2022-01-04 15:53:52.465 UTC [1] LOG:  listening on IPv6 address "::", port 5432
database_1  | 2022-01-04 15:53:52.468 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
database_1  | 2022-01-04 15:53:52.474 UTC [25] LOG:  database system was shut down at 2022-01-04 15:53:41 UTC
database_1  | 2022-01-04 15:53:52.479 UTC [1] LOG:  database system is ready to accept connections
database_1  | 2022-01-04 15:56:52.191 UTC [1] LOG:  could not open file "postmaster.pid": No such file or directory
database_1  | 2022-01-04 15:56:52.191 UTC [1] LOG:  performing immediate shutdown because data directory lock file is invalid
database_1  | 2022-01-04 15:56:52.191 UTC [1] LOG:  received immediate shutdown request
database_1  | 2022-01-04 15:56:52.191 UTC [1] LOG:  could not open file "postmaster.pid": No such file or directory
database_1  | 2022-01-04 15:56:52.193 UTC [29] WARNING:  terminating connection because of crash of another server process
database_1  | 2022-01-04 15:56:52.193 UTC [29] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
database_1  | 2022-01-04 15:56:52.193 UTC [29] HINT:  In a moment you should be able to reconnect to the database and repeat your command.

How did that happen?

To fix the problem I dropped into a shell in a new database container by running

docker-compose --file docker-compose.production.yml --project-name database_production run database bash

I changed to directory /usr/lib/postgresql/13/bin and I became the user `postgres by running

cd /usr/lib/postgresql/13/bin
su postgres

I then reset the write-ahead log by running

./pg_resetwal /var/lib/postgresql/data/

This may leave the database in an inconsistent state. For details see the documentation of pg_resetwal. However, it makes it possible to start the server again. You can the restore the database from the latest backup.