martinvonwittich opened this issue 8 months ago (status: Open)
It sounds like https://github.com/docker-library/postgres/issues/714 (and thus https://github.com/docker-library/postgres/pull/763) might be related here. :sweat_smile:
("Clean" shutdown is something we've had a number of discussions about in this repository, and something not everyone can agree on a good definition of due to wildly differing use cases and usage patterns. :see_no_evil:)
You probably also want to set a much more generous `--stop-timeout` on your PostgreSQL containers -- I usually use 120, but even that might not be enough (as discussed in #714). You'll likely also need to raise systemd's default `TimeoutStopSec` value to make sure your containers get their full stop timeout if the system or service shuts down for any reason, like Docker upgrades.
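For readers who want to try that combination, here is a minimal sketch; the 120/150-second values are illustrative rather than a recommendation from this thread, and the drop-in mechanism assumes a stock systemd setup:

```
# Per-container stop timeout (in seconds) -- illustrative value
docker run -d --name some-postgres --stop-timeout 120 \
  -e POSTGRES_PASSWORD=mysecretpassword postgres

# Compose equivalent (illustrative):
#   services:
#     db:
#       image: postgres
#       stop_grace_period: 2m

# Give the Docker service itself at least as much time on host shutdown/upgrade,
# e.g. via a systemd drop-in (opens an editor for an override file):
sudo systemctl edit docker.service
#   [Service]
#   TimeoutStopSec=150
```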
> You probably also want to set a much more generous `--stop-timeout` on your PostgreSQL containers
I'm pretty sure increasing the `--stop-timeout` wouldn't help here at all: because the main `postgres` process isn't even running yet when the issue is triggered with `docker stop`, the SIGINT apparently goes nowhere useful (the entrypoint script seems to ignore it?). Docker will wait fruitlessly for PostgreSQL to stop, while PostgreSQL doesn't know that it's supposed to stop in the first place. Raising `--stop-timeout` would just cause Docker to wait longer before sending SIGKILL.
We've already decided to migrate our PostgreSQL instance out of Docker to ensure we won't get hit by this again.
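(As an aside, the "signal goes nowhere" behaviour is easy to see in isolation with any container whose PID 1 has no handler installed for the stop signal; the sketch below is illustrative only and has nothing to do with the postgres image itself:)

```
# Illustrative only -- PID 1 with no signal handler ignores SIGTERM,
# so docker stop waits out its timeout and then sends SIGKILL:
docker run -d --name sigtest debian:bookworm sleep 300
time docker stop sigtest    # ~10 seconds, then SIGKILL
docker rm sigtest
```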
Ah right -- I guess we could implement some `trap` logic in there that tries to mimic "Smart Shutdown" on `SIGTERM` and "Fast Shutdown" on `SIGINT`, but I think it'd be easier / more likely to be "generally correct" to just assume stopping the container during initialization is a "failed to initialize" situation and `exit 1` (more compelling with better `PGDATA` handling in that case, as discussed/described in #159).
There's not really something uncontroversially "sane" we could do if we get a request to stop/kill during that initialization process. :thinking:
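(For concreteness, a rough sketch of what the "treat a stop request during initialization as a failed initialization" idea could look like; this is not the image's current entrypoint behaviour, and the setup-step name below is a placeholder:)

```
# Hypothetical sketch, NOT the current docker-entrypoint.sh behaviour.
_abort_init() {
  echo >&2 'received stop signal during initialization; treating as "failed to initialize"'
  exit 1
}
trap _abort_init TERM INT

do_initialization_steps   # placeholder for initdb / init scripts; long-running
                          # steps would have to run in the background + `wait`
                          # for the trap to fire promptly

trap - TERM INT
exec "$@"                 # once postgres is PID 1 it handles SIGINT/SIGTERM itself
```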
We are running a PostgreSQL Docker container in production for our ERP system, and we have had multiple instances of corruption in the last few months.
The most recent corruption started with an "invalid record length" message immediately after a restart:
```
2024-02-29 12:44:27.403 2024-02-29 11:44:27.402 UTC [7731] FATAL: sorry, too many clients already
2024-02-29 12:44:28.505 2024-02-29 11:44:28.505 UTC [7732] FATAL: sorry, too many clients already
2024-02-29 12:44:28.699 2024-02-29 11:44:28.699 UTC [7733] FATAL: sorry, too many clients already
2024-02-29 12:44:29.912 2024-02-29 11:44:29.912 UTC [7734] FATAL: sorry, too many clients already
2024-02-29 12:44:30.843 2024-02-29 11:44:30.843 UTC [7735] FATAL: sorry, too many clients already
2024-02-29 12:44:33.862 2024-02-29 11:44:33.862 UTC [7740] FATAL: sorry, too many clients already
2024-02-29 12:44:35.004 2024-02-29 11:44:35.003 UTC [7741] FATAL: sorry, too many clients already
2024-02-29 12:44:37.264 2024-02-29 11:44:37.264 UTC [7746] FATAL: sorry, too many clients already
2024-02-29 12:44:37.952 PostgreSQL Database directory appears to contain a database; Skipping initialization
2024-02-29 12:44:37.983 2024-02-29 11:44:37.983 UTC [1] LOG: starting PostgreSQL 15.4 (Debian 15.4-2.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
2024-02-29 12:44:37.983 2024-02-29 11:44:37.983 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2024-02-29 12:44:37.983 2024-02-29 11:44:37.983 UTC [1] LOG: listening on IPv6 address "::", port 5432
2024-02-29 12:44:37.987 2024-02-29 11:44:37.987 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2024-02-29 12:44:37.996 2024-02-29 11:44:37.995 UTC [29] LOG: database system was interrupted; last known up at 2024-02-29 11:44:08 UTC
2024-02-29 12:44:38.572 2024-02-29 11:44:38.572 UTC [29] LOG: database system was not properly shut down; automatic recovery in progress
2024-02-29 12:44:38.576 2024-02-29 11:44:38.576 UTC [29] LOG: redo starts at 24/68195310
2024-02-29 12:44:38.580 2024-02-29 11:44:38.580 UTC [29] LOG: invalid record length at 24/6834FC90: wanted 24, got 0
2024-02-29 12:44:38.580 2024-02-29 11:44:38.580 UTC [29] LOG: redo done at 24/6834FC68 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2024-02-29 12:44:38.591 2024-02-29 11:44:38.591 UTC [27] LOG: checkpoint starting: end-of-recovery immediate wait
2024-02-29 12:44:38.674 2024-02-29 11:44:38.674 UTC [1] LOG: database system is ready to accept connections
2024-02-29 12:44:40.319 2024-02-29 11:44:40.319 UTC [1] LOG: received fast shutdown request
2024-02-29 12:44:40.321 2024-02-29 11:44:40.321 UTC [1] LOG: aborting any active transactions
2024-02-29 12:44:40.332 2024-02-29 11:44:40.321 UTC [572] FATAL: terminating connection due to administrator command
2024-02-29 12:44:40.332 2024-02-29 11:44:40.321 UTC [575] FATAL: terminating connection due to administrator command
2024-02-29 12:44:40.332 2024-02-29 11:44:40.321 UTC [574] FATAL: terminating connection due to administrator command
2024-02-29 12:44:40.332 2024-02-29 11:44:40.321 UTC [577] FATAL: terminating connection due to administrator command
2024-02-29 12:44:40.332 2024-02-29 11:44:40.321 UTC [562] FATAL: terminating connection due to administrator command
2024-02-29 12:44:40.332 2024-02-29 11:44:40.321 UTC [582] FATAL: terminating connection due to administrator command
2024-02-29 12:44:40.332 2024-02-29 11:44:40.321 UTC [563] FATAL: terminating connection due to administrator command
2024-02-29 12:44:40.332 2024-02-29 11:44:40.321 UTC [576] FATAL: terminating connection due to administrator command
2024-02-29 12:44:40.332 2024-02-29 11:44:40.321 UTC [569] FATAL: terminating connection due to administrator command
```
followed by more serious errors:
```
2024-02-29 12:45:53.223 2024-02-29 11:45:53.223 UTC [71] ERROR: index "mail_message_pkey" contains unexpected zero page at block 2029
2024-02-29 12:45:53.223 2024-02-29 11:45:53.223 UTC [71] HINT: Please REINDEX it.
2024-02-29 12:45:53.223 2024-02-29 11:45:53.223 UTC [71] STATEMENT: SELECT msg.res_id FROM mail_message msg
2024-02-29 12:45:53.223 RIGHT JOIN mail_notification rel
2024-02-29 12:45:53.223 ON rel.mail_message_id = msg.id AND rel.res_partner_id = 42832 AND (rel.is_read = false OR rel.is_read IS NULL)
2024-02-29 12:45:53.223 WHERE msg.model = 'mail.channel' AND msg.res_id in (1) AND msg.message_type != 'user_notification'
2024-02-29 12:45:53.972 2024-02-29 11:45:53.972 UTC [118] LOG: request to flush past end of generated WAL; request 24/6843DB78, current position 24/683AC100
2024-02-29 12:45:53.972 2024-02-29 11:45:53.972 UTC [118] CONTEXT: writing block 0 of relation base/186211/274960
```
and finally a PostgreSQL container unable to start due to a corrupted WAL:
```
2024-02-29 12:48:58.044 PostgreSQL Database directory appears to contain a database; Skipping initialization
2024-02-29 12:48:58.070 2024-02-29 11:48:58.069 UTC [1] LOG: starting PostgreSQL 15.4 (Debian 15.4-2.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
2024-02-29 12:48:58.070 2024-02-29 11:48:58.070 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2024-02-29 12:48:58.070 2024-02-29 11:48:58.070 UTC [1] LOG: listening on IPv6 address "::", port 5432
2024-02-29 12:48:58.074 2024-02-29 11:48:58.074 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2024-02-29 12:48:58.082 2024-02-29 11:48:58.082 UTC [29] LOG: database system was interrupted while in recovery at 2024-02-29 11:48:51 UTC
2024-02-29 12:48:58.082 2024-02-29 11:48:58.082 UTC [29] HINT: This probably means that some data is corrupted and you will have to use the last backup for recovery.
2024-02-29 12:48:58.467 2024-02-29 11:48:58.466 UTC [30] FATAL: the database system is starting up
2024-02-29 12:48:58.687 2024-02-29 11:48:58.687 UTC [29] LOG: database system was not properly shut down; automatic recovery in progress
2024-02-29 12:48:58.691 2024-02-29 11:48:58.691 UTC [29] LOG: redo starts at 24/68351E78
2024-02-29 12:48:58.693 2024-02-29 11:48:58.693 UTC [29] PANIC: invalid lp
2024-02-29 12:48:58.693 2024-02-29 11:48:58.693 UTC [29] CONTEXT: WAL redo at 24/68413FC8 for Heap/HOT_UPDATE: off 7 xmax 18320045 flags 0x60 ; new off 12 xmax 0; blkref #0: rel 1663/186211/189171, blk 49
2024-02-29 12:48:59.169 2024-02-29 11:48:59.169 UTC [31] FATAL: the database system is not yet accepting connections
2024-02-29 12:48:59.169 2024-02-29 11:48:59.169 UTC [31] DETAIL: Consistent recovery state has not been yet reached.
2024-02-29 12:48:59.314 2024-02-29 11:48:59.314 UTC [1] LOG: startup process (PID 29) was terminated by signal 6: Aborted
2024-02-29 12:48:59.314 2024-02-29 11:48:59.314 UTC [1] LOG: terminating any other active server processes
2024-02-29 12:48:59.315 2024-02-29 11:48:59.315 UTC [1] LOG: shutting down due to startup process failure
2024-02-29 12:48:59.334 2024-02-29 11:48:59.334 UTC [1] LOG: database system is shut down
```
We were able to recover from the corruption by using `pg_resetwal`, dumping the database to SQL and restoring into a clean database from the dump, but the multiple corruption incidents got me curious, and I started to investigate whether I could reproduce the error.

From the initial error it is obvious that PostgreSQL wasn't shut down properly but was probably killed with SIGKILL, because the log doesn't indicate that the instance running at 2024-02-29 12:44:37.264 was shut down before the instance at 2024-02-29 12:44:37.983 started.
Compare this with a clean restart, where the old instance reports that it is shut down properly:
```
2024-02-29 03:00:11.265 2024-02-29 02:00:11.264 UTC [51784] FATAL: terminating connection due to administrator command
2024-02-29 03:00:11.268 2024-02-29 02:00:11.268 UTC [51805] FATAL: terminating connection due to administrator command
2024-02-29 03:00:11.276 2024-02-29 02:00:11.276 UTC [1] LOG: background worker "logical replication launcher" (PID 32) exited with exit code 1
2024-02-29 03:00:11.285 2024-02-29 02:00:11.285 UTC [51804] FATAL: terminating connection due to administrator command
2024-02-29 03:00:11.302 2024-02-29 02:00:11.302 UTC [27] LOG: shutting down
2024-02-29 03:00:11.304 2024-02-29 02:00:11.304 UTC [27] LOG: checkpoint starting: shutdown immediate
2024-02-29 03:00:11.363 2024-02-29 02:00:11.363 UTC [1] LOG: database system is shut down
2024-02-29 03:00:19.513 PostgreSQL Database directory appears to contain a database; Skipping initialization
2024-02-29 03:00:19.568 2024-02-29 02:00:19.567 UTC [1] LOG: starting PostgreSQL 15.4 (Debian 15.4-2.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
2024-02-29 03:00:19.568 2024-02-29 02:00:19.568 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2024-02-29 03:00:19.568 2024-02-29 02:00:19.568 UTC [1] LOG: listening on IPv6 address "::", port 5432
2024-02-29 03:00:19.572 2024-02-29 02:00:19.572 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2024-02-29 03:00:19.584 2024-02-29 02:00:19.580 UTC [29] LOG: database system was shut down at 2024-02-29 02:00:11 UTC
2024-02-29 03:00:19.594 2024-02-29 02:00:19.594 UTC [1] LOG: database system is ready to accept connections
```
At the time the corruption occurred, a team member had issued a command to restart the Docker container (we have since figured out that the team restarted PostgreSQL very often, due to a misunderstanding that this would be necessary when deploying schema changes), so I went ahead and checked whether I could reproduce the problem just by restarting the container. I created an empty PostgreSQL container as documented on https://hub.docker.com/_/postgres under "start a postgres instance":
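(That example is roughly the following, reproduced here from the linked page; treat the exact password value as illustrative:)

```
docker run --name some-postgres -e POSTGRES_PASSWORD=mysecretpassword -d postgres
```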
And then I restarted the container in a loop:
```
martin.mein-iserv.de ~ # for i in {1..5}; do date -Iseconds; time docker stop some-postgres >/dev/null; echo; date -Iseconds; time docker start some-postgres >/dev/null; echo; done
2024-03-05T15:44:44+01:00
docker stop some-postgres > /dev/null  0,04s user 0,03s system 22% cpu 0,307 total

2024-03-05T15:44:44+01:00
docker start some-postgres > /dev/null  0,03s user 0,02s system 14% cpu 0,365 total

2024-03-05T15:44:45+01:00
docker stop some-postgres > /dev/null  0,03s user 0,02s system 0% cpu 10,274 total

2024-03-05T15:44:55+01:00
docker start some-postgres > /dev/null  0,03s user 0,03s system 18% cpu 0,291 total

2024-03-05T15:44:55+01:00
docker stop some-postgres > /dev/null  0,03s user 0,02s system 14% cpu 0,329 total

2024-03-05T15:44:56+01:00
docker start some-postgres > /dev/null  0,02s user 0,03s system 17% cpu 0,315 total

2024-03-05T15:44:56+01:00
docker stop some-postgres > /dev/null  0,02s user 0,02s system 0% cpu 10,257 total

2024-03-05T15:45:06+01:00
docker start some-postgres > /dev/null  0,03s user 0,02s system 13% cpu 0,348 total

2024-03-05T15:45:07+01:00
docker stop some-postgres > /dev/null  0,02s user 0,02s system 0% cpu 10,274 total

2024-03-05T15:45:17+01:00
docker start some-postgres > /dev/null  0,04s user 0,03s system 18% cpu 0,354 total
```
It's immediately obvious that something is going wrong here: sometimes `docker stop` hangs for 10 seconds, which is suspiciously close to Docker's default `--stop-timeout`. From `man docker-stop`:
```
--stop-timeout
   Timeout (in seconds) to stop a container, or -1 to disable timeout.

   The --stop-timeout flag sets the number of seconds to wait for the container
   to stop after sending the pre-defined (see --stop-signal) system call signal.
   If the container does not exit after the timeout elapses, it is forcibly
   killed with a SIGKILL signal.

   If --stop-timeout is set to -1, no timeout is applied, and the daemon will
   wait indefinitely for the container to exit.

   The default is determined by the daemon, and 10 seconds for Linux containers,
   and 30 seconds for Windows containers.
```
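(If you want to rule the timeout itself out, you can override it for a single invocation and check whether a per-container value was configured at all; the inspect template output below is my assumption, so verify it on your Docker version:)

```
# Override the timeout for one stop (illustrative value):
time docker stop -t 120 some-postgres

# Show the configured per-container --stop-timeout, if any; likely prints <nil>
# when the daemon default (10s on Linux) applies:
docker inspect -f '{{.Config.StopTimeout}}' some-postgres
```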
(Sidenote: the issue is very timing-sensitive. Without the `date` call, every `docker stop` after the first one was affected, and doing anything more complicated like calling `docker logs` between a `docker start` and a `docker stop` makes the issue unreproducible.)

The log confirms that e.g. the `docker stop` run at 2024-03-05T15:44:45+01:00 did not shut down PostgreSQL cleanly:
```
[...]
PostgreSQL Database directory appears to contain a database; Skipping initialization
2024-03-05 14:44:45.412 UTC [1] LOG: starting PostgreSQL 16.2 (Debian 16.2-1.pgdg120+2) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
2024-03-05 14:44:45.412 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2024-03-05 14:44:45.412 UTC [1] LOG: listening on IPv6 address "::", port 5432
2024-03-05 14:44:45.416 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2024-03-05 14:44:45.421 UTC [30] LOG: database system was shut down at 2024-03-05 14:44:44 UTC
2024-03-05 14:44:45.429 UTC [1] LOG: database system is ready to accept connections
PostgreSQL Database directory appears to contain a database; Skipping initialization
2024-03-05 14:44:55.980 UTC [1] LOG: starting PostgreSQL 16.2 (Debian 16.2-1.pgdg120+2) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
2024-03-05 14:44:55.980 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2024-03-05 14:44:55.980 UTC [1] LOG: listening on IPv6 address "::", port 5432
2024-03-05 14:44:55.984 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2024-03-05 14:44:55.989 UTC [1] LOG: received fast shutdown request
2024-03-05 14:44:55.989 UTC [29] LOG: database system was interrupted; last known up at 2024-03-05 14:44:45 UTC
2024-03-05 14:44:56.048 UTC [29] LOG: database system was not properly shut down; automatic recovery in progress
2024-03-05 14:44:56.049 UTC [29] LOG: invalid record length at 0/15268D8: expected at least 24, got 0
2024-03-05 14:44:56.049 UTC [29] LOG: redo is not required
2024-03-05 14:44:56.052 UTC [27] LOG: checkpoint starting: end-of-recovery immediate wait
2024-03-05 14:44:56.062 UTC [27] LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.003 s, sync=0.002 s, total=0.011 s; sync files=2, longest=0.001 s, average=0.001 s; distance=0 kB, estimate=0 kB; lsn=0/15268D8, redo lsn=0/15268D8
2024-03-05 14:44:56.065 UTC [27] LOG: shutting down
2024-03-05 14:44:56.067 UTC [27] LOG: checkpoint starting: shutdown immediate
2024-03-05 14:44:56.073 UTC [27] LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.008 s; sync files=0, longest=0.000 s, average=0.000 s; distance=0 kB, estimate=0 kB; lsn=0/1526950, redo lsn=0/1526950
2024-03-05 14:44:56.080 UTC [1] LOG: database system is shut down
[...]
```
Apparently the stop signal that `docker stop` sends initially (SIGINT for this image, visible below as the `runc ... kill ... 2` call) sometimes goes unnoticed by the container, leading `docker stop` to send a SIGKILL 10 seconds later. Digging down with `execsnoop`, it seems that the underlying problem is that the `docker stop` happens during the container startup phase, before the actual `postgres` binary is even started:
```
16:31:36 date            968413 953994 0 /usr/bin/date -Iseconds
16:31:36 docker          968414 953994 0 /usr/bin/docker start some-postgres
16:31:36 ifupdown-hotplu 968423 968130 0
16:31:36 ifquery         968426 968425 0 /sbin/ifquery --allow hotplug -l veth1d2dfb4
16:31:36 bridge-network- 968421 968130 0 /lib/udev/bridge-network-interface
16:31:36 bridge-network- 968422 968135 0 /lib/udev/bridge-network-interface
16:31:36 ifupdown-hotplu 968424 968135 0 /lib/udev/ifupdown-hotplug
16:31:36 ifquery         968428 968427 0 /sbin/ifquery --allow hotplug -l veth0fae70f
16:31:36 systemd-sysctl  968429 968130 0 /lib/systemd/systemd-sysctl --prefix=/net/ipv4/conf/veth1d2dfb4 --prefix=/net/ipv4/neigh/veth1d2dfb4 --prefix=/net/ipv6/conf/veth1d2dfb4 --prefix=/net/ipv6/neigh/veth1d2dfb4
16:31:36 systemd-sysctl  968430 968135 0 /lib/systemd/systemd-sysctl --prefix=/net/ipv4/conf/veth0fae70f --prefix=/net/ipv4/neigh/veth0fae70f --prefix=/net/ipv6/conf/veth0fae70f --prefix=/net/ipv6/neigh/veth0fae70f
16:31:36 containerd-shim 968431 1549   0
16:31:36 containerd-shim 968439 968431 0 /usr/bin/containerd-shim-runc-v2 -namespace moby -id e24c90a798805e9fe8c1d2a00017fb966fffd2cbff97d281e7dab595e80020b4 -address /run/containerd/containerd.sock
16:31:36 runc            968448 968439 0 /usr/bin/runc --root /var/run/docker/runtime-runc/moby --log /run/containerd/io.containerd.runtime.v2.task/moby/e24c90a798805e9fe8c1d2a00017fb966fffd2cbff97d281e7dab595e80020b4/log.json --log-format json --systemd-cgroup create --bundle /run/containerd/io.containerd.runtime.v2.task/moby/e24c90a798805e9fe8c1d2a00017fb966fffd2cbff97d281e7dab595e80020b4 --pid-file /run/containerd/io.containerd.runtime.v2.task/moby/e24c90a798805e9fe8c1d2a00017fb966fffd2cbff97d281e7dab595e80020b4/init.pid e24c90a798805e9fe8c1d2a00017fb966fffd2cbff97d281e7dab595e80020b4
16:31:36 exe             968457 968448 0 /proc/self/exe init
16:31:36 exe             968466 968448 0 /proc/3052/exe -exec-root=/var/run/docker e24c90a798805e9fe8c1d2a00017fb966fffd2cbff97d281e7dab595e80020b4 beb8ec28ab26
16:31:36 exe             968474 3052   0 /proc/self/exe /var/run/docker/netns/241fd6979fc9 all false
16:31:36 ifupdown-hotplu 968482 968135 0 /lib/udev/ifupdown-hotplug
16:31:36 runc            968483 968439 0 /usr/bin/runc --root /var/run/docker/runtime-runc/moby --log /run/containerd/io.containerd.runtime.v2.task/moby/e24c90a798805e9fe8c1d2a00017fb966fffd2cbff97d281e7dab595e80020b4/log.json --log-format json --systemd-cgroup start e24c90a798805e9fe8c1d2a00017fb966fffd2cbff97d281e7dab595e80020b4
16:31:36 docker-entrypoi 968459 968439 0 /usr/local/bin/docker-entrypoint.sh postgres
16:31:36 bash            968459 968439 0 /usr/bin/bash /usr/local/bin/docker-entrypoint.sh postgres
16:31:36 id              968491 968459 0 /usr/bin/id -u
16:31:36 mkdir           968492 968459 0 /usr/bin/mkdir -p /var/lib/postgresql/data
16:31:36 date            968493 953994 0 /usr/bin/date -Iseconds
16:31:36 chmod           968495 968459 0 /usr/bin/chmod 00700 /var/lib/postgresql/data
16:31:36 docker          968494 953994 0 /usr/bin/docker stop some-postgres
16:31:36 mkdir           968496 968459 0 /usr/bin/mkdir -p /var/run/postgresql
16:31:36 chmod           968497 968459 0 /usr/bin/chmod 03775 /var/run/postgresql
16:31:36 find            968498 968459 0 /usr/bin/find /var/lib/postgresql/data ! -user postgres -exec chown postgres {} +
16:31:36 find            968499 968459 0 /usr/bin/find /var/run/postgresql ! -user postgres -exec chown postgres {} +
16:31:36 id              968505 968459 0 /usr/bin/id -u
16:31:36 gosu            968459 968439 0 /usr/local/bin/gosu postgres /usr/local/bin/docker-entrypoint.sh postgres
16:31:36 docker-entrypoi 968459 968439 0 /usr/local/bin/docker-entrypoint.sh postgres
16:31:36 bash            968459 968439 0 /usr/bin/bash /usr/local/bin/docker-entrypoint.sh postgres
16:31:36 id              968513 968459 0 /usr/bin/id -u
16:31:36 mkdir           968514 968459 0 /usr/bin/mkdir -p /var/lib/postgresql/data
16:31:36 chmod           968515 968459 0 /usr/bin/chmod 00700 /var/lib/postgresql/data
16:31:36 mkdir           968516 968459 0 /usr/bin/mkdir -p /var/run/postgresql
16:31:36 runc            968518 968439 0 /usr/bin/runc --root /var/run/docker/runtime-runc/moby --log /run/containerd/io.containerd.runtime.v2.task/moby/e24c90a798805e9fe8c1d2a00017fb966fffd2cbff97d281e7dab595e80020b4/log.json --log-format json --systemd-cgroup kill e24c90a798805e9fe8c1d2a00017fb966fffd2cbff97d281e7dab595e80020b4 2
16:31:36 chmod           968517 968459 0 /usr/bin/chmod 03775 /var/run/postgresql
16:31:36 id              968519 968459 0 /usr/bin/id -u
16:31:36 cat             968528 968459 0 /usr/bin/cat
16:31:36 postgres        968459 968439 0 /usr/lib/postgresql/16/bin/postgres
```
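(For anyone wanting to reproduce a capture like the one above: it can be obtained with the BCC version of `execsnoop`; the Debian package name and flag below are my assumptions about the setup used:)

```
sudo apt install bpfcc-tools
sudo execsnoop-bpfcc -T    # -T adds the HH:MM:SS time column shown above
```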
I was unable to reproduce the issue with a non-containerized PostgreSQL installation on Debian 11 bullseye with a similar `systemctl stop postgresql@13-main.service; systemctl start postgresql@13-main.service` loop, so this issue seems to affect only the Docker version of PostgreSQL.
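(That non-containerized loop was essentially the same shape as the Docker one above; a sketch, with the unit name taken from the command just mentioned:)

```
for i in {1..5}; do
  date -Iseconds
  time systemctl stop postgresql@13-main.service
  date -Iseconds
  time systemctl start postgresql@13-main.service
done
```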