Server process inside Docker crashing every few minutes

nicolinux commented 7 years ago

Using the 9.6.3 tag, I see the following message every few minutes in my Docker logs:

17/07/2017 20:31:13 LOG:  server process (PID 540) exited with exit code 255
17/07/2017 20:31:13 LOG:  terminating any other active server processes
17/07/2017 20:31:13 WARNING:  terminating connection because of crash of another server process
17/07/2017 20:31:13 DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
17/07/2017 20:31:13 HINT:  In a moment you should be able to reconnect to the database and repeat your command.
17/07/2017 20:31:13 LOG:  all server processes terminated; reinitializing
17/07/2017 20:31:13 LOG:  database system was interrupted; last known up at 2017-07-17 18:30:55 UTC
17/07/2017 20:31:13 LOG:  database system was not properly shut down; automatic recovery in progress
17/07/2017 20:31:13 LOG:  invalid record length at 0/4232570: wanted 24, got 0
17/07/2017 20:31:13 LOG:  redo is not required
17/07/2017 20:31:14 LOG:  MultiXact member wraparound protections are now enabled
17/07/2017 20:31:14 LOG:  database system is ready to accept connections
17/07/2017 20:31:14 LOG:  autovacuum launcher started

Any idea how I can cancel the rollback? But most importantly - is this a dockerized Postgres issue or just a regular Postgres problem?

tianon commented 7 years ago

This sounds like a resource issue -- does your Docker host/VM have enough RAM for the workload you're putting on it? Is the container running out of file descriptors or some other (in some cases artificially) limited resource?
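One rough way to look at those limits from inside the container is a small script like the one below. It is only a sketch: `resource_report` and the other function names are made up for illustration, the cgroup paths assume the standard cgroup v1/v2 mount points, and the numbers describe the process running the script, not the Postgres backends themselves (though limits are normally inherited from the container).

```python
import os
import resource
from pathlib import Path

# Candidate files holding the container memory limit. Which one exists
# depends on whether the host uses cgroup v1 or v2 (assumed mount points).
CGROUP_MEMORY_FILES = (
    Path("/sys/fs/cgroup/memory.max"),                    # cgroup v2
    Path("/sys/fs/cgroup/memory/memory.limit_in_bytes"),  # cgroup v1
)


def read_memory_limit():
    """Return the cgroup memory limit as a string, or None if not found."""
    for path in CGROUP_MEMORY_FILES:
        try:
            return path.read_text().strip()
        except OSError:
            continue
    return None


def count_open_fds():
    """Count descriptors open in this process via /proc (Linux), else None."""
    try:
        return len(os.listdir("/proc/self/fd"))
    except OSError:
        return None


def resource_report():
    """Collect the file descriptor limits and the memory limit in one dict."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return {
        "nofile_soft": soft,
        "nofile_hard": hard,
        "open_fds": count_open_fds(),
        "memory_limit": read_memory_limit(),
    }


if __name__ == "__main__":
    for key, value in resource_report().items():
        print(f"{key}: {value}")
```

A `memory_limit` of `max` (cgroup v2) or a very large number (cgroup v1) would suggest that no explicit memory limit is set on the container.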

nicolinux commented 7 years ago

Oh man, this is super weird. I am posting this here in case someone else stumbles on the same problem. I extended the official Postgres Docker image to start the cron and sshd daemons. Apparently the system which watches the postgres processes thought it was a great idea to kill the sshd daemon and with it one of the postgres processes - which in turn crashed the Postgres server entirely. Maybe there is some weirdness with the shell going on - I didn't have the time to investigate further. I stopped the sshd daemon and created a new container which acts as an ssh tunnel to the Postgres container.

However, this issue still persists:

17/07/2017 20:31:13 LOG:  invalid record length at 0/4232570: wanted 24, got 0

I'd love to know what it means and how I can fix it - but so far I couldn't find any sound info on it.

dimmg commented 7 years ago

facing the same issue! +1

yosifkit commented 6 years ago

Closing old issue.

jeffjanes commented 4 years ago

However, this issue still persists:
17/07/2017 20:31:13 LOG:  invalid record length at 0/4232570: wanted 24, got 0

After a crash, the server has to recover by reading and replaying all WAL records. Once it reaches the legitimate end of WAL, it will find some garbage which it can't figure out, and issue a message like this. So this is a totally normal message, if somewhat scary in appearance. So why log it at all? Well, it is possible that your WAL file got corrupted, and so it is not at the legitimate end of WAL. This info could help you figure out where the corruption is.
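To make the distinction concrete, here is a minimal sketch that separates the real crash line from the end-of-WAL message when scanning a log like the one at the top of this thread. The `classify` function and the category names are made up for illustration, and the patterns only cover the two message shapes quoted here, not every message Postgres can emit.

```python
import re

# Severity prefix as it appears in the quoted log, e.g. "LOG:  message".
# The pattern is searched for rather than anchored, because the leading
# timestamp in this thread comes from the log viewer, not from Postgres.
SEVERITY_RE = re.compile(r"(LOG|WARNING|DETAIL|HINT|ERROR|FATAL|PANIC):\s+(.*)")

# The two message shapes quoted in this thread.
CRASH_RE = re.compile(r"server process \(PID (\d+)\) exited with exit code (\d+)")
END_OF_WAL_RE = re.compile(r"invalid record length at ([0-9A-F]+/[0-9A-F]+)")


def classify(line):
    """Return 'crash', 'end-of-wal', 'other', or 'unparsed' for one log line."""
    match = SEVERITY_RE.search(line)
    if match is None:
        return "unparsed"
    message = match.group(2)
    if CRASH_RE.search(message):
        return "crash"       # a backend died; this is the line to investigate
    if END_OF_WAL_RE.search(message):
        return "end-of-wal"  # usually just where recovery stopped reading WAL
    return "other"


if __name__ == "__main__":
    sample = [
        "17/07/2017 20:31:13LOG:  server process (PID 540) exited with exit code 255",
        "17/07/2017 20:31:13LOG:  invalid record length at 0/4232570: wanted 24, got 0",
        "17/07/2017 20:31:13LOG:  redo is not required",
    ]
    for line in sample:
        print(classify(line), "|", line)
```

An `end-of-wal` line that shows up right after a `crash` line is the expected pattern. The same message appearing in some other context would be the case worth a closer look.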

bricky-master commented 3 years ago

Hi. This issue still exists. I'm using Dropbear as an alternative. I'm trying to dig deeper but can't figure out why the OpenSSH daemon causes this problem.

docker-library / postgres

Server process inside Docker crashing every few minutes #314