gil0109 closed this 4 years ago.
Great find.
Adding the link to the article you found, which may point us toward a solution:
It appears to be the postgres logger processes that are going defunct:
```
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
1002170+ 1 0.0 0.0 304604 3184 ? Ssl 13:22 0:00 go-crond -v --default-user=1002170000 --allow-unprivileged backup.conf
1002170+ 18 0.0 0.0 11964 1896 ? Ss 13:22 0:00 /bin/sh
1002170+ 747 0.0 0.0 11840 1836 ? Ss+ 13:24 0:00 /bin/sh
1002170+ 920 0.0 0.0 11840 1532 ? Ss+ 13:27 0:00 /bin/sh
1002170+ 935 0.0 0.0 11704 1220 ? S 13:29 0:00 sh -c (./backup.sh -s -v wallet-indy-cat:5432/agent_indy_cat_wallet)
1002170+ 937 0.1 0.0 12108 1596 ? S 13:29 0:00 sh -c (./backup.sh -s -v wallet-indy-cat:5432/agent_indy_cat_wallet)
1002170+ 973 0.0 0.0 11840 1716 ? S 13:29 0:00 /bin/bash /usr/bin/run-postgresql
1002170+ 1478 1.0 0.0 159240 10692 ? S 13:30 0:00 /opt/rh/rh-postgresql10/root/usr/bin/postgres -h
1002170+ 1489 0.0 0.0 116216 1784 ? Ss 13:30 0:00 postgres: logger process
1002170+ 1491 0.0 0.0 159380 3004 ? Ss 13:30 0:00 postgres: checkpointer process
1002170+ 1492 0.0 0.0 159240 2012 ? Ss 13:30 0:00 postgres: writer process
1002170+ 1493 0.0 0.0 159240 2000 ? Ss 13:30 0:00 postgres: wal writer process
1002170+ 1494 0.0 0.0 159664 2888 ? Ss 13:30 0:00 postgres: autovacuum launcher process
1002170+ 1495 0.0 0.0 118332 1884 ? Ss 13:30 0:00 postgres: stats collector process
1002170+ 1496 0.0 0.0 159564 2380 ? Ss 13:30 0:00 postgres: bgworker: logical replication launcher
1002170+ 1504 0.0 0.0 72512 2800 ? S 13:30 0:00 createdb --owner=User_AahE2aJR agent_indy_cat_wallet
1002170+ 1505 4.0 0.0 160156 5404 ? Ss 13:30 0:00 postgres: postgres postgres [local] CREATE DATABASE
1002170+ 1553 0.0 0.0 4376 372 ? S 13:30 0:00 sleep 1
1002170+ 1555 0.0 0.0 51764 1724 ? R+ 13:30 0:00 ps aux
```
```
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
1002170+ 1 0.0 0.0 304604 3184 ? Ssl 13:22 0:00 go-crond -v --default-user=1002170000 --allow-unprivileged backup.conf
1002170+ 18 0.0 0.0 11964 1896 ? Ss 13:22 0:00 /bin/sh
1002170+ 747 0.0 0.0 11840 1836 ? Ss+ 13:24 0:00 /bin/sh
1002170+ 920 0.0 0.0 11840 1532 ? Ss+ 13:27 0:00 /bin/sh
1002170+ 935 0.0 0.0 11704 1220 ? S 13:29 0:00 sh -c (./backup.sh -s -v wallet-indy-cat:5432/agent_indy_cat_wallet)
1002170+ 937 0.1 0.0 12108 1596 ? S 13:29 0:00 sh -c (./backup.sh -s -v wallet-indy-cat:5432/agent_indy_cat_wallet)
1002170+ 973 0.1 0.0 159188 10604 ? S 13:29 0:00 postgres
1002170+ 1478 0.7 0.0 0 0 ? Z 13:30 0:00 [postgres] <defunct>
1002170+ 1489 0.0 0.0 0 0 ? Zs 13:30 0:00 [postgres] <defunct>
1002170+ 1578 0.0 0.0 4376 368 ? S 13:30 0:00 sleep 1
1002170+ 1586 0.0 0.0 116164 1772 ? Ss 13:30 0:00 postgres: logger process
1002170+ 1588 0.0 0.0 159188 1992 ? Ss 13:30 0:00 postgres: checkpointer process
1002170+ 1589 0.0 0.0 159188 1996 ? Ss 13:30 0:00 postgres: writer process
1002170+ 1590 0.0 0.0 159188 1988 ? Ss 13:30 0:00 postgres: wal writer process
1002170+ 1591 0.0 0.0 159612 2896 ? Ss 13:30 0:00 postgres: autovacuum launcher process
1002170+ 1592 0.0 0.0 118416 1984 ? Ss 13:30 0:00 postgres: stats collector process
1002170+ 1593 0.0 0.0 159600 2376 ? Ss 13:30 0:00 postgres: bgworker: logical replication launcher
1002170+ 1594 0.0 0.0 51764 1720 ? R+ 13:30 0:00 ps aux
```
```
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
1002170+ 1 0.0 0.0 304604 3184 ? Ssl 13:22 0:00 go-crond -v --default-user=1002170000 --allow-unprivileged backup.conf
1002170+ 18 0.0 0.0 11964 1896 ? Ss 13:22 0:00 /bin/sh
1002170+ 747 0.0 0.0 11840 1836 ? Ss+ 13:24 0:00 /bin/sh
1002170+ 920 0.0 0.0 11840 1532 ? Ss+ 13:27 0:00 /bin/sh
1002170+ 1478 0.0 0.0 0 0 ? Z 13:30 0:00 [postgres] <defunct>
1002170+ 1489 0.0 0.0 0 0 ? Zs 13:30 0:00 [postgres] <defunct>
1002170+ 1586 0.0 0.0 0 0 ? Zs 13:30 0:00 [postgres] <defunct>
1002170+ 1750 0.0 0.0 51764 1720 ? R+ 13:34 0:00 ps aux
```
Three defunct processes are created per verification run. Two of them can be eliminated by turning off the logging collector, i.e. setting `logging_collector = off` in the postgres config:
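To count defunct processes before and after a run without eyeballing `ps aux`, a small helper can filter on the STAT column. This is a minimal Python sketch; `find_zombies` is a hypothetical name, and it assumes the default `ps aux` layout shown in the snapshots, where STAT is the eighth column and only COMMAND may contain spaces. The sample input is a trimmed copy of the third snapshot.

```python
from __future__ import annotations


def find_zombies(ps_aux_output: str) -> list[tuple[int, str]]:
    """Return (pid, command) for each process whose STAT column starts with 'Z'."""
    zombies = []
    for line in ps_aux_output.strip().splitlines():
        # ps aux prints 11 columns; only COMMAND (the last one) may contain spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or fields[0] == "USER":  # skip the header and short lines
            continue
        pid, stat, command = fields[1], fields[7], fields[10]
        if stat.startswith("Z"):  # "Z" and "Zs" both mean zombie (defunct)
            zombies.append((int(pid), command))
    return zombies


# Trimmed copy of the third snapshot, taken after one verification run.
SNAPSHOT = """\
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
1002170+ 1 0.0 0.0 304604 3184 ? Ssl 13:22 0:00 go-crond -v --default-user=1002170000 --allow-unprivileged backup.conf
1002170+ 1478 0.0 0.0 0 0 ? Z 13:30 0:00 [postgres] <defunct>
1002170+ 1489 0.0 0.0 0 0 ? Zs 13:30 0:00 [postgres] <defunct>
1002170+ 1586 0.0 0.0 0 0 ? Zs 13:30 0:00 [postgres] <defunct>
1002170+ 1750 0.0 0.0 51764 1720 ? R+ 13:34 0:00 ps aux
"""

print(find_zombies(SNAPSHOT))
```

Inside the pod, the live input could come from `subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout`; comparing the count before and after a verification run would show whether it grows by three, or by one with the logging collector turned off.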
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
1002170+ 1 0.0 0.0 303196 8020 ? Ssl 14:55 0:00 go-crond -v --default-user=1002170000 --allow-unprivileged backup.conf
1002170+ 17 0.0 0.0 11964 1920 ? Ss+ 14:56 0:00 /bin/sh
1002170+ 597 0.0 0.0 11840 1836 ? Ss 14:57 0:00 /bin/sh
1002170+ 1393 0.0 0.0 0 0 ? Z 15:00 0:00 [postgres] <defunct>
1002170+ 1645 0.0 0.0 51764 1720 ? R+ 15:02 0:00 ps aux
~This leads me to believe the issue is somehow being caused by how stdout and stderr are being redirected by the cron process.~
The real issue is described here: Docker and the PID 1 zombie reaping problem.
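The gist of that article: when a process's parent exits, the kernel re-parents it to PID 1, and any child that exits stays in the `Z` (defunct) state until its parent calls `wait()`. In this pod PID 1 is `go-crond`, so an orphaned postgres process that later exits has nobody to collect it, assuming go-crond does not reap re-parented processes (which the snapshots above suggest). The Python sketch below reproduces that in miniature on Linux only (it reads `/proc`): it forks a child, deliberately never waits for it, checks its state, and then reaps it the way an init process is expected to. The function names are illustrative, not taken from any project.

```python
from __future__ import annotations

import os
import time


def proc_state(pid: int) -> str | None:
    """Return the one-letter state from /proc/<pid>/stat (Linux only), or None if gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The command name is wrapped in parentheses and may contain spaces,
    # so take the first field after the last ')'.
    return data.rsplit(")", 1)[1].split()[0]


def make_zombie() -> int:
    """Fork a child that exits immediately, and deliberately do not wait() for it."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit right away
    for _ in range(200):  # parent: give the child up to ~2 s to finish exiting
        if proc_state(pid) == "Z":
            break
        time.sleep(0.01)
    return pid


def reap_exited_children() -> list[int]:
    """Collect every child that has already exited, without blocking."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:  # no children at all
            break
        if pid == 0:  # children exist, but none has exited yet
            break
        reaped.append(pid)
    return reaped


zombie = make_zombie()
print(proc_state(zombie))  # state while nobody has waited for the child
reaped = reap_exited_children()
print(zombie in reaped, proc_state(zombie))  # after reaping
```

The child should report state `Z` until `reap_exited_children()` collects it, after which its `/proc` entry disappears. Passing `WNOHANG` keeps the loop from blocking, which is why init-style reapers tend to run such a loop whenever `SIGCHLD` arrives.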
Working on a fix.
Starting and shutting down the postgres server within the backup.sh script causes defunct processes. If you start and shut down the postgres server from inside the pod's rsh terminal, this does not occur. If the script starts the server (without automatically shutting it down) and you then shut it down manually, the defunct processes still appear.
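One possible way to apply this to `backup.sh` without replacing `go-crond` as PID 1 is to run the script under a wrapper that registers itself as a Linux child subreaper (`prctl(PR_SET_CHILD_SUBREAPER)`, available since Linux 3.4). Orphans such as a daemonized postgres server are then re-parented to the wrapper instead of PID 1, and the wrapper reaps them. This is a hedged sketch, not the fix adopted in this thread: `run_and_reap`, `become_subreaper` and `exit_code` are hypothetical names, and established tools cover the same ground (for example `tini -s`, or running `tini` or `dumb-init` as PID 1).

```python
from __future__ import annotations

import ctypes
import os

PR_SET_CHILD_SUBREAPER = 36  # constant from <linux/prctl.h>, Linux >= 3.4


def become_subreaper() -> None:
    """Ask the kernel to re-parent orphaned descendants to this process, not PID 1."""
    libc = ctypes.CDLL(None, use_errno=True)
    ulong = ctypes.c_ulong
    if libc.prctl(PR_SET_CHILD_SUBREAPER, ulong(1), ulong(0), ulong(0), ulong(0)) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def exit_code(status: int) -> int:
    """Convert a raw wait status into an exit code (negative signal number if killed)."""
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)


def run_and_reap(argv: list[str]) -> tuple[int | None, list[int]]:
    """Run argv, then wait for it AND for every orphan re-parented to us.

    Returns (exit code of argv, PIDs of the other descendants that were reaped).
    Blocks until no children remain, so anything left running keeps it waiting.
    """
    become_subreaper()
    main_pid = os.posix_spawnp(argv[0], argv, dict(os.environ))
    main_code = None
    others = []
    while True:
        try:
            pid, status = os.waitpid(-1, 0)  # block until any child exits
        except ChildProcessError:  # nothing left to reap
            break
        if pid == main_pid:
            main_code = exit_code(status)
        else:
            others.append(pid)
    return main_code, others


# Demo: the shell backgrounds `sleep` and exits at once, orphaning it,
# much like a daemonized server whose launcher has already returned.
code, orphans = run_and_reap(["sh", "-c", "sleep 0.2 & exit 3"])
print(code, orphans)
```

In the pod this might look like `run_and_reap(["./backup.sh", "-s", "-v", "wallet-indy-cat:5432/agent_indy_cat_wallet"])`. Because the wrapper waits until every descendant has exited, it suits a script that shuts its temporary postgres server down before finishing, but it would hang if the server were left running.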