Closed jsievers closed 6 years ago
nova reboot <VM_CID>
/var/vcap/sys/run/postgres/
from postgres_ctl.err.log :
postgres_ctl.err.log
[2018-01-05 12:53:31+0000] + main start [2018-01-05 12:53:31+0000] + local action [2018-01-05 12:53:31+0000] + action=start [2018-01-05 12:53:31+0000] + source /var/vcap/jobs/postgres/bin/pgconfig.sh [2018-01-05 12:53:31+0000] ++ current_version=9.6.4 [2018-01-05 12:53:31+0000] ++ pgversion_current=postgres-9.6.4 [2018-01-05 12:53:31+0000] ++ pgversion_old=postgres-9.6.3 [2018-01-05 12:53:31+0000] ++ pgversion_older=postgres-9.6.2 [2018-01-05 12:53:31+0000] ++ JOB_DIR=/var/vcap/jobs/postgres [2018-01-05 12:53:31+0000] ++ PACKAGE_DIR=/var/vcap/packages/postgres-9.6.4 [2018-01-05 12:53:31+0000] ++ STORE_DIR=/var/vcap/store [2018-01-05 12:53:31+0000] ++ PG_STORE_DIR=/var/vcap/store/postgres [2018-01-05 12:53:31+0000] ++ DATA_DIR=/var/vcap/store/postgres/postgres-9.6.4 [2018-01-05 12:53:31+0000] ++ DATA_DIR_PREVIOUS=/var/vcap/store/postgres/postgres-previous [2018-01-05 12:53:31+0000] ++ DATA_DIR_OLD=/var/vcap/store/postgres/postgres-unknown [2018-01-05 12:53:31+0000] ++ PACKAGE_DIR_OLD=/var/vcap/packages/postgres-unknown [2018-01-05 12:53:31+0000] ++ POSTGRES_UPGRADE_LOCK=/var/vcap/store/postgres/POSTGRES_UPGRADE_LOCK [2018-01-05 12:53:31+0000] ++ pgversion_upgrade_from=postgres-unknown [2018-01-05 12:53:31+0000] ++ '[' -d /var/vcap/store/postgres/postgres-9.6.3 -a -f /var/vcap/store/postgres/postgres-9.6.3/postgresql.conf ']' [2018-01-05 12:53:31+0000] ++ '[' -d /var/vcap/store/postgres/postgres-9.6.2 -a -f /var/vcap/store/postgres/postgres-9.6.2/postgresql.conf ']' [2018-01-05 12:53:31+0000] ++ RUN_DIR=/var/vcap/sys/run/postgres [2018-01-05 12:53:31+0000] ++ LOG_DIR=/var/vcap/sys/log/postgres [2018-01-05 12:53:31+0000] ++ PIDFILE=/var/vcap/sys/run/postgres/postgres.pid [2018-01-05 12:53:31+0000] ++ CONTROL_JOB_PIDFILE=/var/vcap/sys/run/postgres/postgresctl.pid [2018-01-05 12:53:31+0000] ++ HOST=0.0.0.0 [2018-01-05 12:53:31+0000] ++ PORT=5524 [2018-01-05 12:53:31+0000] ++ [[ -n '' ]] [2018-01-05 12:53:31+0000] ++ LD_LIBRARY_PATH=/var/vcap/packages/postgres-9.6.4/lib [2018-01-05 12:53:31+0000] + set +u [2018-01-05 12:53:31+0000] + source /var/vcap/packages/postgres-common/utils.sh [2018-01-05 12:53:31+0000] + set -u [2018-01-05 12:53:31+0000] + case "${action}" in [2018-01-05 12:53:31+0000] + pid_guard /var/vcap/sys/run/postgres/postgresctl.pid 'Postgres control job' false [2018-01-05 12:53:31+0000] + local pidfile=/var/vcap/sys/run/postgres/postgresctl.pid [2018-01-05 12:53:31+0000] + local 'name=Postgres control job' [2018-01-05 12:53:31+0000] + local logall=false [2018-01-05 12:53:31+0000] + '[' false == true ']' [2018-01-05 12:53:31+0000] + '[' -f /var/vcap/sys/run/postgres/postgresctl.pid ']' [2018-01-05 12:53:31+0000] + echo 1315 [2018-01-05 12:53:31+0000] /var/vcap/jobs/postgres/bin/postgres_ctl: line 18: /var/vcap/sys/run/postgres/postgresctl.pid: No such file or directory [2018-01-05 12:55:11+0000] + main start [2018-01-05 12:55:11+0000] + local action [2018-01-05 12:55:11+0000] + action=start [2018-01-05 12:55:11+0000] + source /var/vcap/jobs/postgres/bin/pgconfig.sh [2018-01-05 12:55:11+0000] ++ current_version=9.6.4 [2018-01-05 12:55:11+0000] ++ pgversion_current=postgres-9.6.4 [2018-01-05 12:55:11+0000] ++ pgversion_old=postgres-9.6.3 [2018-01-05 12:55:11+0000] ++ pgversion_older=postgres-9.6.2 [2018-01-05 12:55:11+0000] ++ JOB_DIR=/var/vcap/jobs/postgres [2018-01-05 12:55:11+0000] ++ PACKAGE_DIR=/var/vcap/packages/postgres-9.6.4 [2018-01-05 12:55:11+0000] ++ STORE_DIR=/var/vcap/store [2018-01-05 12:55:11+0000] ++ PG_STORE_DIR=/var/vcap/store/postgres [2018-01-05 12:55:11+0000] ++ DATA_DIR=/var/vcap/store/postgres/postgres-9.6.4 [2018-01-05 12:55:11+0000] ++ DATA_DIR_PREVIOUS=/var/vcap/store/postgres/postgres-previous [2018-01-05 12:55:11+0000] ++ DATA_DIR_OLD=/var/vcap/store/postgres/postgres-unknown [2018-01-05 12:55:11+0000] ++ PACKAGE_DIR_OLD=/var/vcap/packages/postgres-unknown [2018-01-05 12:55:11+0000] ++ POSTGRES_UPGRADE_LOCK=/var/vcap/store/postgres/POSTGRES_UPGRADE_LOCK [2018-01-05 12:55:11+0000] ++ pgversion_upgrade_from=postgres-unknown [2018-01-05 12:55:11+0000] ++ '[' -d /var/vcap/store/postgres/postgres-9.6.3 -a -f /var/vcap/store/postgres/postgres-9.6.3/postgresql.conf ']' [2018-01-05 12:55:11+0000] ++ '[' -d /var/vcap/store/postgres/postgres-9.6.2 -a -f /var/vcap/store/postgres/postgres-9.6.2/postgresql.conf ']' [2018-01-05 12:55:11+0000] ++ RUN_DIR=/var/vcap/sys/run/postgres [2018-01-05 12:55:11+0000] ++ LOG_DIR=/var/vcap/sys/log/postgres [2018-01-05 12:55:11+0000] ++ PIDFILE=/var/vcap/sys/run/postgres/postgres.pid [2018-01-05 12:55:11+0000] ++ CONTROL_JOB_PIDFILE=/var/vcap/sys/run/postgres/postgresctl.pid [2018-01-05 12:55:11+0000] ++ HOST=0.0.0.0 [2018-01-05 12:55:11+0000] ++ PORT=5524 [2018-01-05 12:55:11+0000] ++ [[ -n '' ]] [2018-01-05 12:55:11+0000] ++ LD_LIBRARY_PATH=/var/vcap/packages/postgres-9.6.4/lib [2018-01-05 12:55:11+0000] + set +u [2018-01-05 12:55:11+0000] + source /var/vcap/packages/postgres-common/utils.sh [2018-01-05 12:55:11+0000] + set -u [2018-01-05 12:55:11+0000] + case "${action}" in [2018-01-05 12:55:11+0000] + pid_guard /var/vcap/sys/run/postgres/postgresctl.pid 'Postgres control job' false [2018-01-05 12:55:11+0000] + local pidfile=/var/vcap/sys/run/postgres/postgresctl.pid [2018-01-05 12:55:11+0000] + local 'name=Postgres control job' [2018-01-05 12:55:11+0000] + local logall=false [2018-01-05 12:55:11+0000] + '[' false == true ']' [2018-01-05 12:55:11+0000] + '[' -f /var/vcap/sys/run/postgres/postgresctl.pid ']' [2018-01-05 12:55:11+0000] + echo 1473 [2018-01-05 12:55:11+0000] /var/vcap/jobs/postgres/bin/postgres_ctl: line 18: /var/vcap/sys/run/postgres/postgresctl.pid: No such file or directory
the important line from log above is
[2018-01-05 12:53:31+0000] /var/vcap/jobs/postgres/bin/postgres_ctl: line 18: /var/vcap/sys/run/postgres/postgresctl.pid: No such file or directory
postgres is able to survive a VM reboot.
bosh ssh into the postgres VM and manually execute
mkdir -p /var/vcap/sys/run/postgres/ chown -R vcap:vcap /var/vcap/sys/run/ monit start postgres
there are similar bugs already fixed in other bosh releases, see e.g. https://github.com/cloudfoundry/capi-release/issues/25
I think the problem is that the control scripts rely on prestart to create $RUN_DIR:
$RUN_DIR
https://github.com/cloudfoundry/postgres-release/blob/6e31bafab3f03fe0e0adf286f044168f0b3f3d95/jobs/postgres/templates/pre-start.sh.erb#L49-L50
and in case of external VM reboot (i.e. outside of any bosh operation), prestart is not executed. /var/vcap/data/sys/run/ is a temp file system
/var/vcap/data/sys/run/
$ mount | grep sys/run tmpfs on /var/vcap/data/sys/run type tmpfs (rw,size=1m)
and thus is empty after VM reboot (which is fine).
although there is an mkdir -p $RUN_DIR in https://github.com/cloudfoundry/postgres-release/blob/6e31bafab3f03fe0e0adf286f044168f0b3f3d95/jobs/postgres/templates/postgres_ctl.sh.erb#L42
mkdir -p $RUN_DIR
, this is too late in case of VM reboot because of this line executed earlier:
https://github.com/cloudfoundry/postgres-release/blob/6e31bafab3f03fe0e0adf286f044168f0b3f3d95/jobs/postgres/templates/postgres_ctl.sh.erb#L18
which assumes the directory $RUN_DIR already exists.
Initialization of $RUN_DIR should probably happen as the very first thing when start) is invoked, see the fix for a similar problem in cloud_controller_ng
In our scenario (we run CCDB on postgres-release) this bug has caused an extended Cloudfoundry downtime triggered by forced VM reboots which were required to deploy urgent security fixes on the IaaS level.
We have created an issue in Pivotal Tracker to manage this:
https://www.pivotaltracker.com/story/show/154085953
The labels on this github issue will be updated when the story is started.
@jsievers Thanks for reporting this issue. The fix in included in v24.
Steps to reproduce
nova reboot <VM_CID>
if on openstack)Observed behaviour
/var/vcap/sys/run/postgres/
does not exist.from
postgres_ctl.err.log
:the important line from log above is
Expected behaviour
postgres is able to survive a VM reboot.
Workaround
bosh ssh into the postgres VM and manually execute
Possible fix
there are similar bugs already fixed in other bosh releases, see e.g. https://github.com/cloudfoundry/capi-release/issues/25
I think the problem is that the control scripts rely on prestart to create
$RUN_DIR
:https://github.com/cloudfoundry/postgres-release/blob/6e31bafab3f03fe0e0adf286f044168f0b3f3d95/jobs/postgres/templates/pre-start.sh.erb#L49-L50
and in case of external VM reboot (i.e. outside of any bosh operation), prestart is not executed.
/var/vcap/data/sys/run/
is a temp file systemand thus is empty after VM reboot (which is fine).
although there is an
mkdir -p $RUN_DIR
in https://github.com/cloudfoundry/postgres-release/blob/6e31bafab3f03fe0e0adf286f044168f0b3f3d95/jobs/postgres/templates/postgres_ctl.sh.erb#L42, this is too late in case of VM reboot because of this line executed earlier:
https://github.com/cloudfoundry/postgres-release/blob/6e31bafab3f03fe0e0adf286f044168f0b3f3d95/jobs/postgres/templates/postgres_ctl.sh.erb#L18
which assumes the directory
$RUN_DIR
already exists.Initialization of
$RUN_DIR
should probably happen as the very first thing when start) is invoked, see the fix for a similar problem in cloud_controller_ngIn our scenario (we run CCDB on postgres-release) this bug has caused an extended Cloudfoundry downtime triggered by forced VM reboots which were required to deploy urgent security fixes on the IaaS level.