cloudfoundry-community / postgres-boshrelease

A BOSH release for deploying PostgreSQL
MIT License

DB upgrade leads to bosh deploy timeout #17

Closed: jsievers closed this issue 7 years ago

jsievers commented 7 years ago

We recently upgraded the postgres release used in Concourse, as advertised in the Concourse 3.5.0 release notes.

I read https://github.com/cloudfoundry/postgres-release/#upgrading and increased databases.monit_timeout to 300 seconds.
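
For illustration, the setting described above might sit in the deployment manifest roughly as follows. This is a minimal sketch, not the actual manifest: the instance group name `db` is taken from the log tags further down, and the job and release name `postgres` are assumptions. Only the property name `databases.monit_timeout` and the value 300 come from this report.

```yaml
# Hypothetical manifest fragment; names other than databases.monit_timeout are assumed.
instance_groups:
- name: db                  # assumed: matches the "db" tag in the deploy log
  jobs:
  - name: postgres          # assumed job name
    release: postgres       # assumed release name
    properties:
      databases:
        monit_timeout: 300  # seconds, as described in this report
```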

This allowed the DB upgrade to finish without a monit timeout (it took about 2 minutes according to /var/vcap/sys/log/postgres/postgres_ctl.log), but bosh deploy still failed with:

"time":1508245109,"stage":"Updating instance","tags":["db"],"total":1,"task":"db/c58db631-411d-4390-9787-734be1d88eca[98/6448]ary)","index":1,"state":"failed","progress":100,"data":{"error":"''db/c58db631-411d-4390-9787-734be1d88eca (0)'' is not running after update. Review logs for failed jobs: postgres"}}
{"time":1508245109,"error":{"code":400007,"message":"''db/c58db631-411d-4390-9787-734be1d88eca (0)'' is not running after update. Review logs for failed jobs: postgres"}}
', "result_output" = '', "context_id" = '' WHERE ("id" = 11605)
D, [2017-10-17 12:58:29 #10554] [task:11605] DEBUG -- DirectorJobRunner: (0.001595s) COMMIT
I, [2017-10-17 12:58:29 #10554] []  INFO -- DirectorJobRunner: Task took 2 minutes 53.07573839 seconds to process.

According to the BOSH lifecycle docs, there is a second timeout (probably update_watch_time), and that one appears to be exceeded here.

Increasing update_watch_time for all jobs seems like the wrong fix. According to the BOSH lifecycle docs, a pre-start script would be a better place for long-running tasks like a DB upgrade, because pre-start does not time out at the BOSH level.
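
As a possible interim workaround, BOSH manifests accept an `update` block on an individual instance group, which overrides the global update settings for that group only. The fragment below is a sketch under assumptions: the instance group name `db` is taken from the log tags above, and the watch-time values are hypothetical, not tested against this deployment. Both canary_watch_time and update_watch_time are given in milliseconds; with a range, the Director waits for the lower bound and then keeps checking until the upper bound.

```yaml
# Hypothetical manifest fragment; values are illustrative only.
instance_groups:
- name: db                            # assumed: matches the "db" tag in the deploy log
  update:
    canary_watch_time: 30000-600000   # ms; applies if this instance is updated as a canary
    update_watch_time: 30000-600000   # ms; applies to non-canary instances
```

This leaves the watch times for every other instance group untouched, but it still relies on guessing an upper bound for the upgrade duration, which moving the upgrade into pre-start would avoid.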

jsievers commented 7 years ago

Wrong repo. Reported at https://github.com/cloudfoundry/postgres-release/issues/25 instead.