cloudfoundry / postgres-release

BOSH release for PostgreSQL
Apache License 2.0

BOSH says postgres not running after update; no helpful logs #27

Closed shalako closed 6 years ago

shalako commented 6 years ago

We are attempting to test that Routing API can communicate with PostgreSQL over TLS. Without configuring databases.tls.certificate and databases.tls.private_key, the deployment succeeds. However, once these properties are configured, BOSH/monit reports that postgres fails to start.

Task 5458 | 19:32:02 | Updating instance singleton-database: singleton-database/f63ca082-5799-4b34-a9e3-58bd16c1cea0 (0) (canary) (00:11:02)
                     L Error: 'singleton-database/f63ca082-5799-4b34-a9e3-58bd16c1cea0 (0)' is not running after update. Review logs for failed jobs: postgres
Task 5458 | 19:43:05 | Error: 'singleton-database/f63ca082-5799-4b34-a9e3-58bd16c1cea0 (0)' is not running after update. Review logs for failed jobs: postgres
Process 'postgres'                  not monitored

The postgres logs don't have any errors in them: https://gist.github.com/shalako/7cd886afdb6ac9f8924e60f253553b78

We looked for a manifest property to increase the log level but couldn't find one. We ended up modifying /var/vcap/jobs/postgres/config/postgresql.conf by adding the following line and using monit to restart postgres:

log_min_messages = 'DEBUG5'

The logs didn't seem to change at all.

We don't know how to troubleshoot the problem.

cf-gitbot commented 6 years ago

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/152824413

The labels on this github issue will be updated when the story is started.

charleshansen commented 6 years ago

For what it is worth, we eventually found the logs in /var/vcap/store/postgres/postgres-9.6.4/pg_log/startup.log

2017-11-13 21:57:18.953 GMT: FATAL: could not load root certificate file "/var/vcap/jobs/postgres/config/certificates/server.ca_cert": no SSL error reported. 

That file exists but is empty.

In the deployment, we provided databases.tls.certificate and databases.tls.private_key, but we did not specify databases.tls.ca because we did not need mTLS.

Adding a databases.tls.ca fixed the problem, but should not have been necessary. It looks like the release creates an empty server.ca_cert file and postgres won't start if this file is not a valid certificate.
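
For anyone hitting the same FATAL message, here is a minimal sketch of a check for the condition described above (the CA file exists but contains no certificate). The function name `ca_file_problem` and the PEM marker test are my own assumptions for illustration; this is not part of the release, and it only checks for the presence of a PEM header rather than validating the certificate.

```python
import os

# Marker that begins every PEM-encoded certificate block.
PEM_MARKER = "-----BEGIN CERTIFICATE-----"


def ca_file_problem(path):
    """Return a short description of why the CA file looks unusable, or None.

    Hypothetical helper: it only detects a missing file, an empty file,
    or a file without a PEM certificate header.
    """
    if not os.path.exists(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    with open(path, "r", errors="replace") as f:
        if PEM_MARKER not in f.read():
            return "no PEM certificate block"
    return None


if __name__ == "__main__":
    # Path taken from the startup.log message quoted above.
    path = "/var/vcap/jobs/postgres/config/certificates/server.ca_cert"
    print(path, "->", ca_file_problem(path) or "contains a PEM certificate block")
```

On the VM from this issue, the check would be expected to report the file as empty, which matches what we saw by hand.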

It would be great to get some of that logging from /var/vcap/store/postgres/postgres-9.6.4/pg_log/startup.log into something like /var/vcap/sys/log/postgres/startup.log.
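
Until the startup log is surfaced there, a rough sketch like the following can pull FATAL lines out of the store directory. The glob pattern generalizes the postgres-9.6.4 directory name seen above and is an assumption, as is the helper name `fatal_lines`; adjust both to the actual layout on the VM.

```python
import glob

# Assumed pattern: generalizes the versioned directory seen in this issue.
DEFAULT_PATTERN = "/var/vcap/store/postgres/postgres-*/pg_log/startup.log"


def fatal_lines(pattern=DEFAULT_PATTERN):
    """Return (path, line) pairs for every line containing 'FATAL:'."""
    hits = []
    for path in sorted(glob.glob(pattern)):
        with open(path, "r", errors="replace") as f:
            for line in f:
                if "FATAL:" in line:
                    hits.append((path, line.rstrip()))
    return hits


if __name__ == "__main__":
    for path, line in fatal_lines():
        print(f"{path}: {line}")
```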

shalako commented 6 years ago

Operators should only ever have to look for logs in /var/vcap/sys/log/.

Is it intentional to require mutual authentication, or would you consider accepting one-way TLS? If the latter, please make databases.tls.ca optional.

valeriap commented 6 years ago

Fixed in v22.