deis / postgres

A PostgreSQL database used by Deis Workflow.
https://deis.com
MIT License
36 stars 22 forks source link

Backup restore loop: "Could not get a connection to the database:" #200

Open miracle2k opened 7 years ago

miracle2k commented 7 years ago

This is the same behaviour as in: https://github.com/deis/postgres/issues/123 The database container never comes up completely to pass a health check. It tries to restore from S3, and then fails.

The database image is quay.io/deis/postgres:v2.5.4 on AWS (I previously had the same problem on Azure).

.FATAL:  the database system is starting up
.FATAL:  the database system is starting up
.FATAL:  the database system is starting up
.FATAL:  the database system is starting up
.FATAL:  the database system is starting up
.FATAL:  the database system is starting up
.FATAL:  the database system is starting up
.FATAL:  the database system is starting up
.FATAL:  the database system is starting up
.FATAL:  the database system is starting up
.FATAL:  the database system is starting up
.FATAL:  the database system is starting up
.FATAL:  the database system is starting up
.FATAL:  the database system is starting up
.FATAL:  the database system is starting up
.FATAL:  the database system is starting up
.wal_e.blobstore.s3.s3_util WARNING  MSG: WALE_S3_ENDPOINT defined, ignoring AWS_REGION
        HINT: AWS_REGION is only intended for use with AWS S3, and not interface-compatible use cases supported by WALE_S3_ENDPOINT
        STRUCTURED: time=2017-08-01T17:23:59.161533-00 pid=2454
FATAL:  the database system is starting up
.wal_e.operator.backup INFO     MSG: promoted prefetched wal segment
        STRUCTURED: time=2017-08-01T17:24:00.068839-00 pid=2454 action=wal-fetch key=s3://k3-database/wal_005/0000000A0000001C000000DF.lzo prefix= seg=0000000A0000001C000000DF
FATAL:  the database system is starting up
.FATAL:  the database system is starting up
.FATAL:  the database system is starting up
.FATAL:  the database system is starting up
.FATAL:  the database system is starting up
.FATAL:  the database system is starting up
.FATAL:  the database system is starting up
. stopped waiting
server is still starting up
Performing an initial backup...
LOG:  restored log file "0000000A0000001C000000DF" from archive
wal_e.main   INFO     MSG: starting WAL-E
        DETAIL: The subcommand is "backup-push".
        STRUCTURED: time=2017-08-01T17:25:02.573642-00 pid=2534
wal_e.blobstore.s3.s3_util WARNING  MSG: WALE_S3_ENDPOINT defined, ignoring AWS_REGION
        HINT: AWS_REGION is only intended for use with AWS S3, and not interface-compatible use cases supported by WALE_S3_ENDPOINT
        STRUCTURED: time=2017-08-01T17:25:14.049587-00 pid=2534
wal_e.blobstore.s3.s3_util WARNING  MSG: WALE_S3_ENDPOINT defined, ignoring AWS_REGION
        HINT: AWS_REGION is only intended for use with AWS S3, and not interface-compatible use cases supported by WALE_S3_ENDPOINT
        STRUCTURED: time=2017-08-01T17:25:19.008224-00 pid=2536
FATAL:  the database system is starting up
psql: FATAL:  the database system is starting up
wal_e.main   ERROR    MSG: could not run one or more external programs WAL-E depends upon
        DETAIL: Could not get a connection to the database: note that superuser access
        is required
        Could not run the following programs, are they installed? psql
        STRUCTURED: time=2017-08-01T17:25:20.904420-00 pid=2534
wal_e.operator.backup INFO     MSG: promoted prefetched wal segment
        STRUCTURED: time=2017-08-01T17:25:22.171869-00 pid=2536 action=wal-fetch key=s3://k3-database/wal_005/0000000A0000001C000000E0.lzo prefix= seg=0000000A0000001C000000E0
LOG:  restored log file "0000000A0000001C000000E0" from archive

Seems to me based on those last lines is that it's still downloading backup files, and there is some sort of timeout inside postgres triggering. This happens after about half an hour.

It also doesn't seem right that a restore takes so long, correct?