The reason for configuring PGUSER/PGPASSWORD is of course to enable wal-g for backups (cf. #71 and this forum topic).
@Qqwy can you try this workaround for now? When setting PGUSER and PGPASSWORD for wal-g, set PGUSER to repluser and PGPASSWORD to the result of running echo $REPL_PASSWORD on one of your database machines.
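A minimal sketch of applying this, assuming the app is called db-app-name and that the stolon environment lives in /data/.env as on the standard fly.io Postgres machines:

```sh
# On one of the database machines (e.g. via `fly ssh console --app db-app-name`),
# load the stolon environment and print the replication password:
export $(cat /data/.env | xargs)
echo $REPL_PASSWORD

# Back on your workstation, point wal-g at the replication user
# (replace <repl-password> with the value printed above):
fly secrets set --app db-app-name PGUSER=repluser PGPASSWORD=<repl-password>
```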
Thank you. This workaround seems to work 👍.

Side note: it does mean that, for the time being, calling wal-g manually requires overriding these values, e.g. when performing a manual backup:

PGUSER=postgres PGPASSWORD=$OPERATOR_PASSWORD wal-g backup-push /data/postgres
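For completeness, a sketch of a full manual invocation on one of the machines, assuming the /data/.env layout used later in this thread:

```sh
# Load the cluster environment, which contains OPERATOR_PASSWORD among other secrets
export $(cat /data/.env | xargs)
# Run the backup as the postgres superuser, overriding wal-g's configured credentials
PGUSER=postgres PGPASSWORD=$OPERATOR_PASSWORD wal-g backup-push /data/postgres
```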
I have not tested yet whether the full backup+restore flow with wal-g works OK with the workaround in place. Hopefully I will have time to do so tomorrow.
I finally had the time to revisit this. Unfortunately, another error occurs during the initialization process of a restore later on.
The basis of a restore is a second cluster where OPERATOR_PASSWORD, SU_PASSWORD and REPL_PASSWORD are set to the same values as in the original cluster, with FLY_RESTORED_FROM also set to ensure stolonctl will apply these settings on cluster restart (otherwise you'll get authentication issues). The following is what you should run to trigger the restore, as per stolon's docs:
export $(cat /data/.env | xargs)
stolonctl init '{ "initMode": "pitr", "pitrConfig": {"dataRestoreCommand": "wal-g backup-fetch %d LATEST" , "archiveRecoverySettings": { "restoreCommand": "wal-g wal-fetch \"%f\" \"%p\"" } } }'
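For reference, a rough sketch of how the second cluster's secrets could be pre-seeded before running the init above; the app name and the exact FLY_RESTORED_FROM value are placeholders, and the password values should be copied from the original cluster's /data/.env:

```sh
fly secrets set --app restore-db-app-name \
  OPERATOR_PASSWORD=<same as original cluster> \
  SU_PASSWORD=<same as original cluster> \
  REPL_PASSWORD=<same as original cluster> \
  FLY_RESTORED_FROM=<marker value; see note above>
```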
But this will not succeed. Instead, the keeper will error with:
FATAL: could not start WAL streaming: ERROR: replication slot "stolon_1723880d" is active for PID 649
Actually, the above issue was caused by something else: you need to make sure you do not enable wal-g on the restore cluster until after you've restored. Makes sense in hindsight.
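In practice, that means leaving the wal-g-related secrets off the new app until the restore has finished, and only then setting them; a hedged sketch, with example secret names:

```sh
# Only once the restore has completed: enable wal-g on the restored cluster.
# WALG_S3_PREFIX is just one example of a wal-g storage setting; use whichever
# storage configuration and credentials your wal-g setup relies on.
fly secrets set --app restore-db-app-name \
  WALG_S3_PREFIX=s3://your-backup-bucket/path \
  PGUSER=repluser \
  PGPASSWORD=<replication password, as in the workaround above>
```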
I'll close this issue now 👌.
1. fly pg create
2. fly secrets set --app=db-app-name PGUSER="postgres" PGPASSWORD="passwordhere", with the password shown in the first step.
3. fly machines list to note the first machine's ID, and then fly machines clone machineid.

The problematic state is now in effect. The new machine will not become healthy (though at some point the fly machines clone command will terminate regardless). fly logs --app=db-app-name will show the following error message, repeating every few seconds:

And obviously, the new machine is not part of the new cluster, as can be seen using stolonctl status.
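(As with the restore steps, stolonctl needs the cluster environment loaded; a minimal sketch of checking the cluster state from one of the machines:)

```sh
# On any of the database machines:
export $(cat /data/.env | xargs)
stolonctl status
```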