Closed natefoo closed 2 years ago
No automated test but I did the following locally:
backup.py --backup /tmp/pgbackup
rsync -av /tmp/pgbackup/20211108T212616Z/ /var/lib/postgresql/13/main
restore_command = 'cp "/tmp/pgbackup/wal_archive/%f" "%p"'
touch /var/lib/postgresql/13/main/recovery.signal
It recovered successfully and the data was intact.
That's a pretty comprehensive list of steps, could probs do that in gh actions.
True. It'd be nice to test generating a WAL during the backup, or at least generating one after the backup, and making sure that gets included.
The method that was previously in use has been deprecated since PostgreSQL 9.6 (although still supported as of 14). But we were also incorrectly storing our archived WAL segments inside the backups. I am not sure if it would be possible to recover from the old backups since initiating the backup usually causes the archival of one segment to fail as the
current
directory is rotated. The new archive script stores these in a separatewal_archive
directory. The backup script now excludes a bunch of stuff that the documentation says to exclude.Unfortunately, the backup script is now written in Python. I really didn't want to do this, but because the start and stop SQL have to be performed in the same database connection, and the backup must be performed in between, it meant I'd have to do a lot of process and filehandle control in shell that I wanted to do even less than writing this in Python. The only non-standard-lib dependency is psycopg2, which can be installed from the PGDG repos on EL and is part of the standard Galaxy Ansible setup anyway since you need it for the postgresql_objects role. The backup script is compatible with Python 2.7 (the default on EL 7) or 3.5+. As such I haven't provided any functionality to create a virtualenv or pip install psycopg2 or anything like that. This script should run using the system Python as long as you install the system psycopg2. If you need to use a different Python you can override
postgresql_backup_command
to do so.I could find no documentation saying that you should make backups of the current active WAL segment, so that script and cron job have been dropped. I don't know where I got that idea from, but it's not in the documentation as far as I can tell.
You can now prune old backups - specify the number to keep with
postgresql_backup_keep
(default 30), there is no time-based option. Remove--keep
from the command line by overridingpostgresql_backup_command
to disable.The WAL archive will also be pruned so that only WAL segments newer than the oldest backup will be kept, but this functionality only works if the backup dir is a mounted filesystem. Because the rest of the backup functionality depends on rsync and the pruning is done with the standard
pg_archivecleanup
PostgreSQL utility, I didn't want to add separate options for specifying ssh connection params and then running the utility (which may not be installed on the remote) over SSH. If someone wants to add that feature it'd be welcome. You can also just copybackup.py
to somewhere with the archive mounted and run with just--clean-archive
(without--backup
and--keep
), but again,pg_archivecleanup
must be available.We should probably actually test restoring a backup to make sure this is all correct...
PITR Documentation for reference.