I have a fairly large database and currently replaying WALs takes over 24 hours.
I'm trying to find ways to speed this up and would appreciate some input.
So far, I have found two possible improvements that could be made to the barman-wal-restore script.
The first one is in try_deliver_from_spool, where the file is copied instead of moved. Assuming we don't run on a copy-on-write file-system and the spool is on the same file-system as pg_wal, it would be faster to move or hard-link the file instead of copying it (see the sketch below).
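For illustration, a minimal sketch of what that change could look like. This is not the actual barman code; the function signature and arguments are invented stand-ins. The point is that os.rename is a metadata-only operation when source and destination are on the same filesystem, with a copy as fallback for the cross-device case:

```python
import os
import shutil

def try_deliver_from_spool(spool_dir, wal_name, dest_path):
    """Deliver a WAL file from the local spool to pg_wal.

    Illustrative sketch only: prefer a rename (or hard link) over a
    byte-by-byte copy when spool and destination share a filesystem.
    """
    source = os.path.join(spool_dir, wal_name)
    if not os.path.exists(source):
        return False
    try:
        # Metadata-only on the same filesystem; os.link would work too
        # if the spool copy should be kept around.
        os.rename(source, dest_path)
    except OSError:
        # Crossing a filesystem boundary makes rename fail with EXDEV,
        # so fall back to copy-then-unlink.
        shutil.copy2(source, dest_path)
        os.unlink(source)
    return True
```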
The second one concerns the fetch pattern. As far as I understand it, even when running with multiple parallel processes fetching files, the script will fetch n files, PostgreSQL will replay those n files and then ask for the next one, at which point barman-wal-restore will fetch the next batch of n files, and the whole cycle starts over. I wonder if it would be possible to fetch files continuously, so the database never has to wait for a file to be delivered.
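A rough sketch of what continuous prefetching could look like. The assumptions here are mine: upcoming segment names can be predicted (WAL file names are sequential), and fetch stands in for whatever transport is used underneath (e.g. get-wal over SSH). The bounded queue keeps the workers from running arbitrarily far ahead of what PostgreSQL has actually requested:

```python
import queue
import threading

def prefetch_worker(fetch, spool_queue):
    """Fetch WAL segments into the spool as names arrive on the queue."""
    while True:
        wal_name = spool_queue.get()
        if wal_name is None:  # sentinel: shut down this worker
            break
        fetch(wal_name)       # download the segment into the spool dir

def start_prefetchers(fetch, wal_names, workers=4, depth=16):
    # A bounded queue paces the producer: at most `depth` segments
    # are queued ahead of what has been delivered so far.
    q = queue.Queue(maxsize=depth)
    threads = [
        threading.Thread(target=prefetch_worker, args=(fetch, q), daemon=True)
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    for name in wal_names:
        q.put(name)           # blocks once the queue is full
    for _ in threads:
        q.put(None)           # one sentinel per worker
    return threads
```

With something like this running in the background, the restore_command path would mostly just move already-spooled files into pg_wal instead of blocking on a fetch.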
Just to be clear, I don't expect anybody to implement any of this. I'm collecting ideas which I plan to implement myself.