EnterpriseDB / barman

Barman - Backup and Recovery Manager for PostgreSQL
https://www.pgbarman.org/
GNU General Public License v3.0

When recovering with `--no-get-wal` and `--target-time`, copy all WAL files #951

Closed barthisrael closed 3 months ago

barthisrael commented 3 months ago

Before this commit, Barman tried to guess which WAL files were required from their filesystem creation timestamps.

However, that approach is not reliable. For example, if streaming replication lags, the WAL files are created on the Barman host later than on Postgres.

It can be even worse with archive_command, because Postgres waits for a WAL file to be filled before making it available for archiving.

In those cases the recovery could fail because the WAL files copied by Barman were missing COMMIT or ABORT records, i.e. Postgres could not tell whether it had satisfied the requested recovery_target_time.
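To illustrate the failure mode, here is a minimal sketch of a timestamp-based guess. It is a simplified model, not Barman's previous code: the function name `guess_by_timestamp`, the WAL names, and the arrival times are all hypothetical.

```python
from datetime import datetime


def guess_by_timestamp(wal_files, target_time):
    """Hypothetical heuristic: keep the WAL files that appeared on the
    Barman host no later than the requested target time."""
    return [name for name, seen_at in wal_files if seen_at <= target_time]


target_time = datetime(2024, 1, 1, 12, 0)

# (WAL name, time the file appeared on the Barman host) -- made-up values.
wal_files = [
    ("000000010000000000000001", datetime(2024, 1, 1, 11, 45)),
    ("000000010000000000000002", datetime(2024, 1, 1, 11, 58)),
    # Assume this segment holds the first COMMIT after target_time, but it
    # reached Barman late (replication lag, or archiving only once full).
    ("000000010000000000000003", datetime(2024, 1, 1, 12, 3)),
]

guessed = guess_by_timestamp(wal_files, target_time)
missing = [name for name, _ in wal_files if name not in guessed]
print("copied:", guessed)
print("left behind:", missing)
```

Under these assumptions the third segment is left behind, so Postgres never sees a COMMIT or ABORT record past the target time and cannot decide that the target was reached.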

From now on, if the user requests a recovery with --no-get-wal and --target-time, Barman simply copies all WAL files up to the timeline being recovered. That guarantees Postgres can find the COMMIT or ABORT records, if they exist in the archived WAL files, and so can complete the recovery.
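As a rough sketch of what "all WAL files up to the timeline being recovered" can mean, the snippet below filters WAL segment names by the timeline ID encoded in their first 8 hex digits. The helper names are hypothetical and this is not the code from this PR. It also assumes that every lower-numbered timeline is relevant, which ignores timeline history files.

```python
import re

# A plain WAL segment name is 24 hex digits: timeline, log, segment.
WAL_SEGMENT_RE = re.compile(r"^[0-9A-F]{24}$")


def wal_timeline(name):
    """Return the timeline ID encoded in the first 8 hex digits."""
    return int(name[:8], 16)


def select_wals_up_to_timeline(wal_names, target_tli):
    """Hypothetical helper: keep every plain WAL segment whose timeline
    is not newer than the one being recovered, with no timestamp guess."""
    return [
        name
        for name in sorted(wal_names)
        if WAL_SEGMENT_RE.match(name) and wal_timeline(name) <= target_tli
    ]


names = [
    "000000020000000000000005",
    "000000010000000000000003",
    "000000030000000000000006",  # newer timeline, skipped
    "00000002.history",          # not a WAL segment, skipped
]
selected = select_wals_up_to_timeline(names, target_tli=2)
print(selected)
```

The trade-off is the one described in the note that follows: the selection may include many files Postgres never replays, in exchange for not depending on file timestamps.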

Note: we evaluated other implementations that would avoid copying potentially many unused WAL files. For example, with pg_waldump we could look for COMMIT and ABORT records much as Postgres does. However, that could add a lot of overhead wherever it ran (during WAL archiving or backup recovery), so that option was discarded.

References: BAR-189 #881.

gcalacoci commented 3 months ago

@barthisrael the patch itself looks good. Do we want to amend the docs here (in which case the change is missing from the PR) or in a different PR?

barthisrael commented 3 months ago

> @barthisrael the patch itself looks good. Do we want to amend the docs here (in which case the change is missing from the PR) or in a different PR?

@gcalacoci Martin will take care of that through BAR-191.