Closed gustabowill closed 2 months ago
Just rebased on top of master
.
This works as expected for backups:
$ barman list-backup all
pg17 20240726T231411 'incremental_2' - Fri Jul 26 21:14:13 2024 - Size: 35.4 MiB - WAL Size: 0 B
pg17 20240726T231359 'incremental_1' - Fri Jul 26 21:14:01 2024 - Size: 35.4 MiB - WAL Size: 32.0 MiB
pg17 20240726T231348 'full' - Fri Jul 26 21:13:51 2024 - Size: 251.3 MiB - WAL Size: 48.0 MiB
data_checksums
configuration:$ find ~/barman_home/ -name backup.info -print -exec bash -c "grep -i checksum {}; echo" \;
/home/vagrant/barman_home/pg17/base/20240726T231348/backup.info
data_checksums=on
/home/vagrant/barman_home/pg17/base/20240726T231359/backup.info
data_checksums=off
/home/vagrant/barman_home/pg17/base/20240726T231411/backup.info
data_checksums=on
WARNING
at the end of a new barman backup
execution:$ barman backup pg17 --name incremental_3 --incremental incremental_2
Starting backup using postgres method for server pg17 in /home/vagrant/barman_home/pg17/base/20240726T231540
Backup start at LSN: 6/4000060 (000000010000000600000004, 00000060)
Starting backup copy via pg_basebackup for 20240726T231540
Copy done (time: 1 second)
Finalising the backup.
Backup size: 3.4 MiB
Backup end at LSN: 6/6000000 (000000010000000600000006, 00000000)
Backup completed (start time: 2024-07-26 23:15:40.967057, elapsed time: 1 second)
Processing xlog segments from streaming for pg17
000000010000000600000004
000000010000000600000005
WARNING: This is an incremental backup taken with `data_checksums = on` whereas some previous backups in the chain were taken with `data_checksums = off`. This can lead to potential recovery issues. Consider taking a new full backup to avoid having inconsistent backup chains.
WARNING: IMPORTANT: this backup is classified as WAITING_FOR_WALS, meaning that Barman has not received yet all the required WAL files for the backup consistency.
This is a common behaviour in concurrent backup scenarios, and Barman automatically set the backup as DONE once all the required WAL files have been archived.
Hint: execute the backup command with '--wait'
WARNING
can be found in the logs:$ tail -4 /var/log/barman/barman.log
2024-07-26 23:15:42,905 [33686] barman.server WARNING: This is an incremental backup taken with `data_checksums = on` whereas some previous backups in the chain were taken with `data_checksums = off`. This can lead to potential recovery issues. Consider taking a new full backup to avoid having inconsistent backup chains.
2024-07-26 23:15:42,906 [33686] barman.server WARNING: IMPORTANT: this backup is classified as WAITING_FOR_WALS, meaning that Barman has not received yet all the required WAL files for the backup consistency.
This is a common behaviour in concurrent backup scenarios, and Barman automatically set the backup as DONE once all the required WAL files have been archived.
Hint: execute the backup command with '--wait'
This also works as expected when recovering -- a WARNING
is issued:
$ barman list-backup all
pg17 20240726T231540 'incremental_3' - Fri Jul 26 21:15:42 2024 - Size: 35.4 MiB - WAL Size: 0 B
pg17 20240726T231411 'incremental_2' - Fri Jul 26 21:14:13 2024 - Size: 35.4 MiB - WAL Size: 48.0 MiB
pg17 20240726T231359 'incremental_1' - Fri Jul 26 21:14:01 2024 - Size: 35.4 MiB - WAL Size: 32.0 MiB
pg17 20240726T231348 'full' - Fri Jul 26 21:13:51 2024 - Size: 251.3 MiB - WAL Size: 48.0 MiB
$ find ~/barman_home/ -name backup.info -print -exec bash -c "grep -i checksum {}; echo" \;
/home/vagrant/barman_home/pg17/base/20240726T231348/backup.info
data_checksums=on
/home/vagrant/barman_home/pg17/base/20240726T231359/backup.info
data_checksums=off
/home/vagrant/barman_home/pg17/base/20240726T231411/backup.info
data_checksums=on
/home/vagrant/barman_home/pg17/base/20240726T231540/backup.info
data_checksums=on
barman recover
attempt:$ barman recover pg17 incremental_3 ~/data_restore --recovery-staging-path /tmp
Start combining backup via pg_combinebackup for backup 20240726T231540 on /tmp
End combining backup via pg_combinebackup for backup 20240726T231540
Starting local restore for server pg17 using backup 20240726T231540
Destination directory: /home/vagrant/data_restore
Copying the base backup.
Copying required WAL segments.
Generating archive status files
Identify dangerous settings in destination directory.
WARNING: You recovered from an incremental backup where checksums were enabled on that backup, but not all backups in the chain. It is advised to disable, and optionally re-enable, checksums on the destination directory to avoid failures.
Recovery completed (start time: 2024-07-26 23:21:14.038575+02:00, elapsed time: less than one second)
Your PostgreSQL server has been successfully prepared for recovery!
$ tail -1 /var/log/barman/barman.log
2024-07-26 23:21:14,581 [34700] barman.recovery_executor WARNING: You recovered from an incremental backup where checksums were enabled on that backup, but not all backups in the chain. It is advised to disable, and optionally re-enable, checksums on the destination directory to avoid failures.
The previous tests were about cases where a WARNING
was expected for backup and recovery commands, i.e. when the last backup has checksums enabled, but some backup in the chain does not.
I also tested the other cases, where there are divergences in the chain (mixes of on
and off
, but the latest backup is off
), thus it's not expected that we get a WARNING
. It went fine:
barman backup
:# Get list of backups
$ barman list-backup all
pg17 20240726T231540 'incremental_3' - Fri Jul 26 21:15:42 2024 - Size: 35.4 MiB - WAL Size: 0 B
pg17 20240726T231411 'incremental_2' - Fri Jul 26 21:14:13 2024 - Size: 35.4 MiB - WAL Size: 48.0 MiB
pg17 20240726T231359 'incremental_1' - Fri Jul 26 21:14:01 2024 - Size: 35.4 MiB - WAL Size: 32.0 MiB
pg17 20240726T231348 'full' - Fri Jul 26 21:13:51 2024 - Size: 251.3 MiB - WAL Size: 48.0 MiB
# Check their checksum state
$ find ~/barman_home/ -name backup.info -print -exec bash -c "grep -i checksum {}; echo" \;
/home/vagrant/barman_home/pg17/base/20240726T231348/backup.info
data_checksums=on
/home/vagrant/barman_home/pg17/base/20240726T231359/backup.info
data_checksums=off
/home/vagrant/barman_home/pg17/base/20240726T231411/backup.info
data_checksums=on
/home/vagrant/barman_home/pg17/base/20240726T231540/backup.info
data_checksums=on
# Data checkums are disabled
$ psql -h localhost -U postgres -c "SHOW data_checksums"
data_checksums
----------------
off
(1 row)
# Take a new backup -- no WARNING about checkums, as expected
$ barman backup pg17 --name incremental_4 --incremental incremental_3
Starting backup using postgres method for server pg17 in /home/vagrant/barman_home/pg17/base/20240726T233256
Backup start at LSN: 6/7000110 (000000010000000600000007, 00000110)
Starting backup copy via pg_basebackup for 20240726T233256
Copy done (time: 1 second)
Finalising the backup.
Backup size: 3.4 MiB
Backup end at LSN: 6/9000000 (000000010000000600000009, 00000000)
Backup completed (start time: 2024-07-26 23:32:56.187060, elapsed time: 1 second)
Processing xlog segments from streaming for pg17
000000010000000600000007
000000010000000600000008
WARNING: IMPORTANT: this backup is classified as WAITING_FOR_WALS, meaning that Barman has not received yet all the required WAL files for the backup consistency.
This is a common behaviour in concurrent backup scenarios, and Barman automatically set the backup as DONE once all the required WAL files have been archived.
Hint: execute the backup command with '--wait'
barman recover
:# List of backups
$ barman list-backup all
pg17 20240726T233256 'incremental_4' - Fri Jul 26 21:32:57 2024 - Size: 35.4 MiB - WAL Size: 0 B
pg17 20240726T231540 'incremental_3' - Fri Jul 26 21:15:42 2024 - Size: 35.4 MiB - WAL Size: 48.0 MiB
pg17 20240726T231411 'incremental_2' - Fri Jul 26 21:14:13 2024 - Size: 35.4 MiB - WAL Size: 48.0 MiB
pg17 20240726T231359 'incremental_1' - Fri Jul 26 21:14:01 2024 - Size: 35.4 MiB - WAL Size: 32.0 MiB
pg17 20240726T231348 'full' - Fri Jul 26 21:13:51 2024 - Size: 251.3 MiB - WAL Size: 48.0 MiB
# Their checksum state
$ find ~/barman_home/ -name backup.info -print -exec bash -c "grep -i checksum {}; echo" \;
/home/vagrant/barman_home/pg17/base/20240726T231348/backup.info
data_checksums=on
/home/vagrant/barman_home/pg17/base/20240726T231359/backup.info
data_checksums=off
/home/vagrant/barman_home/pg17/base/20240726T231411/backup.info
data_checksums=on
/home/vagrant/barman_home/pg17/base/20240726T231540/backup.info
data_checksums=on
/home/vagrant/barman_home/pg17/base/20240726T233256/backup.info
data_checksums=off
# Attempt to recover -- no WARNING, as expected
$ barman recover pg17 incremental_4 ~/data_restore --recovery-staging-path /tmp
Start combining backup via pg_combinebackup for backup 20240726T233256 on /tmp
End combining backup via pg_combinebackup for backup 20240726T233256
Starting local restore for server pg17 using backup 20240726T233256
Destination directory: /home/vagrant/data_restore
Copying the base backup.
Copying required WAL segments.
Generating archive status files
Identify dangerous settings in destination directory.
Recovery completed (start time: 2024-07-26 23:35:05.869706+02:00, elapsed time: less than one second)
Your PostgreSQL server has been successfully prepared for recovery!
@barthisrael I just applied your suggestions. I also left a thought in one of your replies about the output of show-backup
command.
barman show-backup
is also working, both in console
and json
writers:
# Console writer
$ barman -f console show-backup pg17 incremental_4
Backup 20240726T233256:
Backup Name : incremental_4
Server Name : pg17
System Id : 7382227563826631291
Status : DONE
PostgreSQL Version : 170000
PGDATA directory : /var/lib/pgsql/17/data
Checksums : off
Tablespaces:
custom_ts_1 : /var/lib/pgsql/17/tablespaces/custom_ts_1 (oid: 24678)
custom_ts_2 : /var/lib/pgsql/17/tablespaces/custom_ts_2 (oid: 24686)
Base backup information:
Disk usage : 3.4 MiB (35.4 MiB with WALs)
Incremental size : 3.4 MiB (-0.00%)
Timeline : 1
Begin WAL : 000000010000000600000008
End WAL : 000000010000000600000009
WAL number : 2
Begin time : 2024-07-26 21:32:56.183376+00:00
End time : 2024-07-26 21:32:57.553398+00:00
Copy time : 1 second
Estimated throughput : 2.5 MiB/s
Begin Offset : 40
End Offset : 0
Begin LSN : 6/8000028
End LSN : 6/9000000
WAL information:
No of files : 0
Disk usage : 0 B
WAL rate : 195.18/hour
Last available : 000000010000000600000009
Catalog information:
Retention Policy : VALID
Previous Backup : 20240726T231540
Next Backup : - (this is the latest base backup)
# json writer
$ barman -f json show-backup pg17 incremental_4 | jq
{
"pg17": {
"backup_id": "20240726T233256",
"backup_name": "incremental_4",
"base_backup_information": {
"analysis_time": "less than one second",
"analysis_time_seconds": 0,
"begin_lsn": "6/8000028",
"begin_offset": 40,
"begin_time": "2024-07-26 21:32:56.183376+00:00",
"begin_time_timestamp": "1722029576",
"begin_wal": "000000010000000600000008",
"copy_time": "1 second",
"copy_time_seconds": 1.350093,
"disk_usage": "3.4 MiB",
"disk_usage_bytes": 3528561,
"disk_usage_with_wals": "35.4 MiB",
"disk_usage_with_wals_bytes": 37082993,
"end_lsn": "6/9000000",
"end_offset": 0,
"end_time": "2024-07-26 21:32:57.553398+00:00",
"end_time_timestamp": "1722029577",
"end_wal": "000000010000000600000009",
"incremental_size": "3.4 MiB",
"incremental_size_bytes": 3528561,
"incremental_size_ratio": "-0.00%",
"number_of_workers": 1,
"throughput": "2.5 MiB/s",
"throughput_bytes": 2613568.8430352574,
"timeline": 1
},
"catalog_information": {
"next_backup": "- (this is the latest base backup)",
"previous_backup": "20240726T231540",
"retention_policy": "VALID"
},
"data_checksums": "off",
"pgdata_directory": "/var/lib/pgsql/17/data",
"postgresql_version": 170000,
"status": "DONE",
"tablespaces": [
{
"location": "/var/lib/pgsql/17/tablespaces/custom_ts_1",
"name": "custom_ts_1",
"oid": 24678
},
{
"location": "/var/lib/pgsql/17/tablespaces/custom_ts_2",
"name": "custom_ts_2",
"oid": 24686
}
],
"wal_information": {
"compression_ratio": 0,
"disk_usage": "0 B",
"disk_usage_bytes": 0,
"last_available": "000000010000000600000009",
"no_of_files": 0,
"timelines": [],
"wal_rate": "195.18/hour",
"wal_rate_per_second": 0.05421688016258301
}
}
}
@barthisrael I just applied your suggestions. I also left a thought in one of your replies about the output of
show-backup
command.
I like your suggestion! Let's bring that up to the team and get their thoughts.
Postgres incremental backups have the risk of the user switching the
data_checkums
configuration on the server between backups, which could potentially cause problems during recovery. An attempt of reducing these cases is implemented in this PR, where we will now store the state ofdata_checksums
in thebackup.info
file of each backup, which will allow Barman to check the consistency of the chain during backup and recovery execution and emit warnings to the user so they can take action about it.data_checksums
config to thebackup.info
fileis_checksum_consistent
in theLocalBackupInfo
class to check for the chain consistency. This check is performed on backup and recovery executions.is_checksum_consistent
created in the previous commit