Closed PaganMuffin closed 3 months ago
I've got the same problem using PostgreSQL 16.1 and Barman 3.9.0.
Recovery using barman --config /path/to/barman.conf backup --wait localhost works fine and restores PostgreSQL to the latest state, but adding --target-time="2024-03-15 13:50:56.844481+00:00" lead to a missing WAL file.
The issue still persists in Barman 3.10. Here are logs and more details.
Started the recovery using this command:
/path/to/barman --config /path/to/barman.conf recover localhost 20240410T165613 /path/to/database --target-time "2024-04-10 14:56:15.915033+00:00"
Starting local restore for server localhost using backup 20240410T165613
Destination directory: /path/to/database
Doing PITR. Recovery target time: '2024-04-10 14:56:15.915033+00:00'
Using safe horizon time for smart rsync copy: 2024-04-10 14:56:13.756299+00:00
Copying the base backup.
Copying required WAL segments.
Generating recovery configuration
Identify dangerous settings in destination directory.
IMPORTANT
These settings have been modified to prevent data losses
postgresql.conf line 262: archive_command = false
postgresql.auto.conf line 7: recovery_target_time = None
Recovery completed (start time: 2024-04-10 17:20:21.768695+02:00, elapsed time: less than one second)
Your PostgreSQL server has been successfully prepared for recovery!
Started the PostgreSQL server, but it failed to start.
Log file of barman:
2024-04-10 17:20:21,736 [2504447] barman.config WARNING: Discarding configuration file: .barman.auto.conf (not a file)
2024-04-10 17:20:21,764 [2504447] barman.wal_archiver INFO: No xlog segments found from streaming for localhost.
2024-04-10 17:20:21,770 [2504447] barman.recovery_executor INFO: Starting local restore for server localhost using backup 20240410T165613
2024-04-10 17:20:21,771 [2504447] barman.recovery_executor INFO: Destination directory: /path/to/database
2024-04-10 17:20:21,772 [2504447] barman.recovery_executor INFO: Doing PITR. Recovery target time: '2024-04-10 14:56:15.915033+00:00'
2024-04-10 17:20:21,787 [2504447] barman.recovery_executor INFO: Using safe horizon time for smart rsync copy: 2024-04-10 14:56:13.756299+00:00
2024-04-10 17:20:21,796 [2504447] barman.recovery_executor INFO: Copying the base backup.
2024-04-10 17:20:21,799 [2504447] barman.copy_controller INFO: Copy started (safe before datetime.datetime(2024, 4, 10, 14, 56, 13, 756299, tzinfo=tzutc()))
2024-04-10 17:20:21,799 [2504447] barman.copy_controller INFO: Copy step 1 of 4: [global] analyze PGDATA directory: /path/to/database-backup/localhost/base/20240410T165613/data/
2024-04-10 17:20:21,991 [2504447] barman.copy_controller INFO: Copy step 2 of 4: [global] create destination directories and delete unknown files for PGDATA directory: /path/to/database-backup/localhost/base/20240410T165613/data/
2024-04-10 17:20:22,108 [2504462] barman.copy_controller INFO: Copy step 3 of 4: [bucket 0] starting copy safe files from PGDATA directory: /path/to/database-backup/localhost/base/20240410T165613/data/
2024-04-10 17:20:22,207 [2504462] barman.copy_controller INFO: Copy step 3 of 4: [bucket 0] finished (duration: less than one second) copy safe files from PGDATA directory: /path/to/database-backup/localhost/base/20240410T165613/data/
2024-04-10 17:20:22,213 [2504462] barman.copy_controller INFO: Copy step 4 of 4: [bucket 0] starting copy files with checksum from PGDATA directory: /path/to/database-backup/localhost/base/20240410T165613/data/
2024-04-10 17:20:22,260 [2504462] barman.copy_controller INFO: Copy step 4 of 4: [bucket 0] finished (duration: less than one second) copy files with checksum from PGDATA directory: /path/to/database-backup/localhost/base/20240410T165613/data/
2024-04-10 17:20:22,261 [2504447] barman.copy_controller INFO: Copy finished (safe before 2024-04-10 14:56:13.756299+00:00)
2024-04-10 17:20:22,273 [2504447] barman.recovery_executor INFO: Copying required WAL segments.
2024-04-10 17:20:22,274 [2504447] barman.recovery_executor INFO: Starting copy of 2 WAL files 2/2 from WalFileInfo(compression='gzip', name='00000001000000000000000D', size=16465, time=1712760976.317646) to WalFileInfo(compression='gzip', name='00000001000000000000000E', size=16396, time=1712760977.4366577)
2024-04-10 17:20:22,481 [2504447] barman.recovery_executor INFO: Finished copying 2 WAL files.
2024-04-10 17:20:22,482 [2504447] barman.recovery_executor INFO: Generating recovery configuration
2024-04-10 17:20:22,492 [2504447] barman.recovery_executor INFO: Identify dangerous settings in destination directory.
Log file of PostgreSQL:
2024-04-10 15:20:44.521 UTC [2504481] LOG: starting PostgreSQL 16.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2), 64-bit
2024-04-10 15:20:44.521 UTC [2504481] LOG: listening on IPv4 address "0.0.0.0", port 5432
2024-04-10 15:20:44.521 UTC [2504481] LOG: listening on IPv6 address "::", port 5432
2024-04-10 15:20:44.542 UTC [2504481] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2024-04-10 15:20:44.548 UTC [2504485] LOG: database system was interrupted; last known up at 2024-04-10 14:56:13 UTC
2024-04-10 15:20:44.548 UTC [2504485] LOG: creating missing WAL directory "pg_wal/archive_status"
2024-04-10 15:20:44.699 UTC [2504486] FATAL: the database system is starting up
2024-04-10 15:20:44.703 UTC [2504487] FATAL: the database system is starting up
2024-04-10 15:20:44.706 UTC [2504488] FATAL: the database system is starting up
cp: der Aufruf von stat für 'barman_wal/00000002.history' ist nicht möglich: Datei oder Verzeichnis nicht gefunden
2024-04-10 15:20:44.817 UTC [2504485] LOG: starting point-in-time recovery to 2024-04-10 14:56:15.915033+00
2024-04-10 15:20:44.834 UTC [2504485] LOG: restored log file "00000001000000000000000D" from archive
2024-04-10 15:20:44.965 UTC [2504485] LOG: redo starts at 0/D000028
2024-04-10 15:20:44.983 UTC [2504485] LOG: restored log file "00000001000000000000000E" from archive
cp: der Aufruf von stat für 'barman_wal/00000001000000000000000F' ist nicht möglich: Datei oder Verzeichnis nicht gefunden
2024-04-10 15:20:45.089 UTC [2504485] LOG: consistent recovery state reached at 0/D000100
2024-04-10 15:20:45.089 UTC [2504481] LOG: database system is ready to accept read-only connections
cp: der Aufruf von stat für 'barman_wal/00000001000000000000000F' ist nicht möglich: Datei oder Verzeichnis nicht gefunden
2024-04-10 15:20:45.095 UTC [2504485] LOG: redo done at 0/E000060 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.13 s
2024-04-10 15:20:45.095 UTC [2504485] FATAL: recovery ended before configured recovery target was reached
2024-04-10 15:20:45.096 UTC [2504481] LOG: startup process (PID 2504485) exited with exit code 1
2024-04-10 15:20:45.097 UTC [2504481] LOG: terminating any other active server processes
2024-04-10 15:20:45.097 UTC [2504481] LOG: shutting down due to startup process failure
2024-04-10 15:20:45.099 UTC [2504481] LOG: database system is shut down
Hi, Could you please provide the output of the barman diagnose
immediately before executing a P.I.T.R. command that fails like the above?
Also, the logs like you did before.
Thanks a lot!
Here are the requested information. I don't have a full test environment available, so I just created a blank database, backup it several times and tried to restore it using "--target-time", just to show the error.
[username@xusername-PG1 database-backup]$ /opt/username/programs/barman/bin/barman --config /opt/username/programs/barman/etc/barman.conf list-backups localhost
localhost 20240418T185600 - Thu Apr 18 16:56:02 2024 - Size: 29.4 MiB - WAL Size: 16.2 KiB
localhost 20240418T185551 - Thu Apr 18 16:55:55 2024 - Size: 29.4 MiB - WAL Size: 32.1 KiB
localhost 20240418T000005 - Wed Apr 17 22:00:26 2024 - Size: 29.4 MiB - WAL Size: 32.3 KiB
[username@xusername-PG1 database-backup]$ barman --config /opt/username/programs/barman/etc/barman.conf show-backup localhost 20240418T185600
Backup 20240418T185600:
Server Name : localhost
System Id : 7356240957958843836
Status : DONE
PostgreSQL Version : 160001
PGDATA directory : /opt/username/database
Base backup information:
Disk usage : 29.4 MiB (29.4 MiB with WALs)
Incremental size : 8.9 KiB (-99.97%)
Timeline : 1
Begin WAL : 000000010000000000000010
End WAL : 000000010000000000000010
WAL number : 1
WAL compression ratio: 99.90%
Begin time : 2024-04-18 16:56:00.224345+00:00
End time : 2024-04-18 16:56:02.161839+00:00
Copy time : less than one second
Estimated throughput : 9.2 KiB/s
Begin Offset : 40
End Offset : 256
Begin LSN : 0/10000028
End LSN : 0/10000100
WAL information:
No of files : 1
Disk usage : 16.2 KiB
WAL rate : 22.00/hour
Compression ratio : 99.90%
Last available : 000000010000000000000011
Catalog information:
Retention Policy : VALID
Previous Backup : 20240418T185551
Next Backup : - (this is the latest base backup)
https://medium.com/@kiwiv/recover-your-database-with-barman-be5ee4e11939
If we want to use this backup, note the begin and end time and you can recover from the any time in between. For this example will pick up the end time.
I picked the end date of the backup.
[username@xusername-PG1 database-backup]$ barman --config /opt/username/programs/barman/etc/barman.conf recover localhost 20240418T185600 /opt/username/database --target-time "2024-04-18 16:56:02.161839+00:00"
Starting local restore for server localhost using backup 20240418T185600
Destination directory: /opt/username/database
Doing PITR. Recovery target time: '2024-04-18 16:56:02.161839+00:00'
Copying the base backup.
Copying required WAL segments.
Generating recovery configuration
Identify dangerous settings in destination directory.
IMPORTANT
These settings have been modified to prevent data losses
postgresql.conf line 262: archive_command = false
Recovery completed (start time: 2024-04-18 19:12:55.447251+02:00, elapsed time: 2 seconds)
Your PostgreSQL server has been successfully prepared for recovery!
Log of PostgreSQL:
2024-04-18 17:13:59.443 UTC [796275] LOG: starting PostgreSQL 16.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2), 64-bit
2024-04-18 17:13:59.444 UTC [796275] LOG: listening on IPv4 address "0.0.0.0", port 5432
2024-04-18 17:13:59.444 UTC [796275] LOG: listening on IPv6 address "::", port 5432
2024-04-18 17:13:59.449 UTC [796275] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2024-04-18 17:13:59.454 UTC [796279] LOG: database system was interrupted; last known up at 2024-04-18 16:56:00 UTC
2024-04-18 17:13:59.454 UTC [796279] LOG: creating missing WAL directory "pg_wal/archive_status"
2024-04-18 17:13:59.897 UTC [796281] FATAL: the database system is starting up
2024-04-18 17:13:59.899 UTC [796282] FATAL: the database system is starting up
2024-04-18 17:13:59.901 UTC [796283] FATAL: the database system is starting up
cp: der Aufruf von stat für 'barman_wal/00000002.history' ist nicht möglich: Datei oder Verzeichnis nicht gefunden
2024-04-18 17:14:01.363 UTC [796279] LOG: starting point-in-time recovery to 2024-04-18 16:56:02.161839+00
2024-04-18 17:14:01.381 UTC [796279] LOG: restored log file "000000010000000000000010" from archive
2024-04-18 17:14:01.904 UTC [796300] FATAL: the database system is not yet accepting connections
2024-04-18 17:14:01.904 UTC [796300] DETAIL: Consistent recovery state has not been yet reached.
2024-04-18 17:14:01.907 UTC [796301] FATAL: the database system is not yet accepting connections
2024-04-18 17:14:01.907 UTC [796301] DETAIL: Consistent recovery state has not been yet reached.
2024-04-18 17:14:01.910 UTC [796302] FATAL: the database system is not yet accepting connections
2024-04-18 17:14:01.910 UTC [796302] DETAIL: Consistent recovery state has not been yet reached.
2024-04-18 17:14:02.233 UTC [796305] FATAL: the database system is not yet accepting connections
2024-04-18 17:14:02.233 UTC [796305] DETAIL: Consistent recovery state has not been yet reached.
2024-04-18 17:14:02.235 UTC [796306] FATAL: the database system is not yet accepting connections
2024-04-18 17:14:02.235 UTC [796306] DETAIL: Consistent recovery state has not been yet reached.
2024-04-18 17:14:02.241 UTC [796308] FATAL: the database system is not yet accepting connections
2024-04-18 17:14:02.241 UTC [796308] DETAIL: Consistent recovery state has not been yet reached.
2024-04-18 17:14:02.243 UTC [796309] FATAL: the database system is not yet accepting connections
2024-04-18 17:14:02.243 UTC [796309] DETAIL: Consistent recovery state has not been yet reached.
2024-04-18 17:14:02.661 UTC [796279] LOG: redo starts at 0/10000028
2024-04-18 17:14:02.680 UTC [796279] LOG: restored log file "000000010000000000000011" from archive
cp: der Aufruf von stat für 'barman_wal/000000010000000000000012' ist nicht möglich: Datei oder Verzeichnis nicht gefunden
2024-04-18 17:14:02.810 UTC [796279] LOG: consistent recovery state reached at 0/10000100
2024-04-18 17:14:02.810 UTC [796275] LOG: database system is ready to accept read-only connections
cp: der Aufruf von stat für 'barman_wal/000000010000000000000012' ist nicht möglich: Datei oder Verzeichnis nicht gefunden
2024-04-18 17:14:02.814 UTC [796279] LOG: redo done at 0/110001B0 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 1.30 s
2024-04-18 17:14:02.814 UTC [796279] FATAL: recovery ended before configured recovery target was reached
2024-04-18 17:14:02.815 UTC [796275] LOG: startup process (PID 796279) exited with exit code 1
2024-04-18 17:14:02.815 UTC [796275] LOG: terminating any other active server processes
2024-04-18 17:14:02.816 UTC [796275] LOG: shutting down due to startup process failure
2024-04-18 17:14:02.818 UTC [796275] LOG: database system is shut down
barman diagnose
{
"global": {
"config": {
"backup_method": "rsync",
"backup_options": "concurrent_backup",
"barman_home": "/opt/username/database-backup",
"barman_user": "username",
"compression": "gzip",
"configuration_files_directory": "/opt/username/programs/barman/etc",
"errors_list": [],
"log_file": "/opt/username/programs/barman/logs/barman.log",
"path_prefix": "/opt/username/programs/postgres/bin",
"reuse_backup": "link",
"slot_name": "barman",
"streaming_archiver": "on"
},
"system_info": {
"barman_ver": "3.10.0",
"kernel_ver": "Linux xusername-PG1 5.14.0-362.18.1.el9_3.0.1.x86_64 #1 SMP PREEMPT_DYNAMIC Sun Feb 11 13:49:23 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux",
"python_ver": "Python 3.9.18",
"release": "RedHat Linux Rocky Linux release 9.3 (Blue Onyx)",
"rsync_ver": "rsync version 3.2.3 protocol version 31",
"ssh_ver": "",
"timestamp": "2024-04-18T18:59:10.292506+02:00"
}
},
"models": {},
"servers": {
"localhost": {
"active_model": null,
"backups": {
"20240418T000005": {
"backup_id": "20240418T000005",
"backup_label": "'START WAL LOCATION: 0/C000060 (file 00000001000000000000000C)\\nCHECKPOINT LOCATION: 0/C000098\\nBACKUP METHOD: streamed\\nBACKUP FROM: primary\\nSTART TIME: 2024-04-17 22:00:11 UTC\\nLABEL: Barman backup localhost 20240418T000005\\nSTART TIMELINE: 1\\n'",
"begin_offset": 96,
"begin_time": "2024-04-18T00:00:05.707294+02:00",
"begin_wal": "00000001000000000000000C",
"begin_xlog": "0/C000060",
"compression": null,
"config_file": "/opt/username/database/postgresql.conf",
"copy_stats": {
"analysis_time": 0.965946,
"analysis_time_per_item": {
"pgdata": 0.965946
},
"copy_time": 12.807903,
"copy_time_per_item": {
"pg_control": 0.295303,
"pgdata": 12.509192
},
"number_of_workers": 1,
"serialized_copy_time": 12.799614,
"serialized_copy_time_per_item": {
"pg_control": 0.295303,
"pgdata": 12.504311
},
"total_time": 14.197323
},
"deduplicated_size": 30794643,
"end_offset": 368,
"end_time": "2024-04-18T00:00:26.479322+02:00",
"end_wal": "00000001000000000000000C",
"end_xlog": "0/C000170",
"error": null,
"hba_file": "/opt/username/database/pg_hba.conf",
"ident_file": "/opt/username/database/pg_ident.conf",
"included_files": [
"/opt/username/database/postgresql.auto.conf"
],
"mode": "rsync-concurrent",
"pgdata": "/opt/username/database",
"server_name": "localhost",
"size": 30794643,
"status": "DONE",
"systemid": "7356240957958843836",
"tablespaces": null,
"timeline": 1,
"version": 160001,
"xlog_segment_size": 16777216
},
"20240418T185551": {
"backup_id": "20240418T185551",
"backup_label": "'START WAL LOCATION: 0/E000028 (file 00000001000000000000000E)\\nCHECKPOINT LOCATION: 0/E000098\\nBACKUP METHOD: streamed\\nBACKUP FROM: primary\\nSTART TIME: 2024-04-18 16:55:53 UTC\\nLABEL: Barman backup localhost 20240418T185551\\nSTART TIMELINE: 1\\n'",
"begin_offset": 40,
"begin_time": "2024-04-18T18:55:51.775903+02:00",
"begin_wal": "00000001000000000000000E",
"begin_xlog": "0/E000028",
"compression": null,
"config_file": "/opt/username/database/postgresql.conf",
"copy_stats": {
"analysis_time": 0.792977,
"analysis_time_per_item": {
"pgdata": 0.792977
},
"copy_time": 0.973467,
"copy_time_per_item": {
"pg_control": 0.295436,
"pgdata": 0.674826
},
"number_of_workers": 1,
"serialized_copy_time": 0.963715,
"serialized_copy_time_per_item": {
"pg_control": 0.295436,
"pgdata": 0.668279
},
"total_time": 1.785295
},
"deduplicated_size": 9140,
"end_offset": 312,
"end_time": "2024-04-18T18:55:55.285195+02:00",
"end_wal": "00000001000000000000000E",
"end_xlog": "0/E000138",
"error": null,
"hba_file": "/opt/username/database/pg_hba.conf",
"ident_file": "/opt/username/database/pg_ident.conf",
"included_files": [
"/opt/username/database/postgresql.auto.conf"
],
"mode": "rsync-concurrent",
"pgdata": "/opt/username/database",
"server_name": "localhost",
"size": 30794643,
"status": "DONE",
"systemid": "7356240957958843836",
"tablespaces": null,
"timeline": 1,
"version": 160001,
"xlog_segment_size": 16777216
},
"20240418T185600": {
"backup_id": "20240418T185600",
"backup_label": "'START WAL LOCATION: 0/10000028 (file 000000010000000000000010)\\nCHECKPOINT LOCATION: 0/10000060\\nBACKUP METHOD: streamed\\nBACKUP FROM: primary\\nSTART TIME: 2024-04-18 16:56:00 UTC\\nLABEL: Barman backup localhost 20240418T185600\\nSTART TIMELINE: 1\\n'",
"begin_offset": 40,
"begin_time": "2024-04-18T18:56:00.224345+02:00",
"begin_wal": "000000010000000000000010",
"begin_xlog": "0/10000028",
"compression": null,
"config_file": "/opt/username/database/postgresql.conf",
"copy_stats": {
"analysis_time": 0.764417,
"analysis_time_per_item": {
"pgdata": 0.764417
},
"copy_time": 0.970696,
"copy_time_per_item": {
"pg_control": 0.292661,
"pgdata": 0.676094
},
"number_of_workers": 1,
"serialized_copy_time": 0.961986,
"serialized_copy_time_per_item": {
"pg_control": 0.292661,
"pgdata": 0.669325
},
"total_time": 1.756391
},
"deduplicated_size": 9099,
"end_offset": 256,
"end_time": "2024-04-18T18:56:02.161839+02:00",
"end_wal": "000000010000000000000010",
"end_xlog": "0/10000100",
"error": null,
"hba_file": "/opt/username/database/pg_hba.conf",
"ident_file": "/opt/username/database/pg_ident.conf",
"included_files": [
"/opt/username/database/postgresql.auto.conf"
],
"mode": "rsync-concurrent",
"pgdata": "/opt/username/database",
"server_name": "localhost",
"size": 30794646,
"status": "DONE",
"systemid": "7356240957958843836",
"tablespaces": null,
"timeline": 1,
"version": 160001,
"xlog_segment_size": 16777216
}
},
"config": {
"active": true,
"archiver": false,
"archiver_batch_size": 0,
"autogenerate_manifest": false,
"aws_profile": null,
"aws_region": null,
"azure_credential": null,
"azure_resource_group": null,
"azure_subscription_id": null,
"backup_compression": null,
"backup_compression_format": null,
"backup_compression_level": null,
"backup_compression_location": null,
"backup_compression_workers": null,
"backup_directory": "/opt/username/database-backup/localhost",
"backup_method": "rsync",
"backup_options": "concurrent_backup",
"bandwidth_limit": null,
"barman_home": "/opt/username/database-backup",
"barman_lock_directory": "/opt/username/database-backup",
"basebackup_retry_sleep": 30,
"basebackup_retry_times": 0,
"basebackups_directory": "/opt/username/database-backup/localhost/base",
"check_timeout": 30,
"cluster": "localhost",
"compression": "gzip",
"config_changes_queue": "/opt/username/database-backup/cfg_changes.queue",
"conninfo": "host=localhost port=5432 dbname=postgres user=barman",
"create_slot": "manual",
"custom_compression_filter": null,
"custom_compression_magic": null,
"custom_decompression_filter": null,
"description": null,
"disabled": false,
"errors_directory": "/opt/username/database-backup/localhost/errors",
"forward_config_path": false,
"gcp_project": null,
"gcp_zone": null,
"immediate_checkpoint": false,
"incoming_wals_directory": "/opt/username/database-backup/localhost/incoming",
"last_backup_maximum_age": null,
"last_backup_minimum_size": null,
"last_wal_maximum_age": null,
"lock_directory_cleanup": true,
"max_incoming_wals_queue": null,
"minimum_redundancy": 0,
"msg_list": [],
"name": "localhost",
"network_compression": false,
"parallel_jobs": 1,
"parallel_jobs_start_batch_period": 1,
"parallel_jobs_start_batch_size": 10,
"path_prefix": "/opt/username/programs/postgres/bin",
"post_archive_retry_script": null,
"post_archive_script": null,
"post_backup_retry_script": null,
"post_backup_script": null,
"post_delete_retry_script": null,
"post_delete_script": null,
"post_recovery_retry_script": null,
"post_recovery_script": null,
"post_wal_delete_retry_script": null,
"post_wal_delete_script": null,
"pre_archive_retry_script": null,
"pre_archive_script": null,
"pre_backup_retry_script": null,
"pre_backup_script": null,
"pre_delete_retry_script": null,
"pre_delete_script": null,
"pre_recovery_retry_script": null,
"pre_recovery_script": null,
"pre_wal_delete_retry_script": null,
"pre_wal_delete_script": null,
"primary_checkpoint_timeout": 0,
"primary_conninfo": null,
"primary_ssh_command": null,
"recovery_options": "",
"recovery_staging_path": null,
"retention_policy": "window 7 d",
"retention_policy_mode": "auto",
"reuse_backup": "link",
"slot_name": "barman",
"snapshot_disks": null,
"snapshot_gcp_project": null,
"snapshot_instance": null,
"snapshot_provider": null,
"snapshot_zone": null,
"ssh_command": "ssh username@localhost",
"streaming_archiver": true,
"streaming_archiver_batch_size": 0,
"streaming_archiver_name": "barman_receive_wal",
"streaming_backup_name": "barman_streaming_backup",
"streaming_conninfo": "host=localhost port=5432 dbname=postgres user=barman",
"streaming_wals_directory": "/opt/username/database-backup/localhost/streaming",
"tablespace_bandwidth_limit": null,
"wal_conninfo": null,
"wal_retention_policy": "simple-wal 7 d",
"wal_streaming_conninfo": null,
"wals_directory": "/opt/username/database-backup/localhost/wals"
},
"status": {
"archive_timeout": 0,
"checkpoint_timeout": 300,
"config_file": "/opt/username/database/postgresql.conf",
"connection_error": null,
"current_lsn": "0/110000C8",
"current_size": 31254984.0,
"current_xlog": "000000010000000000000011",
"data_checksums": "off",
"data_directory": "/opt/username/database",
"has_backup_privileges": true,
"has_monitoring_privileges": true,
"hba_file": "/opt/username/database/pg_hba.conf",
"hot_standby": "on",
"ident_file": "/opt/username/database/pg_ident.conf",
"included_files": [
"/opt/username/database/postgresql.auto.conf"
],
"is_in_recovery": false,
"is_superuser": true,
"max_replication_slots": "10",
"max_wal_senders": "10",
"pg_receivexlog_compatible": true,
"pg_receivexlog_installed": true,
"pg_receivexlog_path": "/opt/username/programs/postgres/bin/pg_receivewal",
"pg_receivexlog_supports_slots": true,
"pg_receivexlog_synchronous": false,
"pg_receivexlog_version": "16.1",
"postgres_systemid": "7356240957958843836",
"replication_slot": [
"barman",
true,
"0/11000000"
],
"replication_slot_support": true,
"server_txt_version": "16.1",
"streaming": true,
"streaming_supported": true,
"streaming_systemid": "7356240957958843836",
"synchronous_standby_names": [
""
],
"timeline": 1,
"version_supported": true,
"wal_compression": "off",
"wal_keep_size": "0",
"wal_level": "logical",
"xlog_segment_size": 16777216,
"xlogpos": "0/110000C8"
},
"system_info": {
"kernel_ver": "Linux xusername-PG1 5.14.0-362.18.1.el9_3.0.1.x86_64 #1 SMP PREEMPT_DYNAMIC Sun Feb 11 13:49:23 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux",
"python_ver": "Python 3.9.18",
"release": "RedHat Linux Rocky Linux release 9.3 (Blue Onyx)",
"rsync_ver": "rsync version 3.2.3 protocol version 31",
"ssh_ver": ""
},
"wals": {
"last_archived_wal_per_timeline": {
"00000001": {
"compression": "gzip",
"name": "000000010000000000000010",
"size": 16460,
"time": 1713459362.313908
}
}
}
}
}
}
Logfile of barman:
2024-04-18 19:12:02,287 [796162] barman.postgres WARNING: Error retrieving PostgreSQL status: connection to server at "localhost" (::1), port 5432 failed: Connection refused
2024-04-18 19:12:02,287 [796162] barman.postgres WARNING: Error retrieving PostgreSQL status: connection to server at "localhost" (::1), port 5432 failed: Connection refused
2024-04-18 19:12:02,293 [796162] barman.backup_executor WARNING: Error retrieving PostgreSQL status: connection to server at "localhost" (::1), port 5432 failed: Connection refused
2024-04-18 19:12:02,293 [796162] barman.server ERROR: Check 'no access to monitoring functions' failed for server 'localhost'
2024-04-18 19:12:02,293 [796162] barman.server ERROR: Check 'PostgreSQL streaming (WAL streaming)' failed for server 'localhost'
2024-04-18 19:12:02,307 [796162] barman.server ERROR: Impossible to start WAL streaming. Check the log for more details, or run 'barman check localhost'
2024-04-18 19:12:55,423 [796166] barman.config WARNING: Discarding configuration file: .barman.auto.conf (not a file)
2024-04-18 19:12:55,444 [796166] barman.wal_archiver INFO: No xlog segments found from streaming for localhost.
2024-04-18 19:12:55,469 [796166] barman.recovery_executor INFO: Starting local restore for server localhost using backup 20240418T185600
2024-04-18 19:12:55,470 [796166] barman.recovery_executor INFO: Destination directory: /opt/username/database
2024-04-18 19:12:55,471 [796166] barman.recovery_executor INFO: Doing PITR. Recovery target time: '2024-04-18 16:56:02.161839+00:00'
2024-04-18 19:12:55,476 [796166] barman.recovery_executor WARNING: Unable to retrieve safe horizon time for smart rsync copy: The /opt/username/database/.barman-recover.info file does not exist
2024-04-18 19:12:55,483 [796166] barman.recovery_executor INFO: Copying the base backup.
2024-04-18 19:12:55,485 [796166] barman.copy_controller INFO: Copy started (safe before None)
2024-04-18 19:12:55,485 [796166] barman.copy_controller INFO: Copy step 1 of 4: [global] analyze PGDATA directory: /opt/username/database-backup/localhost/base/20240418T185600/data/
2024-04-18 19:12:55,650 [796166] barman.copy_controller INFO: Copy step 2 of 4: [global] create destination directories and delete unknown files for PGDATA directory: /opt/username/database-backup/localhost/base/20240418T185600/data/
2024-04-18 19:12:57,882 [796194] barman.copy_controller INFO: Copy step 3 of 4: [bucket 0] starting copy safe files from PGDATA directory: /opt/username/database-backup/localhost/base/20240418T185600/data/
2024-04-18 19:12:57,931 [796194] barman.copy_controller INFO: Copy step 3 of 4: [bucket 0] finished (duration: less than one second) copy safe files from PGDATA directory: /opt/username/database-backup/localhost/base/20240418T185600/data/
2024-04-18 19:12:57,937 [796194] barman.copy_controller INFO: Copy step 4 of 4: [bucket 0] starting copy files with checksum from PGDATA directory: /opt/username/database-backup/localhost/base/20240418T185600/data/
2024-04-18 19:12:58,191 [796194] barman.copy_controller INFO: Copy step 4 of 4: [bucket 0] finished (duration: less than one second) copy files with checksum from PGDATA directory: /opt/username/database-backup/localhost/base/20240418T185600/data/
2024-04-18 19:12:58,197 [796166] barman.copy_controller INFO: Copy finished (safe before None)
2024-04-18 19:12:58,211 [796166] barman.recovery_executor INFO: Copying required WAL segments.
2024-04-18 19:12:58,212 [796166] barman.recovery_executor INFO: Starting copy of 2 WAL files 2/2 from WalFileInfo(compression='gzip', name='000000010000000000000010', size=16460, time=1713459362.313908) to WalFileInfo(compression='gzip', name='000000010000000000000011', size=16557, time=1713459689.5802295)
2024-04-18 19:12:58,413 [796166] barman.recovery_executor INFO: Finished copying 2 WAL files.
2024-04-18 19:12:58,414 [796166] barman.recovery_executor INFO: Generating recovery configuration
2024-04-18 19:12:58,426 [796166] barman.recovery_executor INFO: Identify dangerous settings in destination directory.
2024-04-18 19:13:01,549 [796225] barman.config WARNING: Discarding configuration file: .barman.auto.conf (not a file)
2024-04-18 19:13:01,571 [796225] barman.utils INFO: Cleaning up lockfiles directory.
2024-04-18 19:13:01,700 [796226] barman.config WARNING: Discarding configuration file: .barman.auto.conf (not a file)
2024-04-18 19:13:01,719 [796226] barman.wal_archiver INFO: No xlog segments found from streaming for localhost.
2024-04-18 19:13:01,729 [796227] barman.config WARNING: Discarding configuration file: .barman.auto.conf (not a file)
2024-04-18 19:13:01,737 [796227] barman.postgres WARNING: Error retrieving PostgreSQL status: connection to server at "localhost" (::1), port 5432 failed: Connection refused
Hello @PaganMuffin @JP95Git ,
While taking a look at the issues reported by both of you, we guess we have 2 different problems here, one caused by an operational issue, and another by a couple of apparent bugs in Barman.
We will follow-up with a couple of messages sharing our thoughts.
Expect an update by tomorrow.
From reading the logs, and I guess @barthisrael 's "operational issue" mentioned above is this that I will explain, is that you are picking a point in time for recovery that is outside the window of the backup and WALs that have been archived, so Postgres can never recover to that point in time as it doesn't exist in the timeline from the end of the backup until the last record in the 2 WALs available.
We'll follow-up with more later
In the case reported by @JP95Git, this initially seemed an operational issue, but this actually refers to a limitation in Barman.
You seem to be requesting a --target-time
which Postgres is not able to honor because of a likely lack of COMMIT
or ABORT
records in the WAL files.
Just for context: when you pass --target-time
to Barman, what it does is prepare Postgres for recovery with recovery_target_time
set with the value of that Barman command-line option.
What Postgres does when in recovery with recovery_target_time
set, is to keep replaying WALs until it finds a COMMIT
or ABORT
record which does not honor the configured target time. It replays any WALs that are contained within the target time, and as soon as the COMMIT
or ABORT
record contains a timestamp which is after the configured target time, it stops recovery. You can check that logic in the Postgres source code: https://github.com/postgres/postgres/blob/master/src/backend/access/transam/xlogrecovery.c#L1646-L1902.
Now, with the proper context given, we will demonstrate that issue with the following example.
We have a Barman backup here with this information:
$ barman show-backup pg17 20240620T165110
Backup 20240620T165110:
Server Name : pg17
System Id : 7382227563826631291
Status : DONE
PostgreSQL Version : 170000
PGDATA directory : /var/lib/pgsql/17/data
Base backup information:
Disk usage : 22.3 MiB (54.3 MiB with WALs)
Incremental size : 22.3 MiB (-0.00%)
Timeline : 1
Begin WAL : 00000001000000000000003F
End WAL : 000000010000000000000040
WAL number : 2
Begin time : 2024-06-20 16:51:10.088025+00:00
End time : 2024-06-20 16:51:11.498244+00:00
Copy time : 1 second
Estimated throughput : 16.1 MiB/s
Begin Offset : 216
End Offset : 0
Begin LSN : 0/3F0000D8
End LSN : 0/40000000
WAL information:
No of files : 5
Disk usage : 80.0 MiB
WAL rate : 163.55/hour
Last available : 000000010000000000000045
Catalog information:
Retention Policy : not enforced
Previous Backup : - (this is the oldest base backup)
Next Backup : 20240620T165335
The important details here are:
00000001000000000000003F
, at timestamp 2024-06-20 16:51:10.088025+00:00
;000000010000000000000040
, at timestamp 2024-06-20 16:51:11.498244+00:00
.We tried doing something similar to what you did, i.e. set --target-time
to the end of the backup with a command like this:
$ barman recover pg17 20240620T165110 --target-time "2024-06-20 16:51:11.498244+00:00" --remote-ssh-command "ssh postgres@localhost" /var/lib/pgsql/17/restore
Starting remote restore for server pg17 using backup 20240620T165110
Destination directory: /var/lib/pgsql/17/restore
Remote command: ssh postgres@localhost
Doing PITR. Recovery target time: '2024-06-20 16:51:11.498244+00:00'
Copying the base backup.
Copying required WAL segments.
Generating recovery configuration
Identify dangerous settings in destination directory.
Recovery completed (start time: 2024-06-20 16:56:42.835003+00:00, elapsed time: 5 seconds)
Your PostgreSQL server has been successfully prepared for recovery!
But then Postgres fails to start up:
$ tail -10 restore/log/postgresql-Thu.log
2024-06-20 16:57:15.429 UTC [64304] LOG: restored log file "000000010000000000000041" from archive
cp: cannot stat 'barman_wal/000000010000000000000042': No such file or directory
2024-06-20 16:57:15.670 UTC [64304] LOG: restored log file "000000010000000000000041" from archive
cp: cannot stat 'barman_wal/000000010000000000000042': No such file or directory
2024-06-20 16:57:15.679 UTC [64304] LOG: redo done at 0/41FFFF90 system usage: CPU: user: 0.02 s, system: 0.36 s, elapsed: 0.48 s
2024-06-20 16:57:15.679 UTC [64304] FATAL: recovery ended before configured recovery target was reached
2024-06-20 16:57:15.682 UTC [64300] LOG: startup process (PID 64304) exited with exit code 1
2024-06-20 16:57:15.682 UTC [64300] LOG: terminating any other active server processes
2024-06-20 16:57:15.683 UTC [64300] LOG: shutting down due to startup process failure
2024-06-20 16:57:15.683 UTC [64300] LOG: database system is shut down
As we can see, the issue is FATAL: recovery ended before configured recovery target was reached
.
If we inspect the WALs which were copied by Barman, we will see there are no COMMIT
or ABORT
records among them in our case:
$ ls
00000001000000000000003F 000000010000000000000040 000000010000000000000041 archive_status
$ /usr/pgsql-17/bin/pg_waldump 00000001000000000000003F 000000010000000000000041 | egrep "COMMIT|ABORT"
$
That's why the recovery failed, Postgres was not able to reach a point where it was sure the requested recovery target was fully honored.
If we inspect the Barman WAL archive, we find these COMMIT
records:
$ ls -lh
total 149M
-rw-------. 1 vagrant vagrant 16M Jun 20 16:51 00000001000000000000003F
-rw-------. 1 vagrant vagrant 16M Jun 20 16:53 000000010000000000000040
-rw-------. 1 vagrant vagrant 16M Jun 20 16:53 000000010000000000000041
-rw-------. 1 vagrant vagrant 16M Jun 20 16:53 000000010000000000000042
-rw-------. 1 vagrant vagrant 16M Jun 20 16:53 000000010000000000000043
-rw-------. 1 vagrant vagrant 16M Jun 20 16:53 000000010000000000000044
-rw-------. 1 vagrant vagrant 16M Jun 20 16:53 000000010000000000000045
$ /usr/pgsql-17/bin/pg_waldump 00000001000000000000003F 000000010000000000000045 | egrep "COMMIT|ABORT"
rmgr: Transaction len (rec/tot): 405/ 405, tx: 789, lsn: 0/43D4DB58, prev 0/43D4DB18, desc: COMMIT 2024-06-20 16:53:25.807910 UTC; inval msgs: catcache 80 catcache 79 catcache 80 catcache 79 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 snapshot 2608 relcache 16423
rmgr: Transaction len (rec/tot): 66/ 66, tx: 790, lsn: 0/43D96F08, prev 0/43D96A70, desc: COMMIT 2024-06-20 16:53:31.856379 UTC; inval msgs: catcache 63
These COMMIT
records are contained within the range between 000000010000000000000042
and 000000010000000000000045
.
Now, giving context on what Barman does when you run barman recover
with --no-get-wal
(or with get-wal
absent in your barman.conf
):
--target-time
, the logic is here: https://github.com/EnterpriseDB/barman/blob/master/barman/server.py#L1825-L1828. It gets all WALs at least until the backup’s "End WAL" is reached plus all the WAL files that have a timestamp which is older or equal to the requested --target-time
.However, as we can see in the above example, that was not enough. My backup finished at WAL 000000010000000000000040
, and the WAL 000000010000000000000041
is from timestamp 16:53
, which is after my --target-time
(2024-06-20 16:51:11.498244+00:00
). That is why Barman didn't copy WALs newer than 000000010000000000000041
.
In order to make that process reliable we would need to inspect the WAL dumps and look for COMMIT
and ABORT
records. That would put a lot of overhead in Barman archiving or Barman recovery, though, making that option not practical/well performant.
We intend to submit changes to Barman in such a way that barman recover
with --no-get-wal
and --target-time
will behave similarly to when using --target-xid
or --target-name
, i.e. do not try to guess the required WAL files, but rather copy all of the WALs from Barman to Postgres host.
We also intend to add a note to the Barman documentation, suggesting to use --get-wal
when performing PITR with either of those options of barman recover
.
For now, a workaround would be using --get-wal
in barman recover
(or configure get-wal
in your barman.conf
), so instead of copying WALs as part of barman recover
, Barman would configure Postgres with barman-wal-restore
in the restore_command
, allowing Postgres to keep asking for WALs from Barman for how long Postgres thinks it still needs more WALs, and Barman would be able to provide the required WAL files, if available in Barman.
The problem initially reported through this GitHub issue seems related with a problem in Barman.
When the user runs barman recover
with --target-time
, and the value of --target-time
contains no explicit time zone set, Barman assumes the time zone of the Barman host.
However, that implicit time zone is only taken into consideration when printing messages in the console, not when configuring Postgres for recovery.
We will illustrate the issue with an example. Assume we have a Barman host whose time zone is UTC+2. When we attempt to run barman recover
with --target-time
with no time zone, we see an output similar to this:
$ barman recover pg17 20240621T170006 --target-time "2024-06-21 17:00:09.348083" --remote-ssh-command "ssh postgres@localhost" /var/lib/pgsql/17/restore
Starting remote restore for server pg17 using backup 20240621T170006
Destination directory: /var/lib/pgsql/17/restore
Remote command: ssh postgres@localhost
Doing PITR. Recovery target time: '2024-06-21 17:00:09.348083+02:00'
Copying the base backup.
Copying required WAL segments.
Generating recovery configuration
Identify dangerous settings in destination directory.
Recovery completed (start time: 2024-06-21 17:03:49.748856+02:00, elapsed time: 8 seconds)
Your PostgreSQL server has been successfully prepared for recovery!
As we can see, Barman claims to have configured recovery_target_time
as 2024-06-21 17:00:09.348083+02:00
.
However, in practice that is not true as we can see from postgresql.auto.conf
:
$ cat /var/lib/pgsql/17/restore/postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
summarize_wal = 'on'
restore_command = 'cp barman_wal/%f %p'
recovery_end_command = 'rm -fr barman_wal'
recovery_target_time = '2024-06-21 17:00:09.348083'
In my case I’m using the default timezone
(UTC
) in my Postgres instance, so the recovery_target_time
was set to something different from what was informed by Barman messages. As a consequence, my recovery attempt fails:
$ tail -15 /var/lib/pgsql/17/restore/log/postgresql-Fri.log
2024-06-21 15:14:36.673 UTC [32083] LOG: starting point-in-time recovery to 2024-06-21 17:00:09.348083+00
2024-06-21 15:14:36.678 UTC [32083] LOG: redo starts at 1/7000028
2024-06-21 15:14:36.681 UTC [32083] LOG: restored log file "000000010000000100000008" from archive
2024-06-21 15:14:36.689 UTC [32083] LOG: restored log file "000000010000000100000009" from archive
cp: cannot stat 'barman_wal/00000001000000010000000A': No such file or directory
2024-06-21 15:14:36.698 UTC [32083] LOG: completed backup recovery with redo LSN 1/7000028 and end LSN 1/7000120
2024-06-21 15:14:36.698 UTC [32083] LOG: consistent recovery state reached at 1/7000120
2024-06-21 15:14:36.698 UTC [32079] LOG: database system is ready to accept read-only connections
cp: cannot stat 'barman_wal/00000001000000010000000A': No such file or directory
2024-06-21 15:14:36.704 UTC [32083] LOG: redo done at 1/9000238 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.02 s
2024-06-21 15:14:36.704 UTC [32083] FATAL: recovery ended before configured recovery target was reached
2024-06-21 15:14:36.712 UTC [32079] LOG: startup process (PID 32083) exited with exit code 1
2024-06-21 15:14:36.712 UTC [32079] LOG: terminating any other active server processes
2024-06-21 15:14:36.713 UTC [32079] LOG: shutting down due to startup process failure
2024-06-21 15:14:36.713 UTC [32079] LOG: database system is shut down
It is looking for a recovery_target_time
which is in the past, thus will never be reachable.
We are working on a fix for this issue. You can follow this PR for more details.
Good news: The PITR recovery is now working.
recovery_options = 'get-wal'
to barman.conf/path/to/barman/bin/barman --config /path/to/config/barman.conf recover localhost 20240627T213726 /path/to/database --target-immediate
restore_command = '/path/to/barman/bin/barman --config /path/to/config/barman.conf get-wal -P localhost %f > %p'
It works :-)
Step 3 was ugly, but apparently necessary, since I run PostgreSQL and barman under a normal user. I also had to compile both tools myself and install them in /opt. For security reasons I don't have root privileges and I'm only allowed to start/stop selected services via sudo.
Would be great if we could override the generated restore_command from barman using some setting in barman such that step 3 is no longer needed.
Hello @JP95Git ,
Good news: The PITR recovery is now working.
- Add
recovery_options = 'get-wal'
to barman.conf- Stop PostgreSQL
/path/to/barman/bin/barman --config /path/to/config/barman.conf recover localhost 20240627T213726 /path/to/database --target-immediate
- Edit /path/to/database/postgresql.auto.conf Replace the restore_command with
restore_command = '/path/to/barman/bin/barman --config /path/to/config/barman.conf get-wal -P localhost %f > %p'
- Start PostgreSQL
It works :-)
Thanks for letting us now!
As both PRs related with this GitHub issue were merged into master
and are only pending a release, we are now closing this issue.
Once a new Barman version is released, which is expected to occur somewhen between August and September, those fixes/enhancements will become available.
Step 3 was ugly, but apparently necessary, since I run PostgreSQL and barman under a normal user. I also had to compile both tools myself and install them in /opt. For security reasons I don't have root privileges and I'm only allowed to start/stop selected services via sudo.
Would be great if we could override the generated restore_command from barman using some setting in barman such that step 3 is no longer needed.
As we are closing this issue, we opened a separate GitHub issue #956 to track this enhancement request.
Although this sounds unusual, and in most cases what is automatically inferred by Barman should be enough, we understand that there might exist some cases where overrides are necessary.
3. Edit /path/to/database/postgresql.auto.conf Replace the restore_command with `restore_command = '/path/to/barman/bin/barman --config /path/to/config/barman.conf get-wal -P localhost %f > %p'`
This restore command could be replaced with barman-wal-restore
if you also compiled it. It would make the command slimmer
After doing PITR recovery with specific time Postgres won't start. Recovery command is reporting that postgres is successfully prepared for recovery, but after starting postgres instance it fail and log that WAL file, which is after target time according to timeline, is missing. I can do a restore to the latest version of DB without a problem.
Command used to recovery
Command log
Postgres Logs
Timeline UTC+1 (Barman server, Postgres server have same timezone)
TIME | ACTION | WAL NUMBER | ADD NFO
Barman log