EnterpriseDB / barman

Barman - Backup and Recovery Manager for PostgreSQL
https://www.pgbarman.org/
GNU General Public License v3.0

Error BARMAN #1010

Closed alainmahe closed 14 hours ago

alainmahe commented 2 months ago

[barman@SRV_barman barman]$ barman backup srv_postres
WARNING: No backup strategy set for server 'srv_postres' (using default 'concurrent_backup').
Starting backup using rsync-concurrent method for server srv_postres in /SVG_FS/srv_postres/base/20240828T094848
Backup start at LSN: 247/15000028 (000000010000024700000015, 00000028)
This is the first backup for server srv_postres
ERROR: The backup has failed starting backup
Asking PostgreSQL server to finalize the backup.
ERROR: Backup failed writing backup label. DETAILS: [Errno 2] No such file or directory: '/SVG_FS/srv_postres/base/20240828T094848/data/backup_label'
Processing xlog segments from file archival for srv_postres
    000000010000024700000014
    000000010000024700000015
    000000010000024700000015.00000028.backup
EXCEPTION: [Errno 5] Input/output error: '/SVG_FS/srv_postres/wals/tmp3os2d4wd'
See log file for more details.

2024-08-28 09:48:48,429 [3269512] barman.backup_executor INFO: 16400, fib_data, /pg_tblspce
2024-08-28 09:48:49,040 [3269512] barman.backup_executor INFO: Backup start at LSN: 247/15000028 (000000010000024700000015, 00000028)
2024-08-28 09:48:49,047 [3269512] barman.backup_executor INFO: This is the first backup for server srv_postres
2024-08-28 09:48:49,071 [3269512] barman.backup_executor ERROR: The backup has failed starting backup
2024-08-28 09:48:49,071 [3269512] barman.backup_executor INFO: Asking PostgreSQL server to finalize the backup.
2024-08-28 09:48:52,720 [3269512] barman.backup ERROR: Backup failed writing backup label. DETAILS: [Errno 2] No such file or directory: '/SVG_FS/srv_postres/base/20240828T094848/data/backup_label'
2024-08-28 09:48:52,803 [3269512] barman.wal_archiver INFO: Found 3 xlog segments from file archival for srv_postres. Archive all segments in one run.
2024-08-28 09:48:52,803 [3269512] barman.wal_archiver INFO: Archiving segment 1 of 3 from file archival: srv_postres/000000010000024700000014
2024-08-28 09:48:53,018 [3269512] barman.wal_archiver INFO: Archiving segment 2 of 3 from file archival: srv_postres/000000010000024700000015
2024-08-28 09:48:53,245 [3269512] barman.wal_archiver INFO: Archiving segment 3 of 3 from file archival: srv_postres/000000010000024700000015.00000028.backup
2024-08-28 09:48:53,395 [3269512] barman.cli ERROR: [Errno 5] Input/output error: '/SVG_FS/srv_postres/wals/tmp3os2d4wd'
See log file for more details.
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/barman/cli.py", line 2390, in main
    args.func(args)
  File "/usr/lib/python3.6/site-packages/barman/cli.py", line 546, in backup
    backup_name=args.backup_name,
  File "/usr/lib/python3.6/site-packages/barman/server.py", line 1651, in backup
    self.backup_manager.remove_wal_before_backup(backup_info)
  File "/usr/lib/python3.6/site-packages/barman/backup.py", line 1259, in remove_wal_before_backup
    with tempfile.TemporaryFile(mode="w+", dir=xlogdb_dir) as fxlogdb_new:
  File "/usr/lib64/python3.6/tempfile.py", line 624, in TemporaryFile
    _os.unlink(name)
OSError: [Errno 5] Input/output error: '/SVG_FS/srv_postres/wals/tmp3os2d4wd'
2024-08-28 09:49:01,416 [3269547] barman.config WARNING: Discarding configuration file: .barman.auto.conf (not a file)
2024-08-28 09:49:01,438 [3269547] barman.backup_executor WARNING: No backup strategy set for server 'srv_postres' (using default 'concurrent_backup').
2

[barman@SRV_barman barman]$ barman diagnose
WARNING: No backup strategy set for server 'srv_postres' (using default 'concurrent_backup').
{
  "global": {
    "config": {
      "barman_home": "/SVG_FS",
      "barman_user": "barman",
      "compression": "gzip",
      "configuration_files_directory": "/etc/barman/conf.d",
      "errors_list": [],
      "log_file": "/var/log/barman/barman.log",
      "log_level": "INFO",
      "minimum_redundancy": "0",
      "retention_policy": "REDUNDANCY 35"
    },
    "system_info": {
      "barman_ver": "3.10.0",
      "kernel_ver": "Linux SRV_barman 5.15.0-106.131.4.el8uek.x86_64 #2 SMP Fri Sep 22 16:00:58 PDT 2023 x86_64 x86_64 x86_64 GNU/Linux",
      "python_ver": "Python 3.6.8",
      "release": "RedHat Linux Red Hat Enterprise Linux release 8.8 (Ootpa)",
      "rsync_ver": "rsync version 3.1.3 protocol version 31",
      "ssh_ver": "",
      "timestamp": "2024-08-28T09:45:08.523948+02:00"
    }
  },
  "models": {},
  "servers": {
    "srv_postres": {
      "active_model": null,
      "backups": {
        "20240828T093529": {
          "backup_id": "20240828T093529",
          "backup_label": "'START WAL LOCATION: 247/12000060 (file 000000010000024700000012)\nCHECKPOINT LOCATION: 247/12000098\nBACKUP METHOD: streamed\nBACKUP FROM: master\nSTART TIME: 2024-08-28 09:35:30 CEST\nLABEL: Barman backup srv_postres 20240828T093529\nSTART TIMELINE: 1\n'",
          "begin_offset": 96,
          "begin_time": "2024-08-28T09:35:29.762677+02:00",
          "begin_wal": "000000010000024700000012",
          "begin_xlog": "247/12000060",
          "compression": null,
          "config_file": "/pg_data/MyDB/postgresql.conf",
          "copy_stats": null,
          "deduplicated_size": null,
          "end_offset": 304,
          "end_time": "2024-08-28T09:35:31.149047+02:00",
          "end_wal": "000000010000024700000012",
          "end_xlog": "247/12000130",
          "error": "failure writing backup label ([Errno 2] No such file or directory: '/SVG_FS/srv_postres/base/20240828T093529/data/backup_label')",
          "hba_file": "/pg_data/MyDB/pg_hba.conf",
          "ident_file": "/pg_data/MyDB/pg_ident.conf",
          "included_files": null,
          "mode": "rsync-concurrent",
          "pgdata": "/pg_data/MyDB",
          "server_name": "srv_postres",
          "size": null,
          "status": "FAILED",
          "systemid": "6744984042826244766",
          "tablespaces": [ [ "fib_data", 16400, "/pg_tblspce" ] ],
          "timeline": 1,
          "version": 110003,
          "xlog_segment_size": 16777216
        }
      },
      "config": {
        "active": true,
        "archiver": true,
        "archiver_batch_size": 0,
        "autogenerate_manifest": false,
        "aws_profile": null,
        "aws_region": null,
        "azure_credential": null,
        "azure_resource_group": null,
        "azure_subscription_id": null,
        "backup_compression": null,
        "backup_compression_format": null,
        "backup_compression_level": null,
        "backup_compression_location": null,
        "backup_compression_workers": null,
        "backup_directory": "/SVG_FS/srv_postres",
        "backup_method": "rsync",
        "backup_options": "concurrent_backup",
        "bandwidth_limit": null,
        "barman_home": "/SVG_FS",
        "barman_lock_directory": "/SVG_FS",
        "basebackup_retry_sleep": 30,
        "basebackup_retry_times": 0,
        "basebackups_directory": "/SVG_FS/srv_postres/base",
        "check_timeout": 30,
        "cluster": "srv_postres",
        "compression": "gzip",
        "config_changes_queue": "/SVG_FS/cfg_changes.queue",
        "conninfo": "host=srv_postres port=5432 user=barman dbname=postgres password=REDACTED",
        "create_slot": "manual",
        "custom_compression_filter": null,
        "custom_compression_magic": null,
        "custom_decompression_filter": null,
        "description": "NON_PROD PostgreSQL Master server",
        "disabled": false,
        "errors_directory": "/SVG_FS/srv_postres/errors",
        "forward_config_path": false,
        "gcp_project": null,
        "gcp_zone": null,
        "immediate_checkpoint": false,
        "incoming_wals_directory": "/SVG_FS/srv_postres/incoming",
        "last_backup_maximum_age": null,
        "last_backup_minimum_size": null,
        "last_wal_maximum_age": null,
        "lock_directory_cleanup": true,
        "max_incoming_wals_queue": null,
        "minimum_redundancy": 0,
        "msg_list": [],
        "name": "srv_postres",
        "network_compression": false,
        "parallel_jobs": 1,
        "parallel_jobs_start_batch_period": 1,
        "parallel_jobs_start_batch_size": 10,
        "path_prefix": null,
        "post_archive_retry_script": null,
        "post_archive_script": null,
        "post_backup_retry_script": null,
        "post_backup_script": null,
        "post_delete_retry_script": null,
        "post_delete_script": null,
        "post_recovery_retry_script": null,
        "post_recovery_script": null,
        "post_wal_delete_retry_script": null,
        "post_wal_delete_script": null,
        "pre_archive_retry_script": null,
        "pre_archive_script": null,
        "pre_backup_retry_script": null,
        "pre_backup_script": null,
        "pre_delete_retry_script": null,
        "pre_delete_script": null,
        "pre_recovery_retry_script": null,
        "pre_recovery_script": null,
        "pre_wal_delete_retry_script": null,
        "pre_wal_delete_script": null,
        "primary_checkpoint_timeout": 0,
        "primary_conninfo": null,
        "primary_ssh_command": null,
        "recovery_options": "",
        "recovery_staging_path": null,
        "retention_policy": "redundancy 5 b",
        "retention_policy_mode": "auto",
        "reuse_backup": null,
        "slot_name": null,
        "snapshot_disks": null,
        "snapshot_gcp_project": null,
        "snapshot_instance": null,
        "snapshot_provider": null,
        "snapshot_zone": null,
        "ssh_command": "ssh postgres@srv_postres",
        "streaming_archiver": false,
        "streaming_archiver_batch_size": 0,
        "streaming_archiver_name": "barman_receive_wal",
        "streaming_backup_name": "barman_streaming_backup",
        "streaming_conninfo": "host=srv_postres port=5432 user=barman dbname=postgres password=REDACTED",
        "streaming_wals_directory": "/SVG_FS/srv_postres/streaming",
        "tablespace_bandwidth_limit": null,
        "wal_conninfo": null,
        "wal_retention_policy": "simple-wal 5 b",
        "wal_streaming_conninfo": null,
        "wals_directory": "/SVG_FS/srv_postres/wals"
      },
      "status": {
        "archive_command": "rsync -a %p barman@SRV_barman:/SVG_FS/srv_postres/incoming/%f",
        "archive_mode": "on",
        "archive_timeout": 900,
        "archived_count": 9005,
        "checkpoint_timeout": 300,
        "config_file": "/pg_data/MyDB/postgresql.conf",
        "current_archived_wals_per_second": 0.00214236564998093,
        "current_lsn": "247/1400A4C8",
        "current_size": 39049876757.0,
        "current_xlog": "000000010000024700000014",
        "data_checksums": "off",
        "data_directory": "/pg_data/MyDB",
        "failed_count": 7546,
        "has_backup_privileges": true,
        "has_monitoring_privileges": true,
        "hba_file": "/pg_data/MyDB/pg_hba.conf",
        "hot_standby": "on",
        "ident_file": "/pg_data/MyDB/pg_ident.conf",
        "is_archiving": true,
        "is_in_recovery": false,
        "is_superuser": true,
        "last_archived_time": "2024-08-28T09:37:31.465322+02:00",
        "last_archived_wal": "000000010000024700000013",
        "last_failed_time": "2024-08-28T09:33:44.354622+02:00",
        "last_failed_wal": "0000000100000246000000A1",
        "max_replication_slots": "10",
        "max_wal_senders": "10",
        "postgres_systemid": "6744984042826244766",
        "replication_slot": null,
        "replication_slot_support": true,
        "server_txt_version": "11.3",
        "stats_reset": "2024-07-09T18:10:18.974990+02:00",
        "synchronous_standby_names": [ "" ],
        "version_supported": true,
        "wal_compression": "off",
        "wal_keep_segments": "0",
        "wal_level": "replica",
        "xlog_segment_size": 16777216
      },
      "system_info": {
        "kernel_ver": "Linux srv_postres 3.10.0-1160.119.1.el7.x86_64 #1 SMP Tue May 14 11:55:25 EDT 2024 x86_64 x86_64 x86_64 GNU/Linux",
        "python_ver": "",
        "release": "RedHat Linux Red Hat Enterprise Linux Server release 7.9 (Maipo)",
        "rsync_ver": "rsync version 3.1.2 protocol version 31",
        "ssh_ver": ""
      },
      "wals": {
        "last_archived_wal_per_timeline": {
          "00000001": {
            "compression": "gzip",
            "name": "000000010000024700000013",
            "size": 17711,
            "time": 1724830650.0249295
          }
        }
      }
    }
  }
}
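To pull the relevant part out of a long barman diagnose dump, a short script can walk the JSON and list every backup whose status is FAILED together with its error field. The key layout (servers, then server name, then backups, then backup ID) is taken from the output above; parse_diagnose and failed_backups are hypothetical helpers, not part of Barman, and skipping everything before the first "{" assumes that any WARNING lines captured with the output come before the JSON, as they do here.

```python
import json


def parse_diagnose(text):
    """Parse `barman diagnose` output, skipping any lines printed before the JSON."""
    # Assumption: WARNING lines (if captured) precede the JSON document and
    # contain no "{" character. Raises ValueError if no JSON object is found.
    return json.loads(text[text.index("{"):])


def failed_backups(diagnose):
    """Return (server, backup_id, error) tuples for backups with status FAILED."""
    failed = []
    for server_name, server in diagnose.get("servers", {}).items():
        for backup_id, info in server.get("backups", {}).items():
            if info.get("status") == "FAILED":
                failed.append((server_name, backup_id, info.get("error")))
    return failed
```

Applied to the output above (for example after saving it with barman diagnose > diagnose.txt), this would report backup 20240828T093529 of srv_postres with its "failure writing backup label" error.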

martinmarques commented 2 months ago

Is wals_directory on a WORM (write once, read many) partition? I think the problem happens during the first backup, when Barman removes WALs that are no longer needed, but the file system refuses to unlink the files:

  File "/usr/lib/python3.6/site-packages/barman/backup.py", line 1259, in remove_wal_before_backup
    with tempfile.TemporaryFile(mode="w+", dir=xlogdb_dir) as fxlogdb_new:
  File "/usr/lib64/python3.6/tempfile.py", line 624, in TemporaryFile
    _os.unlink(name)

alainmahe commented 2 months ago

Hello, thank you for your reply. The problem is that the data directory is not created:

DETAILS: [Errno 2] No such file or directory: '/SVG_FS/srv_postres/base/20240828T094848/data/backup_label'

[barman@SRV_barman base]$ ls -l 20240828T094848
total 0
-rwxrwxrwx. 1 root root 1067 Aug 28 09:48 backup.info

martinmarques commented 2 months ago

Did you check for errors at the OS level? The error comes from the OS when calling _os.unlink(name).
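In CPython 3.6 (the interpreter in the traceback), tempfile.TemporaryFile on Linux first tries to open an anonymous file with O_TMPFILE; when that fails or is unavailable, it falls back to creating a named file with mkstemp() and unlinking it immediately while keeping the descriptor open. That fallback is the _os.unlink(name) call at tempfile.py line 624. The sketch below replays the same sequence in a chosen directory so it can be run on the mount outside of Barman. unlink_open_tempfile is a hypothetical helper, and the assumption is that repeating these calls as the barman user would reproduce the same errno.

```python
import os
import tempfile


def unlink_open_tempfile(directory: str) -> bool:
    """Replay the mkstemp-then-unlink fallback of tempfile.TemporaryFile.

    A named file is created with mkstemp() and unlinked right away while its
    descriptor stays open; the descriptor is then used for a write/read
    round trip. Any OSError (for example errno 5, EIO) propagates.
    """
    fd, name = tempfile.mkstemp(dir=directory)
    try:
        # Same operation that failed at tempfile.py line 624. If it raises,
        # the named tmp* file may be left behind in `directory`.
        os.unlink(name)
        f = os.fdopen(fd, "w+")
    except BaseException:
        os.close(fd)
        raise
    with f:  # closing the file object also closes fd
        f.write("probe\n")
        f.flush()
        f.seek(0)
        return f.read() == "probe\n"
```

Called as unlink_open_tempfile("/SVG_FS/srv_postres/wals"), it either returns True or raises an OSError that can be compared with the one in the traceback.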

Which file system is the /SVG_FS/srv_postres/ directory on? Can you share the output of df /SVG_FS/srv_postres/?
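Besides df (df -T also prints the file system type), on Linux the mount holding a path and its type can be read from /proc/mounts, whose lines start with device, mount point and type. The sketch below picks the entry whose mount point is the longest prefix of the path. mount_for_path is a hypothetical helper, and it assumes the standard /proc/mounts layout, where the kernel escapes spaces and similar characters as octal sequences such as \040.

```python
import os
import re


def mount_for_path(path: str, mounts_text: str):
    """Return (device, mount_point, fs_type) of the mount containing `path`.

    `mounts_text` is the content of /proc/mounts. The longest mount point
    that is a prefix of the resolved path wins; None if nothing matches.
    """
    path = os.path.realpath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        device, mount_point, fs_type = fields[:3]
        # Undo the kernel's octal escaping (e.g. "\040" for a space).
        mount_point = re.sub(r"\\([0-7]{3})",
                             lambda m: chr(int(m.group(1), 8)), mount_point)
        inside = (path == mount_point
                  or path.startswith(mount_point.rstrip("/") + "/"))
        # ">=" so that a later entry for the same mount point (an overmount) wins.
        if inside and (best is None or len(mount_point) >= len(best[1])):
            best = (device, mount_point, fs_type)
    return best
```

For example, with open("/proc/mounts") as f: print(mount_for_path("/SVG_FS/srv_postres", f.read())) prints the device, mount point and type the kernel reports for that directory.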

alainmahe commented 2 months ago

Hello:

blobfuse2 4.0G 12K 4.0G 1% /SVG_FS/srv_postres

martinmarques commented 2 months ago

Have you tested manually writing and deleting files from that file system?

The errors you shared point to failures when writing and deleting files in the wals and backup directories.
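A minimal probe for that manual test, in Python since Barman runs on Python 3.6 here: it creates, fsyncs, reads back, renames and deletes a file, then creates and removes a subdirectory (relevant because the data directory was never created), and reports the errno name of the first failing step. probe_directory is a hypothetical helper; the list of operations is an assumption about what is worth checking, not the exact set of calls Barman makes.

```python
import errno
import os
import uuid


def probe_directory(directory: str) -> list:
    """Run basic file operations in `directory` and report each outcome.

    Returns (step, result) pairs, where result is "ok" or "<ERRNO>: <message>".
    Stops at the first failure, so a probe file or directory may remain.
    """
    base = os.path.join(directory, "probe-" + uuid.uuid4().hex)
    renamed = base + ".renamed"
    subdir = base + ".d"

    def write_file():
        with open(base, "w") as f:
            f.write("probe\n")
            f.flush()
            os.fsync(f.fileno())

    def read_file():
        with open(base) as f:
            if f.read() != "probe\n":
                raise OSError(errno.EIO, "read back different content", base)

    steps = [
        ("create+write+fsync", write_file),
        ("read back", read_file),
        ("rename", lambda: os.rename(base, renamed)),
        ("unlink", lambda: os.unlink(renamed)),
        ("mkdir", lambda: os.mkdir(subdir)),
        ("rmdir", lambda: os.rmdir(subdir)),
    ]
    results = []
    for name, action in steps:
        try:
            action()
        except OSError as exc:
            code = errno.errorcode.get(exc.errno, str(exc.errno))
            results.append((name, "%s: %s" % (code, exc.strerror)))
            break
        results.append((name, "ok"))
    return results
```

Running it as the barman user against /SVG_FS/srv_postres/wals and /SVG_FS/srv_postres/base would show, for instance, whether the unlink or mkdir step fails and with which errno.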

I would also recommend upgrading to the latest release, 3.11.1, as we've made a small change to the exception handling for errors that come from OS permission issues. The real cause of the failure may currently be hidden, and the changes we added in 3.11.1 may show where the problem is.

martinmarques commented 14 hours ago

Given the time that has passed, I'm closing this ticket. Feel free to open a new one, or, if you have questions, ask in the Google Group, where there are other Barman experts.

https://groups.google.com/g/pgbarman