EnterpriseDB / barman

Barman - Backup and Recovery Manager for PostgreSQL
https://www.pgbarman.org/
GNU General Public License v3.0
2.07k stars 191 forks source link

barman-cloud-backup to Azure Blob Storage hangs, seemingly at the end #923

Closed aujasm1 closed 5 months ago

aujasm1 commented 5 months ago

Hi Folks, I'm running:

Ubuntu 20.04 + barman-cli-cloud 3.10 + pg 13 and attempting to backup a relatively small <200mb DB to Azure Blob storage. I've successfully run 2 backups already with the same command but I can't seem to get any more to run successfully. Full disclosure......I used pgbench to inject data (to generate WALs) between backup 2 and 3 - so the DB grew a little, but not much really. Here's the command I've been running:

barman-cloud-backup -vv --immediate-checkpoint --cloud-provider=azure-blob-storage https://debackupsdevauseast.blob.core.windows.net/devops1 devops1

The debug output reveals: ManagedIdentityCredential.get_token succeeded Starting backup '20240415T115850'- Uploading 'devops1/base/20240415T115850/data.tar', part '1' (worker 0) It then appears to successfully process requests to upload files and folders via parts 2 through 80 where it then fails consistently at part 81 (which I'm starting to think is the end of the backup?).

Finally, it hangs.....with the tail of debug output looking like:

2024-04-15 11:58:58,487 [107313] DEBUG: Uploading base/1/2618_vm
2024-04-15 11:58:58,487 [107313] DEBUG: Uploading base/1/13264_fsm
2024-04-15 11:58:58,488 [107313] DEBUG: Uploading base/1/3575
2024-04-15 11:58:58,489 [107313] DEBUG: Uploading base/1/1417
2024-04-15 11:58:58,490 [107313] DEBUG: Uploading base/1/4149
2024-04-15 11:58:58,491 [107313] DEBUG: Uploading base/1/2680
2024-04-15 11:58:58,491 [107313] DEBUG: Uploading base/1/3600
2024-04-15 11:58:58,504 [107313] DEBUG: Uploading pg_logical/replorigin_checkpoint
2024-04-15 11:58:58,508 [107313] INFO: Uploading 'pg_control' file from '/var/lib/postgresql/13/main/global/pg_control' to 'data.tar' with path 'global/pg_control'
2024-04-15 11:58:58,508 [107313] INFO: Uploading 'config_file' file from '/etc/postgresql/13/main/postgresql.conf' to 'data.tar' with path 'postgresql.conf'
2024-04-15 11:58:58,509 [107313] INFO: Uploading 'hba_file' file from '/etc/postgresql/13/main/pg_hba.conf' to 'data.tar' with path 'pg_hba.conf'
2024-04-15 11:58:58,509 [107313] INFO: Uploading 'ident_file' file from '/etc/postgresql/13/main/pg_ident.conf' to 'data.tar' with path 'pg_ident.conf'
2024-04-15 11:58:58,509 [107313] WARNING: The usage of include directives is not supported for files that reside outside PGDATA.
Please manually backup the following files:
    /etc/postgresql/13/main/postgresql-epiphany.conf
    /etc/postgresql/13/main/postgresql-lumada.conf

2024-04-15 11:58:58,510 [107313] INFO: Stopping backup '20240415T115850'
2024-04-15 11:58:58,510 [107313] DEBUG: Stop of native concurrent backup
2024-04-15 11:58:58,552 [107337] DEBUG: https://debackupsdevauseast.blob.core.windows.net:443 "PUT /devops1/devops1/base/20240415T115850/data.tar?comp=block&blockid=MDAwODE%3D HTTP/1.1" 201 0
2024-04-15 11:58:58,553 [107337] INFO: Response status: 201
Response headers:
    'Content-Length': '0'
    'Server': 'Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0'
    'x-ms-request-id': 'a0a69544-901e-0038-332c-8fde74000000'
    'x-ms-client-request-id': '8a876e1b-fb1f-11ee-9755-9fa9bb79533d'
    'x-ms-version': 'REDACTED'
    'x-ms-content-crc64': 'REDACTED'
    'x-ms-request-server-encrypted': 'REDACTED'
    'Date': 'Mon, 15 Apr 2024 11:58:58 GMT'

There are no artifacts uploaded to Azure and the command never finishes.  Ctl-C does not break out of it.  I end up having to kill the 2-4 x barman-cloud-backup processes still running.  I have used "ps -ef" to look for running "barman processes" between runs.  I have stopped and restarted Postregs between runs thinking that perhaps something was already running that might be preventing the processes from completed but had no luck in resolving.

/var/log/syslog reveals nothing of interest.  /var/log/postgresql/13/main/postgres.log reveals nothing of interest.

Any suggestions on what might be causing it to not finish properly?
Anything extra I need to collect?  
Has anyone seen this before?
Cheers and thanks in advance. 
Jason
gcalacoci commented 5 months ago

Hi,

Seems that barman-cloud-backup is waiting for pg_backup_stop to complete on the pg server... If the call to the function pg_backup_stop isn't complete, the backup process is stuck and is unable to complete the multipart upload, and this is why you are not seeing artefacts on Azure.

It is difficult on our side to diagnose why pg_backup_stop is stuck or is so slow to complete. You said you already performed some backups before getting to this situation, did you change anything in the server configuration?

aujasm1 commented 5 months ago

Actually I think I may have changed the archive_command between backups such that it used barman-cloud-wal-backup instead of rsync. If there was a problem with wal shipping would that stop a full backup from being taken? Is there any way of checking if pg_backup_stop is running/needs to run still and what it might be waiting for?

aujasm1 commented 5 months ago

I fixed the problem with my archive_command - the barman-cloud-wal-backup URL was missing the container. After fixing this the WALs all shipped to Azure and PG caught up. Best of all, it seems to have now enabled the barman-cloud-backup to complete also. Thanks heaps for your quick reply.

gcalacoci commented 5 months ago

I was about to suggest checking if archive_command was failing, as is usually a common cause for pg_backup_stop failure to complete... but i see that you got it yourself 😄

Happy we got this sorted! Closing this