EnterpriseDB / barman

Barman - Backup and Recovery Manager for PostgreSQL
https://www.pgbarman.org/
GNU General Public License v3.0

Include wal_name in list_bucket prefix to improve performance of download_wal #876

Closed sjuls closed 10 months ago

sjuls commented 11 months ago

The current implementation of download_wal lists every file in the WAL directory and filters client-side for the WAL being requested. However, since we're querying blob storage, which filters by prefix, we can include the wal_name in the prefix so that this filtering happens on the cloud provider's side, while still retrieving both compressed and uncompressed files.

The list_bucket API in the cloud_interface currently takes a prefix for filtering and doesn't require that prefix to point to a folder, and I believe all cloud providers supported by barman support filtering by an arbitrary prefix as well.

This change should provide a noticeable performance improvement for WAL directories containing a large number of archived WAL segments.
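To illustrate the idea, here is a minimal sketch comparing the two lookups. It is not barman's actual code: `FakeCloudInterface`, `matches`, `find_wal_client_side`, `find_wal_with_prefix` and the example key layout are all hypothetical stand-ins, and the in-memory `list_bucket` only mimics the prefix filtering a blob store would do server-side.

```python
import os


class FakeCloudInterface:
    """Hypothetical in-memory stand-in for a bucket (not barman's cloud_interface)."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        self.items_returned = 0  # how many keys the "provider" sent back

    def list_bucket(self, prefix):
        # Blob stores match on any key prefix, not only on "folder" boundaries.
        for key in self.keys:
            if key.startswith(prefix):
                self.items_returned += 1
                yield key


def matches(key, wal_name):
    # Accept the bare WAL name or the name plus an extension (e.g. ".gz").
    base = os.path.basename(key)
    return base == wal_name or base.startswith(wal_name + ".")


def find_wal_client_side(cloud, wal_dir, wal_name):
    # Lists the whole directory, then filters locally.
    return [k for k in cloud.list_bucket(wal_dir + "/") if matches(k, wal_name)]


def find_wal_with_prefix(cloud, wal_dir, wal_name):
    # Includes wal_name in the prefix so the provider does the filtering.
    prefix = wal_dir + "/" + wal_name
    return [k for k in cloud.list_bucket(prefix) if matches(k, wal_name)]
```

In this sketch both functions return the same keys; the difference is only in how many keys the listing has to return before the match is found, which is where the saving on large WAL directories would come from.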

sjuls commented 11 months ago

Hi again @mikewallace1979,

So, as you might have guessed, we're struggling a bit with a huge number of WAL segments in blob storage 😅. We're looking for opportunities to improve WAL retrieval and reduce recovery times.

Let me know what you think of this proposal 🙏

mikewallace1979 commented 11 months ago

This looks like a good change - I've tested it with the three supported cloud providers, both with and without compression, and it is able to find history files and WAL files as expected.

The unit test test_fails_if_wal_not_found will need updating because it currently mocks a list_bucket response containing a path other than the one requested. That worked with the old code, but with this patch I think the test should just mock an empty list as the list_bucket response.
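A rough sketch of what such a test might look like, using only `unittest.mock`. The `download_wal` function and the `WalNotFound` exception here are simplified, hypothetical stand-ins rather than barman's real code; the point is only that the mocked list_bucket returns nothing when the prefix already contains the WAL name.

```python
from unittest import mock


class WalNotFound(Exception):
    """Hypothetical error type used only in this sketch."""


def download_wal(cloud_interface, wal_dir, wal_name):
    # Simplified stand-in for the patched lookup: the prefix includes wal_name,
    # so a missing WAL shows up as an empty listing.
    for key in cloud_interface.list_bucket(wal_dir + "/" + wal_name):
        return key
    raise WalNotFound(wal_name)


def test_fails_if_wal_not_found():
    cloud_interface = mock.MagicMock()
    # With prefix filtering, a non-matching path would never be returned,
    # so the mock yields an empty listing.
    cloud_interface.list_bucket.return_value = iter([])

    try:
        download_wal(
            cloud_interface, "wals/0000000100000000", "000000010000000000000001"
        )
    except WalNotFound:
        pass
    else:
        raise AssertionError("expected WalNotFound")

    cloud_interface.list_bucket.assert_called_once_with(
        "wals/0000000100000000/000000010000000000000001"
    )
```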

sjuls commented 11 months ago

Great 👍 I've pushed a correction to the test. Thanks!