EnterpriseDB / barman

Barman - Backup and Recovery Manager for PostgreSQL
https://www.pgbarman.org/
GNU General Public License v3.0

When performing a `barman check` command for a single server, barman stats every configured server's backup_directory #308

Open kevinbarbour opened 4 years ago

kevinbarbour commented 4 years ago

I am backing up about 50 PostgreSQL databases via barman, all on the same host. Many of these backups are configured with a backup_directory on an NFS share that can sometimes be quite slow to respond. A few of our more latency-sensitive backups (the databases with much higher change rates/WAL generation) have backup_directory configured on local NVMe and do not touch the NFS shares at all. I've noticed an unfortunate behavior with the barman CLI: when you run a command against a single server, it seems to unnecessarily read every backup location configured on the host. In our configuration this means that running `barman check` on a server configured on local NVMe often times out, because barman tries to stat the NFS-hosted directories of the 40+ other configured servers.

Can be reproduced with configuration similar to the following:

Configuration excerpts: /etc/barman.d/nvme_backup.conf

[nvme_db]
archiver = off
backup_directory = /srv/nvme/database/backup
backup_method = postgres
backup_options = concurrent_backup
<...>

/etc/barman.d/nfs_backup.conf

[nfs_db]
archiver = off
backup_directory = /mnt/nfs/database/backup
backup_method = postgres
backup_options = concurrent_backup
<...>

With the above configuration, if I run `strace barman check nvme_db`, I see the following in the strace output:

lstat("/mnt", {st_mode=S_IFDIR|0755, st_size=45, ...}) = 0
lstat("/mnt/nfs", {st_mode=S_IFDIR|0755, st_size=20, ...}) = 0
lstat("/mnt/nfs/database/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("/mnt/nfs/database/backup", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0

I am not sure if this is intentional behavior, but it seems very odd that barman is performing some sort of checks on the backup directory configured for nfs_db when I am only checking nvme_db.

amenonsen commented 3 years ago

I agree that this is unfortunate behaviour. We shall investigate.

mikewallace1979 commented 3 years ago

Thanks for the thorough bug report @kevinbarbour!

I traced this down to the call to `config.Config._populate_servers`, which is called by `config.Config.server_names`, which is called by `cli.get_server_list`, itself called by `cli.check`.

The specific line which causes stat to be called on all backup directories is the call to `config.Config._check_conflicting_paths`.

The job of `_check_conflicting_paths` is to check that no two directories in the configuration point to the same place on disk, across all servers in the barman configuration. Any directories in the configuration which resolve to the same place on disk are considered unsafe and would likely render backups taken for both servers unusable, so this is an important check.

Taking a closer look at the implementation, the reason this has to touch the filesystem is the call to `os.path.realpath`, which de-references symlinks to find their actual location on the filesystem. This is an important part of the conflicting paths check: if we didn't do this, the check could pass even though different servers have directories configured with different symlinks which resolve to the same location on disk.
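
To illustrate the symlink case, here is a minimal sketch. It is not Barman's actual implementation: `find_conflicting_paths`, `demo` and the server names are hypothetical, and it assumes a POSIX system where symlinks can be created. Two servers are configured with different symlinks that point at the same directory. Comparing the configured strings finds no conflict, while comparing `os.path.realpath` results does.

```python
import os
import tempfile
from collections import defaultdict


def find_conflicting_paths(server_dirs):
    """Return {resolved_path: [server names]} for paths shared by 2+ servers.

    server_dirs maps a (hypothetical) server name to a configured directory.
    """
    by_real_path = defaultdict(list)
    for server, path in server_dirs.items():
        # realpath follows symlinks, so it must inspect each path component
        # on disk; this is where the lstat calls in the strace come from.
        by_real_path[os.path.realpath(path)].append(server)
    return {p: sorted(s) for p, s in by_real_path.items() if len(s) > 1}


def demo():
    """Build two symlinks to one directory and compare both kinds of check."""
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "backup")
        os.mkdir(target)
        link_a = os.path.join(root, "link_a")
        link_b = os.path.join(root, "link_b")
        os.symlink(target, link_a)
        os.symlink(target, link_b)

        dirs = {"server_a": link_a, "server_b": link_b}
        # Naive check: compare the configured strings only (no filesystem access).
        naive_conflict = len(set(dirs.values())) < len(dirs)
        # Resolved check: compare where the paths actually point.
        return naive_conflict, find_conflicting_paths(dirs)


if __name__ == "__main__":
    naive_conflict, conflicts = demo()
    print("string comparison sees a conflict:", naive_conflict)
    print("realpath comparison sees conflicts:", conflicts)
```

The trade-off is visible here: the string comparison never touches the filesystem but misses the conflict, while the resolved comparison catches it at the cost of filesystem calls for every configured server.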

So the reason seems sensible enough; however, it is clearly resulting in less-than-optimal performance in your environment. I don't yet have any good ideas for improving things but will give it some thought and discuss with the team. Any ideas you have here are also welcome.