digint / btrbk

Tool for creating snapshots and remote backups of btrfs subvolumes
https://digint.ch/btrbk/
GNU General Public License v3.0

Cannot autoresolve massively accumulated missed deletions #476

Closed zatricky closed 2 years ago

zatricky commented 2 years ago

When old snapshots aren't deleted (perhaps because a target server disappears and comes back at a later date), they can accumulate. Once there are enough of them, the subvolume deletion function fails to execute because the argument list is too long.

ERROR: Command execution failed (Argument list too long): `btrfs subvolume delete 'very' 'very' 'very' 'long' 'list' 'of' 'subvolumes'`

This should be a rare problem and isn't that hard to fix manually.

In theory a possible fix would be to set a limit to the length of the argument list and, if it is "too long", to split the arguments into multiple calls to btrfs subvolume delete.
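The splitting idea can be sketched roughly as follows. This is only an illustration in Python, not btrbk code (btrbk itself is written in Perl). The function name `split_by_arg_length` and the `max_bytes` budget are hypothetical, and the size accounting is simplified: it counts each argument plus its NUL terminator but ignores the environment and per-argument pointer overhead, so the budget should be chosen conservatively.

```python
from typing import Iterable, Iterator, List


def split_by_arg_length(
    base_cmd: List[str], args: Iterable[str], max_bytes: int
) -> Iterator[List[str]]:
    """Yield complete command lines whose argument bytes stay within max_bytes.

    Simplified accounting (an assumption of this sketch): each argument costs
    its encoded length plus one byte for the terminating NUL.
    """

    def size(arg: str) -> int:
        return len(arg.encode()) + 1

    base_size = sum(size(a) for a in base_cmd)
    batch: List[str] = []
    batch_size = base_size

    for arg in args:
        arg_size = size(arg)
        # Flush the current batch if adding this argument would exceed the budget.
        # A single oversized argument is still emitted alone rather than dropped.
        if batch and batch_size + arg_size > max_bytes:
            yield base_cmd + batch
            batch, batch_size = [], base_size
        batch.append(arg)
        batch_size += arg_size

    if batch:
        yield base_cmd + batch


if __name__ == "__main__":
    paths = [f"/mnt/pool/snap.{i}" for i in range(10)]  # made-up paths
    for cmd in split_by_arg_length(["btrfs", "subvolume", "delete"], paths, 120):
        print(cmd)
```

Each yielded list is one full `btrfs subvolume delete ...` invocation, so the caller can run them one after another.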

digint commented 2 years ago

Uhrg, true, will need to split this. On my system the argument limit is around 2MB, not sure where this is defined and what the minimum is, will check. Probably around 100 subvolumes in a batch would be reasonable.

# xargs --show-limits
Your environment variables take up 2706 bytes
POSIX upper limit on argument length (this system): 2092398
POSIX smallest allowable upper limit on argument length (all systems): 4096
Maximum length of command we could actually use: 2089692
Size of command buffer we are actually using: 131072
Maximum parallelism (--max-procs must be no greater): 2147483647

zatricky commented 2 years ago

I did a bit of digging: The kernel constant is ARG_MAX, retrievable using getconf ARG_MAX from the command line. Apparently Perl has a function called sysconf() (in the POSIX module) that you can use to get the value.

Apparently the default for x64 Linux is 2MB, while the 32-bit default is 32KB.
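For illustration, the same limit can be read programmatically. The sketch below uses Python's `os.sysconf` rather than Perl, purely as an example. The helper names `arg_max` and `environment_bytes` are made up here, and the environment size is only an approximation of what the kernel actually counts.

```python
import os

POSIX_MIN_ARG_MAX = 4096  # smallest limit POSIX allows, as reported by xargs above


def arg_max() -> int:
    """Return the system's ARG_MAX, falling back to the POSIX minimum."""
    try:
        value = os.sysconf("SC_ARG_MAX")
    except (AttributeError, ValueError, OSError):
        # sysconf or this name is unavailable on the platform.
        return POSIX_MIN_ARG_MAX
    # sysconf may return -1 when the limit is indeterminate.
    return value if value > 0 else POSIX_MIN_ARG_MAX


def environment_bytes() -> int:
    """Approximate the space used by the environment ("KEY=value" plus NUL)."""
    return sum(
        len(os.fsencode(k)) + len(os.fsencode(v)) + 2
        for k, v in os.environ.items()
    )


if __name__ == "__main__":
    print("ARG_MAX:", arg_max())
    print("environment bytes (approx.):", environment_bytes())
```

The environment shares the same budget as the arguments, which is why xargs reports a "maximum length we could actually use" that is smaller than the raw limit.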

digint commented 2 years ago

fixed in 6b465bf06bbafc257278fc05f319086691c5826d

btrbk now runs btrfs subvolume delete for each subvolume to be deleted. See the explanation in the commit message:

Deleting multiple subvolumes at once always caused the problem that we need to parse stderr of "rm" and "btrfs subvolume delete" in order to know which subvolume actually failed, which is problematic (version dependent, language dependent). Also, we would need to restrict the number of subvolumes based on the maximum allowed length for shell commands, which is system-dependent (check getconf ARG_MAX).

Deleting subvolumes sequentially has a slightly negative impact on execution time (multiple rsh commands), with the benefit of being more robust.

leaving issue open until fully released
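A rough sketch of the sequential approach described in the commit message, written in Python for illustration only. It is not the actual btrbk implementation, which is in Perl and may differ in detail. The names `run_command` and `delete_sequentially` are hypothetical. The point is that one command per subvolume lets the exit status alone identify which deletion failed, with no stderr parsing.

```python
import subprocess
from typing import Callable, Iterable, List, Sequence, Tuple


def run_command(cmd: Sequence[str]) -> int:
    """Run cmd and return its exit status (127 if it cannot be started)."""
    try:
        return subprocess.run(cmd, check=False).returncode
    except OSError:
        return 127


def delete_sequentially(
    subvolumes: Iterable[str],
    run: Callable[[Sequence[str]], int] = run_command,
) -> Tuple[List[str], List[str]]:
    """Delete subvolumes one command at a time.

    Returns (deleted, failed). Because every call handles exactly one path,
    the exit status is enough to attribute a failure to a subvolume.
    """
    deleted: List[str] = []
    failed: List[str] = []
    for path in subvolumes:
        status = run(["btrfs", "subvolume", "delete", path])
        (deleted if status == 0 else failed).append(path)
    return deleted, failed
```

The `run` parameter is injectable so that the logic can be exercised without touching a real filesystem, for example with a stub that returns a non-zero status for selected paths. The command length also stays small regardless of how many subvolumes have accumulated.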

digint commented 2 years ago

fix included in btrbk-0.32.3, closing issue