allanjude / zxfer

A continuation of development on zxfer, a popular script for managing ZFS snapshot replication
BSD 2-Clause "Simplified" License

optimizing inspect_delete_snap() for significant speedups under certain workloads with many snapshots #64

Open totalAldo opened 5 months ago


Hi Allan,

In an effort to reduce zxfer's total execution time, I found that one of the main bottlenecks is the logic in inspect_delete_snap(). The v1.1.7 implementation uses nested loops to determine which destination snapshots to delete, and this logic runs even when -d is not used. On some of my systems with hundreds of snapshots per dataset, it takes several minutes to execute. For example, if the source and destination each contain 1,000 snapshots, the loop runs 1,000 × 1,000 = 1,000,000 iterations, each spawning at least 2 grep and 2 cut processes to compare the snapshot names (at least 4,000,000 process spawns in total).
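To make that cost concrete, here is a hypothetical sketch of the nested-loop pattern. It is not the actual v1.1.7 code: `naive_snaps_to_delete`, `src_snaps` and `dest_snaps` are made-up names, only `cut` is used for the comparison, and snapshot names are assumed to contain no whitespace.

```shell
#!/bin/sh
# Hypothetical illustration of the O(source x destination) nested-loop
# pattern; NOT the actual zxfer v1.1.7 code. Names are assumptions and
# snapshot names are assumed to contain no whitespace.

# Newline-separated snapshot lists for a single dataset (example data).
src_snaps="pool/data@a
pool/data@b"
dest_snaps="backup/data@a
backup/data@b
backup/data@c"

# Print each destination snapshot whose name (the part after '@') is not
# found in the source list.
naive_snaps_to_delete() {
    for dest_snap in $dest_snaps; do
        found=0
        for src_snap in $src_snaps; do
            # Two command substitutions, each forking a subshell and a cut
            # process, on every inner iteration.
            if [ "$(echo "$dest_snap" | cut -d@ -f2)" = \
                 "$(echo "$src_snap" | cut -d@ -f2)" ]; then
                found=1
                break
            fi
        done
        if [ "$found" -eq 0 ]; then
            echo "$dest_snap"
        fi
    done
}
```

With 1,000 snapshots on each side, the inner comparison can run up to 1,000,000 times, so the process-spawn overhead, not the string comparison itself, dominates the runtime.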

I've refactored the function to replace the nested loops with temp files and comm, which determines the destination snapshots that don't exist in the source. This implementation has reduced the execution time from minutes to seconds.
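A minimal sketch of the temp-file + comm(1) idea, assuming one dataset's snapshots are passed as newline-separated lists. The names (`comm_snaps_to_delete`, `src_snaps`, `dest_snaps`) are hypothetical and the linked fork may be structured differently. Both lists are reduced to the snapshot names after `@`, sorted under the C locale, and `comm -23` keeps only the names that appear in the destination but not in the source.

```shell
#!/bin/sh
# Sketch of a temp-file + comm(1) set difference. Function and variable
# names are hypothetical; this is not the code in the linked fork.

# Newline-separated snapshot lists for a single dataset (example data).
src_snaps="pool/data@a
pool/data@b"
dest_snaps="backup/data@a
backup/data@b
backup/data@c"

# Usage: comm_snaps_to_delete DEST_DATASET
# Prints DEST_DATASET@name for every snapshot name present in
# $dest_snaps but absent from $src_snaps.
comm_snaps_to_delete() {
    dest_dataset=$1
    tmp_dir=$(mktemp -d) || return 1

    # Strip "dataset@" so only snapshot names are compared; sed -n ... p
    # also drops empty lines. comm(1) needs both files sorted in the
    # same collation, so force the C locale for sort and comm alike.
    printf '%s\n' "$src_snaps" | sed -n 's/^[^@]*@//p' | LC_ALL=C sort -u > "$tmp_dir/src"
    printf '%s\n' "$dest_snaps" | sed -n 's/^[^@]*@//p' | LC_ALL=C sort -u > "$tmp_dir/dest"

    # -2 hides names only in the source, -3 hides names in both, so only
    # destination-only names remain.
    LC_ALL=C comm -23 "$tmp_dir/dest" "$tmp_dir/src" |
        while IFS= read -r name; do
            printf '%s@%s\n' "$dest_dataset" "$name"
        done

    rm -rf "$tmp_dir"
}
```

The number of processes spawned no longer grows with the snapshot count; the dominant cost becomes the two sorts, roughly O(n log n), instead of O(n × m) pairwise comparisons.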

Here's a possible implementation.

This has only been tested on FreeBSD 14.0.

https://github.com/totalAldo/zxfer/blob/9a7b4e1da5305952863ddbe518403b8c8c18521b/zxfer#L1655-L1789