Open mzealey opened 5 years ago
I see two ways of implementing this:
Add a command to the pipe (between btrfs send and btrfs receive), measuring the "transferred size of btrfs-send". This might not really reflect the size used on the target, but at least gives an order of magnitude.
In order not to add too much to the pipe, I tried using mbuffer -v 2 (already in the pipe when using stream_buffer), which prints a summary. Sadly this does not work, as mbuffer prints the status to the controlling terminal instead of file descriptor 2, making it impossible to catch from btrbk.
Another approach would be to add dd (or any other command capable of printing a summary) to the pipe: this would introduce some more context switches and slow things down, but should work.
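The dd idea can be sketched roughly as follows. This is only an illustration, not how btrbk does it: the helper name count_stream and the paths in the usage comment are made up, and it assumes a dd whose summary line on stderr starts with the byte count (GNU, BusyBox and BSD dd all do this as far as I know).

```shell
#!/bin/sh
# count_stream FILE (hypothetical helper, not part of btrbk)
# Copies stdin to stdout unchanged and writes the number of bytes
# that passed through to FILE.
count_stream() {
    raw=$(mktemp) || return 1
    # dd writes the data to stdout and its transfer summary to stderr;
    # LC_ALL=C keeps the summary in a predictable, untranslated format.
    LC_ALL=C dd bs=64k 2>"$raw"
    # The summary line containing "bytes" begins with the byte count.
    awk '/bytes/ { n = $1 } END { print n + 0 }' "$raw" >"$1"
    rm -f "$raw"
}

# Assumed usage (paths are placeholders):
#   btrfs send -p /snap/parent /snap/new \
#     | count_stream /tmp/sent.bytes \
#     | btrfs receive /backup/target
#   cat /tmp/sent.bytes
```

The count is the size of the send stream, so it is an estimate of the transfer volume rather than of the space used on the target.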
A better approach would be to directly scan the target "received" subvolume. I've come up with a little script for this:
received-length.sh:
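#!/bin/sh
SUBVOL=/path/to/subvolume
# Generation at which the subvolume was created
CGEN=$(btrfs subvolume show "$SUBVOL" | sed -n 's/\s*Gen at creation:\s*//p')
# List extents changed after creation and sum their "len" values (field 7).
# Only lines starting with "inode" are counted, which skips the trailing
# "transid marker" line.
btrfs subvolume find-new "$SUBVOL" $((CGEN+1)) \
| awk '$1 == "inode" { sum += $7 } END { print sum + 0 }'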
This simply sums up the "len" field from all modified files since the creation of the subvolume. Works fine, as btrfs receive first makes a snapshot of the parent subvolume, then adds the files according to the send-stream.
Issues:
I'm planning to implement this with a new btrbk command, something like btrbk list backup-size.
This needs some more investigation; maybe there's a nicer way to get the "real size used on disk".
I would think option 1 would be a reasonable estimate and not need much in the way of overhead. I seem to recall there is another way if qgroups are enabled, but in my case that would not help.
edit: remove quoted text
I would think option 1 would be a reasonable estimate and not need much in the way of overhead.
Yes, this is also valuable information, especially when you also want an estimate of the ssh traffic generated by btrbk.
Having a command for listing (option 2) has the advantage that it is reproducible, and also works for manually generated backups.
The btrfs subvolume find-new approach above is not very accurate and gives only a rough estimate of what is really added on disk (it ignores deleted files, shared extents, e.g. from clone sources, etc.).
For more accurate results, we need to do more extensive analysis on the block level, which unfortunately is very time consuming. I did some promising tests with extents-list, and implemented a very experimental btrbk extents-diff command on the extents-diff branch for testing.
I was about to file a new issue begging for a new diffstat or diff -stat, but it sounds like the desire is similar to the one discussed here - to see the summary of differences (not only the total size of new/modified files, as diff reports, I guess) between two snapshots. Even if reported sizes (deleted, added or modified) do not account for possible operations on CoW'ed files -- that already would be useful information.
It would be nice (perhaps when running -v run) to include details about how many bytes were sent, which is presumably roughly equivalent to how much space the snapshot will take up based on the previous one? Perhaps this could also be saved somewhere and output in the stats so you can see roughly what the deltas are between snaps?