Open jantatje opened 3 years ago
I agree.
Also, I would like to add that while it is possible to implement this on the user side (by redefining the `paste` function in `lfrc`, as shown in the man page), you'll lose the benefit of progress reporting in the status bar and the handling of duplicate names (the `.~1~` suffix).
I've made a much better workaround. @jantatje, you might find it useful.
Added some cosmetic changes.
The latest version now lives in the Wiki/Tips section. Apparently gone??
This snippet:

1. Tries to use CoW (reflinks) on btrfs, zfs and xfs
2. Falls back to lf's native paste **(keeps progress %)** if it can't
3. Handles matching names in destination with `.~1~` like lf
4. Forwards cp errors to status line, if any

...Now supports reflinks between subvolumes on same source drive. (Read before using)

```sh
cmd paste_try_cow &{{
    # # This was very helpful for debugging:
    # log_file="/tmp/lf-$id-paste_try_cow-log-$(date +'%Y-%m-%d_%H-%M-%S')"
    # [ -f "$log_file" ] || touch "$log_file"
    # exec 1>> $log_file 2>&1
    # set -x

    # In theory, this may fail,
    # but I tested it on selection with 10k files - everything worked (bash)
    set -- $(cat ~/.local/share/lf/files)

    mode="$1"
    shift

    if [ $mode = 'copy' ]; then
        # Reflink if all items of selection and the destination are on the
        # same source device and it is CoW fs.
        # (to make sure reflink never fails in first place, so we don't have to
        # clean up)

        # Now with reflink support between different subvolumes of the same btrfs

        # Findmnt can only query one target at a time.
        # Thus, complex cases (e.g. one source item on btrfs on /dev/sdb1,
        # another source item on ext4 on /dev/sdb2) are not supported,
        # and will fallback to lf's paste.
        single_src=false
        if [[ "$(df --output=target -- "$@" | sed '1d' | sort -u | wc -l)" == 1 ]]; then
            single_src=true
        fi

        # We want: same device for all srcs and dst, with fs that support reflinks
        if [[ "$(findmnt --noheadings --list --nofsroot --real -o source --target "$PWD")" == \
            "$(findmnt --noheadings --list --nofsroot --real -o source --target "$1")" ]] && \
            $single_src && \
            [[ "$(findmnt --noheadings --list --nofsroot --real -o fstype --target "$PWD")" =~ ^(btrfs|xfs|zfs)$ ]]; then

            echo 'selected copy and cp reflink paste'

            start=$(date '+%s')

            # Handle same names in dst
            # TODO: parallelism, idk - but exit/return/break won't stop the loop from subshell...
            for i in "$@"; do
                name="${i##*/}"
                original="$name"

                count=0
                while [ -w "$PWD/$name" ]; do
                    count=$((count+1))
                    name="$original.~$count~"
                done

                set +e
                cp_out="$(cp -rn --preserve=all --reflink=always -- "$i" "$PWD/$name" 2>&1)"
                set -e

                if [ ! -z "$cp_out" ]; then
                    lf -remote "send $id echoerr $cp_out"
                    exit 0
                fi
            done

            finish=$(( $(date '+%s') - $start ))
            t=''
            if (( $finish > 2 )); then
                t="${finish}s"
            fi

            # Or just skip a file when names are the same.
            # (A LOT faster if you e.g. pasting selection of 10k files)
            # cp -rn --reflink=always -- "$@" .

            # lf -remote "send clear"

            green=$'\u001b[32m'
            reset=$'\u001b[0m'
            lf -remote "send $id echo ${green}reflinked!${reset} $t"
        else
            echo 'selected copy and lf native paste'
            lf -remote "send $id paste"
            # lf -remote "send clear"
        fi

    elif [ $mode = 'move' ]; then
        echo 'selected move and lf native paste'
        lf -remote "send $id paste"
        # lf -remote "send clear"
    fi

    # # for debug
    # set +x

    lf -remote "send load"
}}

# name is different to avoid recursive calls
map p paste_try_cow
```
OMG, io.Copy actually seems to support CoW!
How do I know?
```
~/tmp ❯ time iocopy big.mkv big-iocopy-new.mkv
iocopy big.mkv big-iocopy-new.mkv 0.00s user 0.06s system 81% cpu 0.072 total
~/tmp ❯ time cp --reflink=always big.mkv big-cp-ref.mkv
cp -ir --reflink=always big.mkv big-cp-ref.mkv 0.00s user 0.02s system 96% cpu 0.024 total
~/tmp ❯ time cp --reflink=never big.mkv big-noref.mkv
cp -ir --reflink=never big.mkv big-noref.mkv 0.00s user 4.17s system 67% cpu 6.148 total
```
6 seconds for non-CoW cp vs less than 100ms for `io.Copy` and cp with CoW. WOW!
This is with `go version go1.17.3 linux/amd64` on btrfs (kernel 5.15.3).
Both files actually seem to refer to the same extents:
Also, this is a bit different, but still related, so I'm going to comment on it here.
TIL: `cifs` and samba 4.1.0+ actually support server-side copy and even remote CoW (the latter if the share is on btrfs).
Sadly, lf's copy seems to be essentially chunk-by-chunk read from source and write to destination type of copy, which is the most universal solution if you want progress percentage, but doesn't honor those capabilities.
But guess what? io.Copy does!
This is awesome:
```
/mnt/remotecow ❯ cp --reflink=always big.mkv big-2.mkv # it's instant
/mnt/remotecow ❯ iocopy big.mkv big-iocopy.mkv # also instant
/mnt/remotecow ❯ cp --reflink=never big.mkv big-nocow.mkv # ... it isn't
/mnt/remotecow took 1m37s ❯
```
I've looked at Go's source, here is what I've found:
`io.Copy` actually just calls `io.copyBuffer` with a `nil` buffer, which tries to use `src.(WriterTo)` and `dst.(ReaderFrom)` when possible, falling back to reading into and writing from a buffer when not.
This is probably how CoW happened to work. I haven't found where exactly it is implemented, but digging around Go's source, it seems to support `copy_file_range()`, which in turn, from its man page:

> ...
>
> copy_file_range() gives filesystems an opportunity to implement "copy acceleration" techniques, such as the use of reflinks (i.e., two or more inodes that share pointers to the same copy-on-write disk blocks) or server-side-copy (in the case of NFS).
>
> ...
I don't know how to do this yet, but there might be a way to pass `io.CopyBuffer` some custom buffer that counts the bytes copied into it and reports progress after each `.Write()`, leading to:

- `io.CopyBuffer` with its CoW capabilities (as shown above)
- You don't even have to make it permanent (`set reflink auto`); you can force `io.CopyBuffer` to always use a buffer, like shown here.
Okay, found a problem with that idea. Here is my crude test implementation that failed miserably because of it.
You see, `os.File` always implements `ReadFrom` (the `ReaderFrom` interface). It will try to use `copy_file_range` internally, and will fall back to a normal copy if that fails.
All of that without telling you what it actually did :laughing:
Figured it out the hard way.
So you see - it is too smart for its own good.
@jantatje @MahouShoujoMivutilde This sounds cool but it is also a little too specific. I don't think ext4, which is still likely the most common filesystem on Linux, supports CoW. Any PR for this should also work on other platforms without an issue.
@neeshy Hi! Why did you remove my CoW paste script?
Yes, `cp` has `--reflink=auto`. Of course. In fact, Go's `io.Copy()` (which is used for lf copy) can use CoW and even server-side copy too, and acts pretty much like `cp --reflink=auto` when by itself (not in lf, where the copy is always buffered for progress). If you wanted to e.g. patch in native CoW support, this was never the issue. Keeping the progress while keeping the CoW support, all in one, on the other hand, is what's hard, because there isn't some convenient API to check whether something `isReflinkable(src, dst)`. Anyway.
The point of the script is that it blends reflinks with lf's native paste, thus keeping progress. It doesn't use `auto` because we can't tell when a reflink is possible and `cp` will be instantaneous, or when it will silently fall back to a regular copy, resulting in a potentially minutes-long (or longer) operation with no indication of progress for the user (which I believe would be undesirable for most people).
The other paste replacement with rsync and progress doesn't even support CoW.
So, in short, I think my script would be still useful to have in Tips.
Because, again, if you use a copy-on-write file system you would know about `cp --reflink=auto` already. That you can map it to some key is obvious. How to seamlessly combine lf's paste progress (for when CoW is not available) with cp's reflinks is, I think, less so.
Also, I updated the script with better detection of the same source file system, to support reflinks between subvolumes.
@MahouShoujoMivutilde
Oh, sorry. I don't remember exactly, but I guess I didn't understand its purpose in light of the default behavior of `cp`. You (or anyone else) are free to add it back; it's publicly editable of course.
Looking at it now, it seems like it could be simplified quite a bit. What do you think of this version?
```sh
cmd paste-cow &{{
    set -- $(cat ~/.local/share/lf/files)
    mode="$1"
    shift
    case "$mode" in
        copy)
            first="true"
            for file in "$@"; do
                name="$(basename -- "$file")"
                orig="$name"
                count="0"
                while [ -e "$name" ]; do
                    name="$orig.~$((count += 1))~"
                done
                if ! out="$(cp -an --reflink=always -- "$file" "./$name" 2>&1)"; then
                    if [ -n "$first" ]; then
                        # It's only safe to fallback to lf's own paste on the first iteration
                        lf -remote "send $id paste"
                    elif [ -n "$out" ]; then
                        lf -remote "send $id echoerr \"$(printf '%s' "$out" | sed 's/\\/\\\\/g;s/"/\\"/g')\""
                    else
                        lf -remote "send $id echoerr cp: failed"
                    fi
                    exit
                fi
                first=""
            done
            lf -remote "send $id echo \"\\033[0;32mCopied successfully (reflinked)\\033[0m\""
            ;;
        move) lf -remote "send $id paste";;
    esac
}}
```
This version has the advantage of using CoW whenever `cp` is capable of it, instead of hardcoding cases.
EDIT: I went ahead and added it.
It would be nice to have an option to enable CoW for copy operations; as of version 9.0, cp from GNU coreutils does this by default. Reflinking both saves space and makes copying on the same filesystem faster. There should be an option to turn this on or off; if enabled, lf should first try to reflink the file and, if that fails, fall back to copying normally. In my opinion it makes most sense to default this setting to true.