gokcehan / lf

Terminal file manager
MIT License

Reflink/Copy-on-Write support #711

Open jantatje opened 3 years ago

jantatje commented 3 years ago

It would be nice to have an option to enable CoW for copy operations; as of version 9.0, cp from GNU coreutils does this by default. Reflinking both saves space and makes copying on the same filesystem faster. There should be an option to turn this on or off. If enabled, lf should first try to reflink the file and, if that fails, fall back to copying normally. In my opinion it makes the most sense to default this setting to true.
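The requested behaviour (try a reflink first, copy normally if that fails) could be sketched in Go roughly like this. It is only an illustration, not lf's code: `copyFile`, `demo` and the `ficlone` constant are names made up for this example, the ioctl number is assumed to be the Linux value used on x86 and ARM, and the sketch is Linux-only.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
	"syscall"
)

// ficlone is the Linux FICLONE ioctl request, _IOW(0x94, 9, int).
// Assumption: this value holds on x86 and ARM; other architectures may differ.
const ficlone = 0x40049409

// copyFile (hypothetical helper) tries to reflink src to dst and falls back
// to a plain copy. It reports whether the reflink succeeded.
func copyFile(src, dst string) (reflinked bool, err error) {
	in, err := os.Open(src)
	if err != nil {
		return false, err
	}
	defer in.Close()

	out, err := os.Create(dst)
	if err != nil {
		return false, err
	}
	defer func() {
		if cerr := out.Close(); err == nil {
			err = cerr
		}
	}()

	// ioctl(dst_fd, FICLONE, src_fd) makes dst share the extents of src.
	_, _, errno := syscall.Syscall(syscall.SYS_IOCTL, out.Fd(), ficlone, in.Fd())
	if errno == 0 {
		return true, nil
	}

	// Reflink not possible (e.g. ext4, tmpfs, different devices): copy the data.
	// Note that io.Copy may itself use copy_file_range on Linux.
	_, err = io.Copy(out, in)
	return false, err
}

// demo copies a small temporary file and returns the content of the copy.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "reflink-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	src := filepath.Join(dir, "src.txt")
	dst := filepath.Join(dir, "dst.txt")
	if err := os.WriteFile(src, []byte("hello reflink"), 0o644); err != nil {
		return "", err
	}
	if _, err := copyFile(src, dst); err != nil {
		return "", err
	}
	data, err := os.ReadFile(dst)
	return string(data), err
}

func main() {
	content, err := demo()
	fmt.Println(content, err)
}
```

Whether the reflink branch is taken depends entirely on the filesystem; on anything without reflink support the function quietly ends up in the fallback branch.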

MahouShoujoMivutilde commented 2 years ago

I agree.

Also, I would like to add that while it is possible to implement this on the user side (by redefining the paste function in lfrc, as shown in the man page), you'll lose the benefit of progress reporting in the status bar and the handling of duplicate names (.~1~ suffix).
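For reference, the duplicate-name handling mentioned above (appending `.~1~`, `.~2~`, ... until a free name is found) can be sketched as follows. This is a guess at the general idea rather than lf's actual implementation; `uniqueName`, `existsOnDisk` and the `exists` callback are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
)

// uniqueName returns name if it is free, otherwise name.~1~, name.~2~, ...
// exists is a hypothetical callback that reports whether a candidate is taken.
func uniqueName(name string, exists func(string) bool) string {
	candidate := name
	for i := 1; exists(candidate); i++ {
		candidate = fmt.Sprintf("%s.~%d~", name, i)
	}
	return candidate
}

// existsOnDisk checks the current directory, similar to `[ -e "$name" ]`.
func existsOnDisk(name string) bool {
	_, err := os.Lstat(name)
	return err == nil
}

func main() {
	// Prints "notes.txt" unless that name already exists in the current directory.
	fmt.Println(uniqueName("notes.txt", existsOnDisk))
}
```

Passing the existence check as a callback keeps the naming logic testable without touching the filesystem.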

Old buggy code

My very crude implementation:

```sh
cmd paste &{{
    set -- $(cat ~/.local/share/lf/files)
    mode="$1"
    shift
    case "$mode" in
        copy)
            # FIXME this does not work for directories
            if cp -rn --reflink=always -- "$@" .; then
                lf -remote "send $id echo reflinked!"
            else
                # cp leaves an empty file behind when the reflink fails
                if [ "$PWD/$(basename "$@")" != "$@" ]; then
                    # OCD lmao
                    rm -rf "$PWD/$(basename "$@")"
                fi
                lf -remote "send $id paste"
            fi
            ;;
        move)
            mv -n -- "$@" .
            # FIXME
            # unlike with copy, normal paste doesn't work here for some reason
            # i must be missing something
            # lf -remote "send $id paste"
            ;;
    esac
    rm ~/.local/share/lf/files
    lf -remote "send clear"
}}
```

Maybe someone can come up with something more robust and better, but overall I think this should be a native feature. CoW filesystems have been around for a long time, and it would be nice to take advantage of their features in more software.

UPD Nov 12 2021

I've made a much better workaround. @jantatje, you might find it useful.

Old, but better code

The previous implementation above had a number of problems, but this one:

1. Tries to use CoW on btrfs
2. Falls back to lf's native paste if it can't
3. Handles matching names in destination with `.~1~` like lf
4. Forwards cp errors to status line, if any
5. No `rm` anywhere, for peace of mind

```bash
cmd paste_try_cow &{{
    # # This was very helpful for debugging:
    # log_file="$HOME/lf-reflink-log-$(date +'%Y-%m-%d_%H-%M-%S')"
    # [ -f "$log_file" ] || touch "$log_file"
    # exec 1>> $log_file 2>&1
    # set -x

    # In theory, this may fail,
    # but I tested it on selection with 10k files - everything worked (bash)
    set -- $(cat ~/.local/share/lf/files)

    mode="$1"
    shift

    if [ $mode = 'copy' ]; then
        # Reflink if first item of selection and the destination are on the
        # same mount point and it is btrfs.
        # (to make sure reflink never fails in first place, so we don't have to
        # clean up)
        if [ "$(df "$PWD" --output=target | tail -n 1)" = \
             "$(df "$@" --output=target | tail -n 1)" ] && \
             [ "$(df --output=fstype "$PWD" | tail -n 1)" = btrfs ]; then

            echo 'selected copy and cp reflink paste'

            #
            # Handle same names in dst
            #
            # TODO make this run in parallel, idk
            # This is simple, but slow
            for i in "$@"; do
                name="${i##*/}"
                original="$name"

                count=0
                while [ -w "$PWD/$name" ]; do
                    count=$((count+1))
                    name="$original.~$count~"
                done

                set +e
                cp_out="$(cp -rn --reflink=always -- "$i" "$PWD/$name" 2>&1)"
                set -e

                if [ ! -z "$cp_out" ]; then
                    lf -remote "send $id echoerr $cp_out"
                    exit 0
                fi
            done

            # Or just skip a file when names are the same.
            # (A LOT faster if you e.g. pasting selection of 10k files)
            # cp -rn --reflink=always -- "$@" .

            lf -remote "send clear"
            lf -remote "send $id echo reflinked!"
        else
            echo 'selected copy and lf native paste'
            lf -remote "send $id paste"
        fi

    elif [ $mode = 'move' ]; then
        echo 'selected move and lf native paste'
        lf -remote "send $id paste"
    fi

    # # for debug
    # set +x

    lf -remote "send load"
}}

# name is different to avoid recursive calls
map p paste_try_cow
```

UPD 3 Dec 2021

Added some cosmetic changes.

Latest version lives now in Wiki/Tips section. Apparently gone??

UPD 9 Oct 2024

❗Latest version❗

This snippet:

1. Tries to use CoW (reflinks) on btrfs, zfs and xfs
2. Falls back to lf's native paste **(keeps progress %)** if it can't
3. Handles matching names in destination with `.~1~` like lf
4. Forwards cp errors to status line, if any

...Now supports reflinks between subvolumes on same source drive.

(Read before using)

```sh
cmd paste_try_cow &{{
    # # This was very helpful for debugging:
    # log_file="/tmp/lf-$id-paste_try_cow-log-$(date +'%Y-%m-%d_%H-%M-%S')"
    # [ -f "$log_file" ] || touch "$log_file"
    # exec 1>> $log_file 2>&1
    # set -x

    # In theory, this may fail,
    # but I tested it on selection with 10k files - everything worked (bash)
    set -- $(cat ~/.local/share/lf/files)

    mode="$1"
    shift

    if [ $mode = 'copy' ]; then
        # Reflink if all items of selection and the destination are on the
        # same source device and it is CoW fs.
        # (to make sure reflink never fails in first place, so we don't have to
        # clean up)

        # Now with reflink support between different subvolumes of the same btrfs

        # Findmnt can only query one target at a time.
        # Thus, complex cases (e.g. one source item on btrfs on /dev/sdb1,
        # another source item on ext4 on /dev/sdb2) are not supported,
        # and will fallback to lf's paste.
        single_src=false
        if [[ "$(df --output=target -- "$@" | sed '1d' | sort -u | wc -l)" == 1 ]]; then
            single_src=true
        fi

        # We want: same device for all srcs and dst, with fs that support reflinks
        if [[ "$(findmnt --noheadings --list --nofsroot --real -o source --target "$PWD")" == \
              "$(findmnt --noheadings --list --nofsroot --real -o source --target "$1")" ]] && \
            $single_src && \
            [[ "$(findmnt --noheadings --list --nofsroot --real -o fstype --target "$PWD")" =~ ^(btrfs|xfs|zfs)$ ]]; then

            echo 'selected copy and cp reflink paste'

            start=$(date '+%s')

            # Handle same names in dst
            # TODO: parallelism, idk - but exit/return/break won't stop the loop from subshell...
            for i in "$@"; do
                name="${i##*/}"
                original="$name"

                count=0
                while [ -w "$PWD/$name" ]; do
                    count=$((count+1))
                    name="$original.~$count~"
                done

                set +e
                cp_out="$(cp -rn --preserve=all --reflink=always -- "$i" "$PWD/$name" 2>&1)"
                set -e

                if [ ! -z "$cp_out" ]; then
                    lf -remote "send $id echoerr $cp_out"
                    exit 0
                fi
            done

            finish=$(( $(date '+%s') - $start ))
            t=''
            if (( $finish > 2 )); then
                t="${finish}s"
            fi

            # Or just skip a file when names are the same.
            # (A LOT faster if you e.g. pasting selection of 10k files)
            # cp -rn --reflink=always -- "$@" .

            # lf -remote "send clear"

            green=$'\u001b[32m'
            reset=$'\u001b[0m'
            lf -remote "send $id echo ${green}reflinked!${reset} $t"
        else
            echo 'selected copy and lf native paste'
            lf -remote "send $id paste"
            # lf -remote "send clear"
        fi

    elif [ $mode = 'move' ]; then
        echo 'selected move and lf native paste'
        lf -remote "send $id paste"
        # lf -remote "send clear"
    fi

    # # for debug
    # set +x

    lf -remote "send load"
}}

# name is different to avoid recursive calls
map p paste_try_cow
```
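For anyone thinking about a native version of the "same filesystem" precondition: one simple approximation in Go is to compare the device IDs reported by stat. This is only a sketch under assumptions, not what the script does: `deviceID`, `sameDevice` and `demo` are hypothetical names, the code is Unix-only, and btrfs subvolumes typically report different device IDs, so a check like this would be stricter than the findmnt-based comparison of source devices used in the script.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// deviceID returns the st_dev value of path (Unix-only).
func deviceID(path string) (uint64, error) {
	info, err := os.Stat(path)
	if err != nil {
		return 0, err
	}
	st, ok := info.Sys().(*syscall.Stat_t)
	if !ok {
		return 0, fmt.Errorf("no stat data for %s", path)
	}
	return uint64(st.Dev), nil
}

// sameDevice reports whether a and b live on the same device ID.
func sameDevice(a, b string) (bool, error) {
	da, err := deviceID(a)
	if err != nil {
		return false, err
	}
	db, err := deviceID(b)
	if err != nil {
		return false, err
	}
	return da == db, nil
}

// demo compares a temporary directory with a file created inside it.
func demo() (bool, error) {
	dir, err := os.MkdirTemp("", "samedev-demo")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)

	file := filepath.Join(dir, "f")
	if err := os.WriteFile(file, nil, 0o644); err != nil {
		return false, err
	}
	return sameDevice(dir, file)
}

func main() {
	same, err := demo()
	fmt.Println(same, err)
}
```

A matching device ID still says nothing about whether the filesystem supports reflinks, so the filesystem type would have to be checked separately, as the script does.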

MahouShoujoMivutilde commented 2 years ago

OMG, io.Copy actually seems to support CoW!

How do I know?


~/tmp ❯ time iocopy big.mkv big-iocopy-new.mkv
iocopy big.mkv big-iocopy-new.mkv  0.00s user 0.06s system 81% cpu 0.072 total

~/tmp ❯ time cp --reflink=always big.mkv big-cp-ref.mkv
cp -ir --reflink=always big.mkv big-cp-ref.mkv  0.00s user 0.02s system 96% cpu 0.024 total

~/tmp ❯ time cp --reflink=never big.mkv big-noref.mkv
cp -ir --reflink=never big.mkv big-noref.mkv  0.00s user 4.17s system 67% cpu 6.148 total

6 seconds for non-CoW cp vs less than 100ms for io.Copy and cp with CoW. WOW!

This is with go version go1.17.3 linux/amd64 on btrfs (kernel 5.15.3).

iocopy code

```go
package main

import (
	"flag"
	"io"
	"os"
)

func main() {
	flag.Parse()
	if flag.NArg() != 2 {
		panic("usage: iocopy <src> <dst>")
	}
	src := flag.Arg(0)
	dst := flag.Arg(1)

	fin, err := os.Open(src)
	if err != nil {
		panic(err)
	}
	defer fin.Close()

	fout, err := os.Create(dst)
	if err != nil {
		panic(err)
	}
	defer fout.Close()

	_, err = io.Copy(fout, fin)
	if err != nil {
		panic(err)
	}
}
```

Both files actually seem to refer to the same extents:

filefrag check

```sh
~/tmp ❯ iocopy big.mkv iocopy.mkv
~/tmp ❯ filefrag -v big.mkv > big.txt
~/tmp ❯ filefrag -v iocopy.mkv > iocopy.txt
~/tmp ❯ diff -u big.txt iocopy.txt
```

```diff
--- big.txt	2021-11-22 21:11:49.128978402 +0300
+++ iocopy.txt	2021-11-22 21:11:57.737916530 +0300
@@ -1,5 +1,5 @@
 Filesystem type is: 9123683e
-File size of big.mkv is 4474504665 (1092409 blocks of 4096 bytes)
+File size of iocopy.mkv is 4474504665 (1092409 blocks of 4096 bytes)
  ext:     logical_offset:        physical_offset: length:   expected: flags:
    0:        0..      31:  133900158.. 133900189:     32:             encoded,shared
    1:       32..     127:  133908099.. 133908194:     96:  133900190: shared
@@ -507,4 +507,4 @@
  503:   505622..  767731:  147864832.. 148126941: 262110:  147864714: shared
  504:   767732.. 1092389:  148126976.. 148451633: 324658:  148126942: shared
  505:  1092390.. 1092408:  119582017.. 119582035:     19:  148451634: last,encoded,shared,eof
-big.mkv: 506 extents found
+iocopy.mkv: 506 extents found
```

Only names are different. For reference:

```sh
~/tmp ❯ filefrag -v big-nocow.mkv > big-nocow.txt
~/tmp ❯ wc -l big.txt iocopy.txt big-nocow.txt  # quite different
  510 big.txt
  510 iocopy.txt
   16 big-nocow.txt
 1036 total
```

Also, this is a bit different, but still related, so I'm going to comment on it here.

TIL: cifs and samba 4.1.0+ actually support server-side copy and even remote CoW (latter if share is on btrfs).

Sadly, lf's copy seems to be essentially chunk-by-chunk read from source and write to destination type of copy, which is the most universal solution if you want progress percentage, but doesn't honor those capabilities.

But guess what? io.Copy does!

This is awesome:

/mnt/remotecow ❯ cp --reflink=always big.mkv big-2.mkv # it's instant

/mnt/remotecow ❯ iocopy big.mkv big-iocopy.mkv # also instant

/mnt/remotecow ❯ cp --reflink=never big.mkv big-nocow.mkv # ... it isn't

/mnt/remotecow took 1m37s ❯

UPD

I've looked at Go's source, here is what I've found:

io.Copy actually just calls io.copyBuffer with a nil buffer, which tries to use src.(WriterTo) and dst.(ReaderFrom) when possible, and falls back to reading into and writing from a buffer when not.
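A tiny sketch of that dispatch order, to make it concrete. This is a simplified imitation written for illustration, not the standard library's code; `copyPath`, `plainReader`, `plainWriter` and `fastWriter` are made-up names.

```go
package main

import (
	"fmt"
	"io"
)

// plainReader and plainWriter implement only the basic interfaces.
type plainReader struct{}

func (plainReader) Read(p []byte) (int, error) { return 0, io.EOF }

type plainWriter struct{}

func (plainWriter) Write(p []byte) (int, error) { return len(p), nil }

// fastWriter additionally implements io.ReaderFrom, like *os.File does.
type fastWriter struct{ plainWriter }

func (fastWriter) ReadFrom(r io.Reader) (int64, error) { return 0, nil }

// copyPath reports which route an io.Copy-style dispatcher would take:
// first src.WriteTo, then dst.ReadFrom, and only then the buffered loop.
func copyPath(dst io.Writer, src io.Reader) string {
	if _, ok := src.(io.WriterTo); ok {
		return "src.WriteTo"
	}
	if _, ok := dst.(io.ReaderFrom); ok {
		return "dst.ReadFrom"
	}
	return "buffered loop"
}

func main() {
	fmt.Println(copyPath(plainWriter{}, plainReader{})) // buffered loop
	fmt.Println(copyPath(fastWriter{}, plainReader{}))  // dst.ReadFrom
}
```

The buffered loop is only reached when neither side offers a shortcut, which is why a copy between two plain files never gets there.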

This is probably how CoW happened to work. I haven't found where exactly it is implemented, but digging around go's source - it seems to support copy_file_range(), which in turn, from its man page:

       ...
       copy_file_range() gives filesystems an opportunity to implement "copy
       acceleration" techniques, such as the use of reflinks (i.e., two or more
       inodes that share pointers to the same copy-on-write disk blocks) or
       server-side-copy (in the case of NFS).
       ...

I don't know how to do this yet, but there might be a way to pass to io.CopyBuffer some custom buffer that will count bytes copied to it and report progress after each .Write(), leading to:

  1. keeping progress counter, as it is right now
  2. using io.CopyBuffer with its CoW capabilities (as shown above)

You don't even have to make it permanent set reflink auto, you can force io.CopyBuffer to always use a buffer like shown here.


UPD 26 Nov

Okay, found a problem with that idea. Here is my crude test implementation that failed miserably because of it.

You see, os.File always implements ReadFrom (the ReaderFrom interface). It will try to use copy_file_range internally, and will fall back to a normal copy if that fails.

All of that without telling you, what it actually did :laughing:

Figured it out the hard way.

So you see - it is too smart for its own good.
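To illustrate the trade-off: io.CopyBuffer only uses the supplied buffer when neither side offers WriteTo/ReadFrom, so counting progress means wrapping both ends in types that expose nothing but Read and Write, which in turn gives up the copy_file_range shortcut. A minimal sketch, assuming in-memory data instead of real files; `progressWriter`, `copyCounting` and `demo` are hypothetical names, not lf's code.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
)

// Compile-time check: *os.File satisfies io.ReaderFrom, which is why
// io.CopyBuffer ignores the buffer when the destination is a plain file.
var _ io.ReaderFrom = (*os.File)(nil)

// progressWriter exposes only Write, hiding any ReadFrom of the wrapped writer.
type progressWriter struct {
	w       io.Writer
	written int64 // bytes written so far
	chunks  int   // number of Write calls, i.e. possible progress updates
}

func (p *progressWriter) Write(b []byte) (int, error) {
	n, err := p.w.Write(b)
	p.written += int64(n)
	p.chunks++
	return n, err
}

// copyCounting forces the buffered path and counts what passes through.
// bufSize must be greater than zero.
func copyCounting(dst io.Writer, src io.Reader, bufSize int) (int64, int, error) {
	pw := &progressWriter{w: dst}
	// struct{ io.Reader } likewise hides a WriteTo method on src.
	_, err := io.CopyBuffer(pw, struct{ io.Reader }{src}, make([]byte, bufSize))
	return pw.written, pw.chunks, err
}

// demo copies 10 bytes through a 4-byte buffer.
func demo() (int64, int, error) {
	var dst bytes.Buffer
	return copyCounting(&dst, bytes.NewReader([]byte("0123456789")), 4)
}

func main() {
	n, chunks, err := demo()
	fmt.Println(n, chunks, err) // 10 3 <nil>
}
```

With the wrappers in place every chunk is visible, but the kernel never gets the chance to reflink; without them the copy may be instant but nothing reports what actually happened.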

gokcehan commented 2 years ago

@jantatje @MahouShoujoMivutilde This sounds cool but it is also a little too specific. I don't think ext4, which is still likely the most common filesystem on Linux, supports CoW. Any PR for this should also work on other platforms without an issue.

MahouShoujoMivutilde commented 4 weeks ago

@neeshy Hi! Why did you remove my CoW paste script?

Yes, cp has --reflink=auto. Of course. In fact, Go's io.Copy() (which is used for lf's copy) can use CoW and even server-side copy too, and acts pretty much like cp --reflink=auto when used by itself (not in lf, where the copy is always buffered for progress). If you wanted to, e.g., patch in native CoW support, that was never the issue. Keeping progress reporting together with CoW support, on the other hand, is what's hard, because there isn't a convenient API to check whether something isReflinkable(src, dst). Anyway.

The point of the script is that it blends reflinks with lf's native paste, thus keeping progress. It doesn't use auto because we can't tell when a reflink is possible and cp will be instantaneous, or when it will silently fall back to a regular copy, potentially resulting in an operation that takes minutes or longer with no indication of progress for the user (which I believe would be undesirable for most people).

The other paste replacement with rsync and progress doesn't even support CoW.

So, in short, I think my script would be still useful to have in Tips.

Because, again, if you use a copy-on-write file system, you would already know about cp --reflink=auto. That you can map it to some key is obvious. But how to seamlessly combine lf's paste progress (for when CoW is not available) with cp's reflinks is, I think, less so.


Also, I updated the script with better detection of the same source file system, to support reflinks between subvolumes.

neeshy commented 6 days ago

@MahouShoujoMivutilde

Oh, sorry. I don't remember exactly, but I guess I didn't understand its purpose in light of the default behavior of cp. You (or anyone else) are free to add it back; it's publicly editable of course.

Looking at it now, it seems like it could be simplified quite a bit. What do you think of this version?

cmd paste-cow &{{
    set -- $(cat ~/.local/share/lf/files)
    mode="$1"
    shift
    case "$mode" in
        copy)
            first="true"
            for file in "$@"; do
                name="$(basename -- "$file")"
                orig="$name"

                count="0"
                while [ -e "$name" ]; do
                    name="$orig.~$((count += 1))~"
                done

                if ! out="$(cp -an --reflink=always -- "$file" "./$name" 2>&1)"; then
                    if [ -n "$first" ]; then
                        # It's only safe to fallback to lf's own paste on the first iteration
                        lf -remote "send $id paste"
                    elif [ -n "$out" ]; then
                        lf -remote "send $id echoerr \"$(printf '%s' "$out" | sed 's/\\/\\\\/g;s/"/\\"/g')\""
                    else
                        lf -remote "send $id echoerr cp: failed"
                    fi
                    exit
                fi
                first=""
            done
            lf -remote "send $id echo \"\\033[0;32mCopied successfully (reflinked)\\033[0m\""
            ;;
        move) lf -remote "send $id paste";;
    esac
}}

This version has the advantage of using CoW whenever cp is capable of it, instead of hardcoding cases.

EDIT: I went ahead and added it.