Morganamilo / paru

Feature-packed AUR helper
GNU General Public License v3.0

Support git blobless clones #1027

Closed: HaleTom closed this issue 1 year ago.

HaleTom commented 1 year ago

Git blobless clones (`git clone --filter=blob:none`) download every commit and tree but only the file contents (blobs) needed for the current checkout; older blobs are fetched on demand, only if they are ever required.

Upsides:

The older and more active the project, the greater the savings.

Background reading: https://github.blog/2020-12-21-get-up-to-speed-with-partial-clone-and-shallow-clone/

TL;DR:

git clone --filter=blob:none <url>
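A self-contained sketch of what the TL;DR does, assuming bash and a reasonably recent git with partial-clone support. It builds a throwaway local repository (a stand-in for a real upstream; the paths and commit identity are placeholders) holding three versions of one file, makes a blobless clone of it, and counts the objects the clone does not have locally. It clones over file:// and sets uploadpack.allowFilter on the source, because git ignores --filter for plain-path local clones and a server has to opt in to filtering.

```shell
#!/usr/bin/env bash
# Sketch: a blobless clone leaves out old file contents until they are needed.
# Uses a throwaway local repository, so no network access is required.
set -euo pipefail

# Placeholder identity for the throwaway commits.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Stand-in upstream: three versions of one file, i.e. three distinct blobs.
git init -q "$work/upstream"
for v in 1 2 3; do
    echo "version $v" > "$work/upstream/file.txt"
    git -C "$work/upstream" add file.txt
    git -C "$work/upstream" commit -q -m "v$v"
done

# The serving side must allow filters, and the URL must be file://
# (a plain local path would make git ignore --filter).
git -C "$work/upstream" config uploadpack.allowFilter true
git clone -q --filter=blob:none "file://$work/upstream" "$work/blobless"

# Walk all reachable objects without fetching; missing ones are printed with '?'.
missing=$(git -C "$work/blobless" rev-list --objects --all --missing=print \
    | { grep -c '^?' || true; })
echo "objects not downloaded: $missing"
```

The two older versions of file.txt should be the missing objects; the current one was fetched during checkout.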

Solution options

Perhaps this could best be achieved by adding a --gitcloneflags argument.

Some may want to pass --filter=tree:0 for a treeless clone or --depth=1 to get an even smaller shallow clone.

Non-options

Using --gitflags won't work, as those flags are applied to every git command, but --filter= is only valid for git clone.

Have you checked the readme and man page for this feature? Yes

Have you checked previous issues for this feature? Yes

HaleTom commented 1 year ago

BTW, the argument that repos are small, so it doesn't make much difference, is true... except when it isn't, and those repos can be very large.

My average repo size is about 31 MiB (the du figures below are in KiB):

% pwd
/home/ravi/.cache/aur
% for d in $(fd FETCH_HEAD); do du -sk "$(dirname "$d")/objects"; done
22044   tmux-git/tmux/objects
9488    libva-intel-driver-hybrid/intel-vaapi-driver/objects
4380    intel-hybrid-codec-driver-git/intel-hybrid-driver/objects
32112   zsync2-git/googletest/objects
2956    zsync2-git/args/objects
2312    zsync2-git/cpr/objects
532     zsync2-git/zsync2/objects
472     nvimpager-git/nvimpager/objects
96716   github-cli-git/cli/objects
44      libxxf86misc/libXxf86misc/objects
204     python-qroundprogressbar/QRoundProgressBar/objects
84      asus-wmi-screenpad-dkms-git/asus-wmi-screenpad-dkms-git/objects
1304    bees-git/bees/objects
313160  vcpkg-git/vcpkg-git/objects
1448    s3backer-git/s3backer-git/objects
% for d in $(fd FETCH_HEAD); do du -sk "$(dirname "$d")"; done | awk '{ sum += $1 } END { if (NR > 0) print sum / NR }'
31531.5
%
HaleTom commented 1 year ago

A hacky workaround. First, an alias in the shell's rc file:

[ -e "$HOME/bin/makepkg" ] && alias paru='paru --makepkg "$HOME"/bin/makepkg'

Then the wrapper itself, saved as $HOME/bin/makepkg:

#!/bin/bash

# Override git clone for a non-full clone
# Source idea: https://github.com/Jguer/yay/issues/972#issuecomment-602309440

printf '%s: Local makepkg wrapper\n' "$0" >&2

# Override git to modify `clone` behaviour
git() {
    if [[ $# -gt 1 && $1 == 'clone' ]]; then
        printf '%s: Awaiting Morganamilo/paru/issues/1104\n' "$0" >&2
        # Log the original clone arguments, shell-quoted with bash's %q
        printf '%s: Git clone with initial args:' "$0" >&2
        printf ' %q' "$@" >&2
        printf '\n' >&2
        shift
        local arg
        for arg in "$@"; do
            if [[ $arg == '-s' || $arg == '--shared' ]]; then
                # A shared clone borrows objects from the local mirror via
                # alternates, so a blobless filter would save no space
                command git clone "$@"
                return
            fi
        done
        printf '%s: Cloning blobless.\n' "$0" >&2
        command git clone --filter=blob:none "$@"
    else
        # `command` bypasses this function and runs the real git binary
        command git "$@"
    fi
}

source /bin/makepkg "$@"
stevenxxiu commented 1 year ago

Thanks for this! I didn't know about partial clones, only about shallow clones. Partial clones mean the version will always be the same, since the full commit history is still present.
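A sketch of why that matters, assuming the common VCS-package convention of deriving pkgver() from a commit count (`git rev-list --count HEAD`). It compares a blobless clone and a --depth=1 shallow clone of the same throwaway five-commit repository; the repository, identity and commit messages are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: commit counts, as used by many pkgver() functions, in a blobless
# clone versus a shallow clone of the same throwaway repository.
set -euo pipefail

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Stand-in upstream with five commits.
git init -q "$work/upstream"
for n in 1 2 3 4 5; do
    echo "$n" > "$work/upstream/file.txt"
    git -C "$work/upstream" add file.txt
    git -C "$work/upstream" commit -q -m "commit $n"
done
git -C "$work/upstream" config uploadpack.allowFilter true

# file:// so that both --filter and --depth take effect for a local source.
git clone -q --filter=blob:none "file://$work/upstream" "$work/blobless"
git clone -q --depth=1 "file://$work/upstream" "$work/shallow"

# Counting commits only walks commit objects, so no blobs are fetched here.
full=$(git -C "$work/upstream" rev-list --count HEAD)
blobless=$(git -C "$work/blobless" rev-list --count HEAD)
shallow=$(git -C "$work/shallow" rev-list --count HEAD)
echo "full=$full blobless=$blobless shallow=$shallow"
```

The blobless clone should report the same count as the source (5), while the shallow clone only sees its single commit, so a count-based pkgver() would differ between the two.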

I updated my personal script to use the --makepkg flag and use a custom makepkg.sh. This reduced my biggest folder, ~/.cache/paru/clone/goldendict-ng-git/, from 252.4 MB to 27.6 MB.

The only issue is that partial clones appear to be very slow. Reinstalling this package took 15m15s, most of which was cloning time. I suppose the next update will be faster.


Actually, this doesn't always work when there are submodules. Installing brunsli with my wrapper, I get:

...
error: unable to read sha1 file of test_exports.sh (2dbad7ab17bfaf8e0ed83364dccca7d676cbf072)
error: invalid object 100644 d878d20bf439a86eff55655e8a42e7801fdd6ff5 for 'c/highwayhash.c'
fatal: Unable to checkout '0aaf66bb8a1634ceee4b778df51a652bdf4e1f17' in submodule path 'third_party/highwayhash'
...
soloturn commented 1 year ago

There is a first implementation in makepkg (part of pacman). It works when the source is on the default branch and when the src directory is cleaned on every build, so for paru it should be OK. See also the Arch Wiki: https://wiki.archlinux.org/title/makepkg.

Already merged:

The call is then:

GITFLAGS="--filter=tree:0" paru 

If you want to try it, could you make improvement suggestions or pull requests there?
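A hypothetical sketch of how an environment variable like GITFLAGS can feed options into a clone command with a fallback default. This is not makepkg's git.sh; the helper name clone_command, the --mirror default and the example URL are assumptions chosen only to illustrate the shell pattern. The helper prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT makepkg's code: let an environment variable such as
# GITFLAGS supply clone options, falling back to a default when unset or empty.
set -euo pipefail

# Prints the clone command one word per line instead of running it.
clone_command() {
    local url=$1 dir=$2
    # ${GITFLAGS:---mirror}: GITFLAGS if set and non-empty, otherwise --mirror.
    # Deliberately unquoted so "--filter=tree:0 --no-tags" splits into two words.
    # shellcheck disable=SC2086
    printf '%s\n' git clone ${GITFLAGS:---mirror} "$url" "$dir"
}

# Default versus override (example.invalid is a placeholder URL).
GITFLAGS= clone_command https://example.invalid/repo.git repo
GITFLAGS="--filter=tree:0" clone_command https://example.invalid/repo.git repo
```

One design point worth noticing in this pattern: the override replaces the default instead of being appended to it, and an empty GITFLAGS is treated the same as an unset one.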

stevenxxiu commented 1 year ago

I tried the makepkg patch. It's not quite working for me:

$ GITFLAGS="--filter=tree:0" paru --sync brunsli
...
==> Starting build()...
CMake Error: The source directory "/home/steven/.cache/paru/clone/brunsli/src/brunsli" does not appear to contain CMakeLists.txt.
Specify --help for usage, or press the help button on the CMake GUI.
...
$ GITFLAGS="--filter=blob:none" makepkg
...
==> Extracting sources...
  -> Creating working copy of brunsli git repo...
Cloning into 'brunsli'...
done.
error: unable to read sha1 file of .gitmodules (01572aa6092e04efa109cae28ea5086ee4ee69ad)
error: unable to read sha1 file of .travis.yml (0a22b9d86c2bf276eb25dc3a5dcf82ebc780d1b7)
error: unable to read sha1 file of BUILD (ccb1cc24b5735cbb6d88aa0e97bc4043605e83c8)
...
soloturn commented 1 year ago

Oh, I see, this package contains submodules. I updated the patch by adding a --recurse-submodules flag. Can you please give it another try, maybe also without GITFLAGS? Testing an update of an existing repository followed by another makepkg run would be especially helpful, as that takes a different path in makepkg's git.sh.

stevenxxiu commented 1 year ago

That appears to work, but it's not doing a partial clone.

I wonder if there's a way to do a partial clone of brunsli that fetches the required commits and builds, while being fast as well.
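To check which repository did or did not end up partial, a sketch that reads the filter git records for a partial clone's remote (remote.origin.partialclonefilter), for the top-level repository and each submodule. It assumes the remote is named origin (git clone's default); clone_filter and report are hypothetical helper names, and the demo repository is a placeholder. Per the git-clone documentation, combining --filter with --recurse-submodules filters only the top-level repository unless --also-filter-submodules (available in newer git releases) is also given, which may be worth checking if the repository that isn't partial is a submodule.

```shell
#!/usr/bin/env bash
# Sketch: report whether a repository, and each of its submodules, is a
# partial clone, by reading the filter git records for the "origin" remote.
set -euo pipefail

# Prints the recorded filter spec (e.g. blob:none), or "full" if none is set.
clone_filter() {
    git -C "$1" config --get remote.origin.partialclonefilter || echo full
}

# Report the top-level repository, then every submodule.
# $displaypath is provided by `git submodule foreach`.
report() {
    printf '%s: %s\n' "$1" "$(clone_filter "$1")"
    git -C "$1" submodule --quiet foreach \
        'printf "%s: %s\n" "$displaypath" "$(git config --get remote.origin.partialclonefilter || echo full)"'
}

# Demo on a throwaway repository (no network needed).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

git init -q "$work/upstream"
echo hello > "$work/upstream/file.txt"
git -C "$work/upstream" add file.txt
git -C "$work/upstream" commit -q -m init
git -C "$work/upstream" config uploadpack.allowFilter true

git clone -q "file://$work/upstream" "$work/full"
git clone -q --filter=blob:none "file://$work/upstream" "$work/blobless"

report "$work/full"
report "$work/blobless"
full_filter=$(clone_filter "$work/full")
blobless_filter=$(clone_filter "$work/blobless")
```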

soloturn commented 1 year ago

@stevenxxiu, which repository is not cloned partially? A submodule? How fast do you expect it to be? I am using it for https://aur.archlinux.org/packages/swift-language, which is huge.

Morganamilo commented 1 year ago

This is a makepkg feature, so it's out of scope for paru.

HaleTom commented 10 months ago

@soloturn there's an issue with your approach, detailed in:

warning: --filter is ignored in local clones; use file:// instead. #1104
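A possible building block for the file:// workaround that the warning suggests, written as a hypothetical helper (to_file_url is not part of git, makepkg or paru). It converts an existing local directory into an absolute file:// URL and passes anything else, such as https:// or user@host:path remotes, through unchanged. Two caveats: a file:// clone no longer shares objects with the source the way a local or -s clone does, and the source repository must set uploadpack.allowFilter to true, otherwise git warns that the server does not support filtering and makes a full clone.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of git, makepkg or paru): rewrite an existing
# local directory as an absolute file:// URL so git will honour --filter.
set -euo pipefail

to_file_url() {
    if [[ $1 != *://* && -d $1 ]]; then
        # Absolute path with symlinks resolved, prefixed with file://
        printf 'file://%s\n' "$(cd "$1" && pwd -P)"
    else
        # URLs and scp-style user@host:path remotes pass through unchanged
        printf '%s\n' "$1"
    fi
}
```

In a clone wrapper this might be used as `command git clone --filter=blob:none "$(to_file_url "$src")" "$dest"`, where src and dest are placeholders.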

soloturn commented 10 months ago

@HaleTom can you please register at pacman and comment there?

HaleTom commented 10 months ago

> @HaleTom can you please register at pacman and comment there?

@soloturn I'm not sure exactly what you mean: which URL would I use to register at pacman?

stevenxxiu commented 10 months ago

> @stevenxxiu, which repository is not cloned partially? A submodule? How fast do you expect it to be? I am using it for https://aur.archlinux.org/packages/swift-language, which is huge.

I don't have it installed anymore, but it was https://aur.archlinux.org/packages/brunsli.

HaleTom commented 8 months ago

I've updated my work-around script above.

There's been an Arch MR opened for `--depth 1`, with a hint to replace it with the blobless option.

HaleTom commented 8 months ago

The Arch Wiki now gives an example of how to use `GITFLAGS="--filter=tree:0" makepkg`.