freight-team / freight

A modern take on the Debian archive.

Add option to avoid re-computing hashes #93

Closed: runejuhl closed this 3 years ago

runejuhl commented 5 years ago

Hi,

We have a repo that's slowly growing in size, and freight is starting to take a long time to run. By far most of the time seems to be spent here:

        # Finish the top-level `Release` file with references and
        # checksums for each sub-`Release` file and `Packages.gz` file.
        # In the future, `Sources` may find a place here, too.
        find "$DISTCACHE" -mindepth 2 -type f -printf %P\\n |
            grep -v ^\\. |
            while read -r FILE; do
                SIZE="$(apt_filesize "$DISTCACHE/$FILE")"
                echo " $(apt_md5 "$DISTCACHE/$FILE") $SIZE $FILE" >&3
                echo " $(apt_sha1 "$DISTCACHE/$FILE") $SIZE $FILE" >&4
                echo " $(apt_sha256 "$DISTCACHE/$FILE") $SIZE $FILE" >&5
                echo " $(apt_sha512 "$DISTCACHE/$FILE") $SIZE $FILE" >&6
            done 3>"$TMP/md5sums" 4>"$TMP/sha1sums" 5>"$TMP/sha256sums" 6>"$TMP/sha512sums"

Would it be acceptable to place a dot-file along with the repo files with the computed hashes, so that this piece can do a lookup in a file instead of computing the hash?
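To make the idea concrete, here is a minimal sketch of such a lookup. It is not taken from freight or from the fork linked below: the function name `cached_sha256` and the sidecar naming scheme (`.NAME.sha256` next to each file) are hypothetical, and it assumes GNU coreutils (`stat -c`, `sha256sum`). The cached hash is reused only when the file's size and modification time are unchanged.

```shell
#!/bin/sh
# Hypothetical helper, not part of freight: print the SHA-256 of a file,
# reusing a cached value while the file's size and mtime are unchanged.
# Assumes GNU coreutils (stat -c, sha256sum).

cached_sha256() {
    file="$1"
    # Sidecar dot-file next to the hashed file, e.g. dir/.Packages.gz.sha256
    cache="$(dirname "$file")/.$(basename "$file").sha256"
    # Fingerprint: size in bytes and modification time in epoch seconds.
    stamp="$(stat -c '%s %Y' "$file")"

    if [ -f "$cache" ]; then
        # Cache line format: "<size> <mtime> <hash>"
        read -r c_size c_mtime c_hash <"$cache"
        if [ "$c_size $c_mtime" = "$stamp" ]; then
            # Fingerprint matches: skip hashing entirely.
            echo "$c_hash"
            return 0
        fi
    fi

    # Cache miss or stale entry: hash the file and refresh the sidecar.
    hash="$(sha256sum "$file" | cut -d' ' -f1)"
    echo "$stamp $hash" >"$cache"
    echo "$hash"
}
```

In the loop above, `$(apt_sha256 "$DISTCACHE/$FILE")` could then become `$(cached_sha256 "$DISTCACHE/$FILE")`, with similar wrappers for the other digests. Two caveats with this sketch: the `grep -v ^\\.` filter only drops paths that start with a dot, so sidecar files in subdirectories would also have to be excluded from the `find` output, and a file rewritten within the same second without changing size would not be detected.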

I've got a fork with a simple (and as yet untested) implementation: https://github.com/freight-team/freight/compare/master...runejuhl:avoid-rehashing . If you feel this is a good way of doing it, I'll be glad to do some testing and submit a PR.

runejuhl commented 5 years ago

Is anybody listening? :smile:

KlavsKlavsen commented 5 years ago

It would be a very nice way of greatly improving performance. In repos with just 50+ .debs it really starts to take time :( and createrepo for yum has had that for ages.

mmoll commented 5 years ago

@runejuhl Hi, sorry for the delay. :bowing_man: If you could do testing on your end, and ideally also add BATS tests, we would of course merge it. :+1:

runejuhl commented 5 years ago

@mmoll Great, I'll find some time to do that and submit a proper PR.