eddelbuettel / bh

R package providing Boost Header files

Reduce footprint #34

Closed Enchufa2 closed 7 years ago

Enchufa2 commented 7 years ago

I've prepared this script to remove and purge unwanted files from git history. You can use it to remove those big tar.gz files as follows:

./git-remove.sh git@github.com:eddelbuettel/bh.git tar.gz

You'll be asked for confirmation before removing anything, and then, if everything went ok, changes will be automatically pushed. I've already tried it with a fork (check it) and it worked nicely (310.61 MiB -> 16.49 MiB).
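One way to check the effect of a purge like this is to compare the packed size of a clone before and after. The sketch below is not part of the script: `pack_size_mib` is a hypothetical helper that converts the `size-pack` value (reported in KiB by `git count-objects -v`) into MiB, and the sample input is made up.

```shell
#!/bin/sh
# Hypothetical helper: read `git count-objects -v` output on stdin and
# print the pack size in MiB (size-pack is reported in KiB).
pack_size_mib() {
    awk -F': ' '$1 == "size-pack" { printf "%.2f MiB\n", $2 / 1024 }'
}

# Demo with made-up input; inside a real clone you would run:
#   git count-objects -v | pack_size_mib
printf 'count: 0\nsize-pack: 2048\n' | pack_size_mib
```

Running it in the old clone and in a fresh clone of the purged repository gives two numbers that can be compared directly.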

eddelbuettel commented 7 years ago

I will give this a try. I looked into it once using the Java-based tool that is often recommended, but didn't like the outcome much (which is in a private repo on GitLab).

Your first key operation appears to be (and I am indenting here)

# Find the files you want to remove
FILE_LIST=$(git rev-list master | \
     while read -r rev; do git ls-tree -lr "$rev" | \
     cut -c54- | sed -r 's/^ +//g;'; done | \
     sort -u | perl -e 'while (<>) {
                    chomp;
                    # each line is "<size>\t<path>": sum the sizes per path
                    @stuff=split("\t");$sums{$stuff[1]} += $stuff[0];}
                    print "$sums{$_} $_\n" for (keys %sums);' | \
     sort -rn | grep "$EXT")

and I see no test here. Does it first create a list that one is then supposed to edit? We could easily add a condition here (e.g. drop only files over 5 MB each ...)
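A size condition like the one suggested above could be applied to the "size path" lines that the pipeline produces before the `grep`. This is only a sketch: `filter_larger_than` is a hypothetical helper, not something the script provides, and the sample entries are made up.

```shell
#!/bin/sh
# Hypothetical helper: keep only "<bytes> <path>" lines whose size
# is strictly greater than the given number of bytes.
filter_larger_than() {
    min_bytes=$1
    awk -v min="$min_bytes" '$1 + 0 > min + 0'
}

# Demo with made-up entries and a 5 MiB threshold (5 * 1024 * 1024 bytes).
printf '%s\n' '73400320 example-big.tar.gz' '2048 DESCRIPTION' |
    filter_larger_than 5242880
```

In the one-liner, this would sit between `sort -rn` and the pattern filter, so that both the size and the extension have to match.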

At the end:

"or to clone a fresh copy. And don't push that again! ;-)"

makes little sense. From the fresh copy one should be able to push, no? Did you mean "Just don't push from the old one"?

eddelbuettel commented 7 years ago

Also referencing #25 here.

eddelbuettel commented 7 years ago

Now I feel silly -- I didn't see the EXT argument at first and its use. All clear now.

I will give this a try, maybe later today.

Enchufa2 commented 7 years ago

Now I feel silly -- I didn't see the EXT argument at first and its use. All clear now.

My fault for that huge one-liner. :-) Yes, basically, that line lists every file in the history with its size, sorts them by size, and filters by the pattern you give. Then FILE_LIST is printed and you are asked for confirmation. You'll see a lot of output scroll by (don't worry) and, if everything is OK (the script lists all files again to check that they were removed), you'll be prompted again before the changes are pushed.
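The two safety gates described above (ask before acting, then re-list to check that nothing is left) can be sketched in plain shell. Both function names are hypothetical and this is not the script's actual code, only an illustration of the pattern under those assumptions.

```shell
#!/bin/sh
# Hypothetical helper: ask a yes/no question, defaulting to "no".
confirm() {
    printf '%s [y/N] ' "$1"
    read -r answer || return 1
    case "$answer" in
        [yY]|[yY][eE][sS]) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: succeed only if the re-generated file list
# (read from stdin) contains no non-empty line.
all_removed() {
    ! grep -q .
}

# Intended use (not executed here):
#   confirm "Are you sure?" || exit 1
#   ...rewrite history...
#   printf '%s\n' "$FILE_LIST_AFTER" | all_removed && confirm "Push?"
```

Defaulting to "no" means that an empty answer or a closed stdin never triggers the destructive step.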

"or to clone a fresh copy. And don't push that again! ;-)" makes little sense. From the fresh copy one should be able to push, no? "Just don't push from the old one" ?

Just a joke. ;-) I mean don't push the purged files again.

Enchufa2 commented 7 years ago

I've just indented those one-liners.

eddelbuettel commented 7 years ago

So I just tried with the 'backup' copy I had over at GitLab and it ends badly:

Files successfully removed! Let's push the changes...

Are you sure? [y/N] y

Counting objects: 21443, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (7144/7144), done.
Writing objects: 100% (21443/21443), 13.00 MiB | 701.00 KiB/s, done.
Total 21443 (delta 13986), reused 21443 (delta 13986)
remote: GitLab: You are not allowed to force push code to a protected branch on this project.
To git@gitlab.com:eddelbuettel/bh.git
 ! [remote rejected] master -> master (pre-receive hook declined)
error: failed to push some refs to 'git@gitlab.com:eddelbuettel/bh.git'
Everything up-to-date

Thoughts?

Edit: That was a GitLab thing. I unprotected the branch, and the push went through. Very nice script -- thanks!

I now have 142 MB in a fresh clone of the "pruned" repo and 822 MB in the original.

Edit 2: And also about 140 MB in the original repo once I remove the local (old) bh tarballs. All good.

Nice work -- thanks again!