Closed Speykious closed 2 years ago
Yep, I'm fine to rewrite history here. Are you able to do that via a PR?
I'm not sure honestly... I could try though. I'll see what I can do.
I think you need write access to the main branch
So I tried to do it today, and it was actually rather easy to do with the BFG Repo-Cleaner.
Here are the commands that I ran. I did that on a folder named taffy.git, which is obtained from git clone --mirror git@github.com:DioxusLabs/taffy.git.
# Delete the bindings/ folder from history where almost every big blob is
bfg --delete-folders bindings taffy.git
# Delete the only blob that was left outside of the bindings/ folder
bfg --delete-files selenium.jar taffy.git
# BFG only rewrites commits but doesn't actually remove the blobs, so we do this now
cd taffy.git
git reflog expire --expire=now --all && git gc --prune=now --aggressive
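As a side note on that last step, here is a minimal sketch of why the reflog expiry and gc are needed, run on a throwaway repository rather than on taffy. The temporary directory, the branch name big and the file big.bin are made up for the demo, and deleting a side branch stands in for the BFG rewrite. It assumes git and a head that supports -c are available.

```shell
#!/bin/sh
set -eu

# Throwaway repository; every path and name below is hypothetical.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email demo@example.com
git config user.name demo
git config gc.auto 0

echo "hello" > README
git add README
git commit -q -m "initial"
main=$(git symbolic-ref --short HEAD)

# Commit a 1 MiB file on a side branch, then delete that branch again.
git checkout -q -b big
head -c 1048576 /dev/urandom > big.bin
git add big.bin
git commit -q -m "add big blob"
blob=$(git rev-parse HEAD:big.bin)
git checkout -q "$main"
git branch -q -D big

# No branch points at the blob any more, but the HEAD reflog still does.
if git cat-file -e "$blob" 2>/dev/null; then before=present; else before=gone; fi

# Same cleanup as above: drop reflog entries, then prune unreachable objects.
git reflog expire --expire=now --all
git gc -q --prune=now --aggressive

if git cat-file -e "$blob" 2>/dev/null; then after=present; else after=gone; fi
echo "before gc: $before, after gc: $after"
```

The point of the sketch is only that removing every branch reference to a blob does not by itself shrink the repository; the object stays on disk until the reflog entries are expired and gc prunes it.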
Here's the end result:
git rev-list --objects --all |\
git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |\
sed -n 's/^blob //p' |\
sort --numeric-sort --key=2 |\
cut -c 1-12,41- |\
$(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest |\
tail -n 10
e36ac7aba85b 455KiB tests/generated.rs
8d68c589925c 466KiB docs/yarn.lock
15e1acdeafdc 467KiB tests/generated.rs
03a18ea25baa 468KiB tests/generated.rs
d22058ee45fb 503KiB docs/yarn.lock
6d3cc77f162e 512KiB tests/generated.rs
05cb789092ab 512KiB tests/generated.rs
df89457fc5d6 516KiB tests/generated.rs
2a0491e1fd5d 516KiB tests/generated.rs
18ac5e513ef2 528KiB docs/yarn.lock
However, I think it would only be useful if I had write access. So I think it's best if a maintainer/owner does it. :)
In which case, I highly recommend doing experiments on a --mirror clone (like I said at the beginning) so that you can make sure to not delete something that wasn't meant to be deleted.
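One possible way to check that, sketched on a local stand-in repository instead of the real one: snapshot the refs of the mirror clone before and after the experiment and diff the two lists. The names origin-repo, scratch.git and the tag v0.1.0 are invented for the example, and the tag deletion is only a placeholder for whatever rewrite is being tested.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"

# Stand-in for the real upstream repository (hypothetical local path).
git init -q origin-repo
git -C origin-repo config user.email demo@example.com
git -C origin-repo config user.name demo
echo "hello" > origin-repo/README
git -C origin-repo add README
git -C origin-repo commit -q -m "initial"
git -C origin-repo tag v0.1.0

# A mirror clone is a bare copy of every ref, so it is safe to experiment on.
git clone -q --mirror origin-repo scratch.git

# Record the refs before the experiment.
git -C scratch.git for-each-ref --format='%(refname)' | sort > refs-before.txt

# Placeholder for a destructive experiment: here we just delete a tag.
git -C scratch.git tag -d v0.1.0 > /dev/null

git -C scratch.git for-each-ref --format='%(refname)' | sort > refs-after.txt

# Lines only in the "before" list are refs that the experiment removed.
lost=$(comm -23 refs-before.txt refs-after.txt)
echo "refs lost in scratch copy: $lost"

# The original repository is untouched by anything done in the mirror.
kept=$(git -C origin-repo tag --list v0.1.0)
echo "tag still in origin-repo: $kept"
```

Nothing reaches the real remote until the mirror is pushed, so a ref diff like this can be inspected at leisure first.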
Thank you very much for the detailed write-up and advice. I'm not a master of git, so it really will help me. @jkelleyrtp are you okay if I rewrite history to do this?
@Speykious thanks for figuring out the commands!
@alice-i-cecile yes please! We might need to temporarily turn off main branch protections - LMK if you need action from me there
Done, thanks for figuring out the way forward with this one :)
The git history contains large unneeded files. Cloning the repository downloads about 53 MiB of data, which is unusual for a Rust layout crate.
So I investigated the largest objects and found that there was previously a folder bindings in which a lot of binaries were committed to the git history.
Fixing this would require rewriting the git history, although I think it's worth it if that means people won't get these unused files.