eddelbuettel / bh

R package providing Boost Header files

Strip old tarballs from git history #25

Closed jimhester closed 8 years ago

jimhester commented 8 years ago

The old tarballs have been removed from the working tree but are still present in the history. They make the repo size much larger than it needs to be, as the tarballs can be downloaded from boost directly.

Stripping them from the history takes the bare repo size (e.g. git clone --mirror) from ~350 MB to 16 MB on my machine. This will make cloning this repo much faster!

You can do so by downloading the BFG repo cleaner and running the following commands.

git clone --mirror git@github.com:eddelbuettel/bh.git
java -jar bfg.jar --strip-blobs-bigger-than 10MB bh.git
cd bh.git
git reflog expire --expire=now --all && git gc --prune=now --aggressive
git push --force
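Before rewriting anything, it can help to see which blobs a size threshold would actually catch. The following is a minimal sketch using only stock git, run against a throwaway repository; the file name `old-boost.tar.gz`, the 20000-byte size and the 10000-byte threshold are made up for the demo and are not taken from this repo.

```shell
set -eu
# Throwaway repository so the sketch is safe to run anywhere.
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q demo
cd demo

# Commit a stand-in for a tarball (hypothetical name), then delete it again.
head -c 20000 /dev/zero > old-boost.tar.gz
git add old-boost.tar.gz
git commit -q -m "add tarball"
git rm -q old-boost.tar.gz
echo "headers only" > README
git add README
git commit -q -m "remove tarball, keep headers"

# Walk every object reachable from any ref and keep blobs above the threshold.
threshold=10000   # bytes; arbitrary value for this demo
large_blobs=$(
  git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
  awk -v limit="$threshold" '$1 == "blob" && $2 > limit { print $2, $3 }'
)
echo "$large_blobs"
```

The deleted file still shows up because it remains reachable from the first commit, which is why removing it from the working tree alone does not shrink a clone.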

eddelbuettel commented 8 years ago

In favour! Ping me if I don't get to this in a few days.

eddelbuettel commented 8 years ago

Done. One minor correction was that '10m' rather than '10MB' is the size designator.

But something didn't work, see the log -- or is this expected?

edd@max:~/git/bh-new(master)$ git push --force
Counting objects: 19169, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (6663/6663), done.
Writing objects: 100% (19169/19169), 14.67 MiB | 693.00 KiB/s, done.
Total 19169 (delta 12232), reused 19169 (delta 12232)
To git@github.com:eddelbuettel/bh.git
 + 4092d17...5b0588e master -> master (forced update)
 ! [remote rejected] refs/pull/1/head -> refs/pull/1/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/10/head -> refs/pull/10/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/14/head -> refs/pull/14/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/15/head -> refs/pull/15/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/2/head -> refs/pull/2/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/23/head -> refs/pull/23/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/24/head -> refs/pull/24/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/4/head -> refs/pull/4/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/6/head -> refs/pull/6/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/8/head -> refs/pull/8/head (deny updating a hidden ref)
error: failed to push some refs to 'git@github.com:eddelbuettel/bh.git'
edd@max:~/git/bh-new(master)$ 
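The rejections in the log are consistent with how a `--mirror` clone pushes: it sends every ref, including the `refs/pull/*` refs that the server keeps read-only, while the branch update itself still goes through. A minimal local sketch of that behaviour, assuming a bare repository with `receive.hideRefs` set as a stand-in for the hosted side (the repository names and the tag are hypothetical):

```shell
set -eu
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# A bare "server" that hides refs/pull, mimicking the rejection in the log.
git init -q --bare server.git
git -C server.git config receive.hideRefs refs/pull

git init -q work
cd work
echo one > file.txt
git add file.txt
git commit -q -m "first commit"
git tag v1
git remote add origin ../server.git

# Pushing into the hidden namespace is refused by the server...
if git push -q origin HEAD:refs/pull/1/head 2>/dev/null; then
  pull_push=accepted
else
  pull_push=rejected
fi

# ...while refspecs limited to branches and tags are accepted.
git push -q --force origin 'refs/heads/*:refs/heads/*' 'refs/tags/*:refs/tags/*'
server_branches=$(git -C ../server.git for-each-ref refs/heads | wc -l | tr -d ' ')
echo "refs/pull push: $pull_push, branches on server: $server_branches"
```

Restricting the push to `refs/heads/*` and `refs/tags/*` avoids the error lines; it does not rewrite what the pull-request refs point to.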

The remote has received changes, a new checkout is now at 142mb -- as opposed to 660mb.

eddelbuettel commented 8 years ago

That plainly didn't work.

On another machine (the laptop) I updated and am now at 460mb instead of 445mb before, and got four warnings about files over 50mb. Which of course are no longer visible either. Fiddlesticks.

It is also messing with my history. What were three commits to bh on Sunday become six, and now nine.

And it messed with the git log. In 'graph mode' I now have an entire new 'line'.

eddelbuettel commented 8 years ago

Oh for fsck's sake. And GH now shows 288 commits instead of 144. That. Was. Not. A. Good. Idea.

eddelbuettel commented 8 years ago

I had left my main repo checkout untouched / unchanged. I just pushed a backup copy to GitLab just to be safe -- 144 commits, 305mb. As it should be.

Whereas this one is now busted at 288 commits.

Advice? If I nuke this at GH and re-create it, I lose issues and PRs.

jimhester commented 8 years ago

I think you need to re-clone the repo on other machines; you can't just git pull from the previous clone. Did you push from the second machine as well? I think that is where your duplicate commits are coming from.

eddelbuettel commented 8 years ago

> I think you need to re-clone the repo on other machines; you can't just git pull from the previous clone.

I don't understand that sentence. If the old one was (is) ~/git/bh and I clone freshly into ~/git/bh-check then the latter does not know about the former.

In the latter, git ls | wc -l reports 335 commits, whereas the pristine copy has 166. The modified one also has 166, but under different sha1 values, and once pushed and merged we get 2 x 166 + 1 = 335.
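The 2 x N + 1 arithmetic can be reproduced on a small scale. In this sketch, under stated assumptions, a different committer name stands in for the history rewrite (same content, different hashes), N is 3 instead of 166, and the directory names are hypothetical. The `--allow-unrelated-histories` flag is needed on git 2.9 and later; older versions merged unrelated histories without asking.

```shell
set -eu
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Build the same three file changes in a fresh repository.
# $1 = directory, $2 = committer name (differs to force different hashes).
make_history() {
  git init -q "$1"
  for i in 1 2 3; do
    echo "$i" > "$1/file$i.txt"
    git -C "$1" add .
    GIT_COMMITTER_NAME="$2" git -C "$1" commit -q -m "commit $i"
  done
}
make_history pristine Original
make_history filtered Rewriter

# Merging the old history into the rewritten one keeps both lines of commits
# and adds one merge commit on top.
cd filtered
git fetch -q ../pristine HEAD
git merge -q --allow-unrelated-histories -m "merge old history" FETCH_HEAD
total=$(git rev-list --count HEAD)
echo "commits after merge: $total"
```

With three commits per side the count comes out as 2 x 3 + 1 = 7, the same pattern as 2 x 166 + 1 = 335.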

I would love to undo the 'wrong' 166 ones at the merge. Sadly THIS REPO now has all 335. How do I get rid of them without losing history, issues, ... and other metadata I'd lose by deleting the whole repo?

jimhester commented 8 years ago

Add a new commit to the head of the clean repo and git push --force as Gabor suggested.
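A minimal local sketch of that recovery, assuming a bare repository (`hub.git`, a hypothetical name) as a stand-in for the hosted repo and an unrelated five-commit history as a stand-in for the doubled one:

```shell
set -eu
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q --bare hub.git

# The untouched local checkout with the good history (three commits).
git init -q pristine
for i in 1 2 3; do
  echo "$i" > "pristine/file$i.txt"
  git -C pristine add .
  git -C pristine commit -q -m "commit $i"
done

# Stand-in for the damaged remote: unwanted commits published as master.
git init -q damaged
for i in 1 2 3 4 5; do
  echo "dup $i" > "damaged/dup$i.txt"
  git -C damaged add .
  git -C damaged commit -q -m "unwanted $i"
done
git -C damaged push -q ../hub.git HEAD:refs/heads/master

# Add one new commit on the clean history, then overwrite the remote branch.
cd pristine
echo "cloning notes" > README.md
git add README.md
git commit -q -m "note in README"
git push -q --force ../hub.git HEAD:refs/heads/master

hub_count=$(git -C ../hub.git rev-list --count refs/heads/master)
hub_tip=$(git -C ../hub.git rev-parse refs/heads/master)
local_tip=$(git rev-parse HEAD)
echo "commits on hub master: $hub_count"
```

After the forced push the remote branch points at the clean tip, so only the three original commits plus the new one are reachable from it.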

eddelbuettel commented 8 years ago

That seems to have done it! Thanks @jimhester and @gaborcsardi.

I'll close this issue now. I added a workaround to the README.md. Jim was actually the first person foolish^Hbrave enough to do a full PR -- everybody else who wanted a new Boost library just filed an issue. So I leave the monster size for now.

gaborcsardi commented 8 years ago

I guess you can try to get rid of the large files in the history in another branch, so you don't mess with master. If it is successful this time, then you can just rename branches. This might mess up open PRs, but the rest should be OK I think.
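One way the branch-based approach might look, sketched against a local bare repository rather than the real remote. The branch names `filtered` and `oldmaster`, the repository names, and the use of `symbolic-ref` as a stand-in for switching the default branch in the hosting settings are all assumptions made for illustration.

```shell
set -eu
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q --bare hub.git   # stand-in for the hosted repository

# Old history, already published as master.
git init -q old
cd old
echo big > tarball.txt
echo code > header.txt
git add .
git commit -q -m "headers plus tarball"
git push -q ../hub.git HEAD:refs/heads/master
# Keep the old hashes reachable under another name.
git push -q ../hub.git HEAD:refs/heads/oldmaster
old_tip=$(git rev-parse HEAD)
cd ..

# Filtered history: same headers, no tarball, therefore new hashes.
git init -q filtered
cd filtered
echo code > header.txt
git add .
git commit -q -m "headers only"
# A plain push to a new branch name; nothing existing is overwritten.
git push -q ../hub.git HEAD:refs/heads/filtered
cd ..

# Stand-in for switching the default branch on the hosting side.
git -C hub.git symbolic-ref HEAD refs/heads/filtered
branch_count=$(git -C hub.git for-each-ref refs/heads | wc -l | tr -d ' ')
default_ref=$(git -C hub.git symbolic-ref HEAD)
kept_tip=$(git -C hub.git rev-parse refs/heads/oldmaster)
echo "branches: $branch_count, default: $default_ref"
```

No forced push is involved, and the old commits stay available under `oldmaster` so that hashes mentioned in issues still resolve.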

eddelbuettel commented 8 years ago

I am still open to doing this but having been burned twice by such git filtering approaches, I would need a simpler / better / more reliable "script" to follow.

The goal is to keep this repo with issue ticket history, but filter master. I do not know how to do that. I seem to be able to filter a repo and push it somewhere else with an altered history, but that is not the goal.

gaborcsardi commented 8 years ago

I am not sure what you mean by 'history'. The commit hashes will change. There is simply no way of keeping them.

But this is fine, imo. You can keep the branch as oldmaster or something, so the hashes in issues and elsewhere still point to something meaningful.

Forks and pull requests need to rebase or maybe even re-fork.

I'll give it a try once I get to a better internet connection; I am on (slow) public wifi today.

eddelbuettel commented 8 years ago

'history' was a sloppy term. "Preserve as much as I can" from the existing repo -- as opposed to starting over with a fresh one with filtered code. That includes the history and sequence of commits, but then under different sha1 ids.

Maybe the branch switch is the element I was missing. But force pushing back into master I ended up with everything double.

gaborcsardi commented 8 years ago

No force pushing. Put the filtered repo in a new branch and then rename branches. You'll also need to rename locally or just reclone.

eddelbuettel commented 7 years ago

Issue #34 with this contributed script did the trick. Many thanks to @Enchufa2 for providing it.

Enchufa2 commented 7 years ago

You're welcome. :-)