ondrae closed this 9 years ago.
Have you tried the --single-branch option for git clone? It’ll fetch only the one branch you’re asking for and ignore the rest.
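A minimal sketch of what --single-branch does, run against a throwaway local repository rather than the real project. The repo layout (a master branch plus an unrelated orphan branch named last-rails-version) and the file:// URL are assumptions for illustration only:

```shell
#!/bin/sh
# Sketch: show that `git clone --single-branch` fetches only one branch.
# The throwaway repo below is hypothetical; it mimics a project whose
# master branch shares no history with an old "last-rails-version" branch.
set -eu

work=$(mktemp -d)
cd "$work"

# Build a source repo with a master branch...
git init -q src
cd src
git symbolic-ref HEAD refs/heads/master
echo "new app" > app.txt
git add app.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "new app"

# ...and an orphan branch with unrelated history.
git checkout -q --orphan last-rails-version
git rm -rqf .
echo "source 'https://rubygems.org'" > Gemfile
git add Gemfile
git -c user.name=demo -c user.email=demo@example.com commit -q -m "old rails app"
git checkout -q master
cd ..

# Clone only master; objects reachable only from the other branch are not fetched.
git clone -q --single-branch --branch master "file://$work/src" lean

# The clone's remote-tracking branches: origin/master (plus possibly origin/HEAD).
git -C lean branch -r
```

Because the clone's fetch refspec is narrowed to master, later git fetch runs in that clone also stay on that one branch.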
I think this might be worth reopening. At 18F, we'd like to be able to send pull requests to you, so we want to fork the repo via GitHub. However, a fork inherits the full history of the current repo, and we'd prefer our repo not to be 43MB in size. The culprit is a 42MB .pack file in .git/objects/pack.
Looks like it's a bad pack file:
git verify-pack -v .git/objects/pack/pack-63968d00c2176e05298b52a129572aee5991630d.idx
fatal: Cannot open existing pack file '.git/objects/pack/pack-63968d00c2176e05298b52a129572aee5991630d.idx'
.git/objects/pack/pack-63968d00c2176e05298b52a129572aee5991630d.pack: bad
Never mind. I pasted the wrong pack.
To see the 10 biggest objects in the pack (sorted by size, the third column), run this:
git verify-pack -v .git/objects/pack/pack-e0dc5715594689368b1d28eeff86930591cc5d7f.idx \
| sort -k 3 -n \
| tail -10
To see which file each of those objects is, run this:
git rev-list --objects --all | grep [first few chars of the sha1 from previous output]
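The two commands above can be combined so that each of the biggest objects is printed next to its path. This is a sketch under a few assumptions: the objects must already be packed (hence the git gc in the demo), and the helper name biggest_objects and the throwaway demo repo with a vendor/cache/big.gem file are hypothetical:

```shell
#!/bin/sh
# Sketch: print the N largest packed objects with their paths by joining
# `git verify-pack -v` (object ID, type, size) with `git rev-list --objects --all`
# (object ID, path). `biggest_objects` is a hypothetical helper name.
set -eu

biggest_objects() {
    n=${1:-10}
    git_dir=$(git rev-parse --git-dir)
    objects="$git_dir/all-objects.txt"
    # Every object reachable from any ref, as "<id> <path>" (commits have no path).
    git rev-list --objects --all > "$objects"
    # Keep only object lines, sort by column 3 (uncompressed size in bytes),
    # take the N largest, then look up each object's path.
    git verify-pack -v "$git_dir"/objects/pack/pack-*.idx \
        | grep -E '^[0-9a-f]+ ' \
        | sort -k 3 -n \
        | tail -n "$n" \
        | while read -r sha _type size _rest; do
            path=$(sed -n "s/^$sha //p" "$objects" | head -n 1)
            printf '%s\t%s\n' "$size" "$path"
        done
    rm -f "$objects"
}

# Demo on a throwaway repo containing a hypothetical 1 MiB .gem file.
demo=$(mktemp -d)
cd "$demo"
git init -q
mkdir -p vendor/cache
dd if=/dev/urandom of=vendor/cache/big.gem bs=1024 count=1024 2>/dev/null
echo 'puts "hello"' > app.rb
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "add files"
git gc -q    # pack the loose objects so verify-pack can see them

biggest_objects 3
```

The largest object is printed last, as in the original sort | tail pipeline.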
You will notice that all the files are either .gem or .jar files. The next step is to clean up the repository history by removing all of those unnecessary files.
One option is to use the bfg-repo-cleaner tool, which worked great for me and was super fast.
Alternatively, you could do it manually by following this git article, as outlined below:
git filter-branch --index-filter 'git rm --cached --ignore-unmatch *.gem' -- --all
rm -Rf .git/refs/original
rm -Rf .git/logs/
git gc --aggressive --prune=now
Then repeat with .jar files:
git filter-branch --index-filter 'git rm --cached --ignore-unmatch *.jar' -- --all
rm -Rf .git/refs/original
rm -Rf .git/logs/
git gc --aggressive --prune=now
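The two filter-branch passes can also be written as one loop over the patterns. This is a sketch under stated assumptions: it runs on a throwaway demo repo whose file names are hypothetical, it quotes each pattern so the shell that evaluates the index filter never expands it, and it swaps the rm -Rf steps for git update-ref -d and git reflog expire, which do the same cleanup through Git's own commands (and also catch backup refs that git gc has moved into packed-refs). FILTER_BRANCH_SQUELCH_WARNING only silences the warning and pause that newer Git versions add before filter-branch runs:

```shell
#!/bin/sh
# Sketch: strip every *.gem and *.jar file from all history with
# git filter-branch, then drop the backup refs and reflogs and repack.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the warning/pause in newer Git

purge_pattern() {
    pattern=$1
    # Single quotes inside the filter keep the glob away from the shell;
    # git rm then matches it as a pathspec in every directory.
    git filter-branch --index-filter \
        "git rm --cached --ignore-unmatch --quiet '$pattern'" -- --all
    # Delete the refs/original/* backups (loose or packed).
    git for-each-ref --format='%(refname)' refs/original/ |
        while read -r ref; do git update-ref -d "$ref"; done
    # Forget reflog entries that still point at the old commits, then repack.
    git reflog expire --expire=now --all
    git gc --aggressive --prune=now --quiet
}

# Throwaway demo repo with hypothetical vendored .gem and .jar files.
demo=$(mktemp -d)
cd "$demo"
git init -q
mkdir -p vendor/cache lib
echo gem > vendor/cache/rails.gem
echo jar > lib/tool.jar
echo 'puts "v1"' > app.rb
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "v1"
echo 'puts "v2"' > app.rb
git -c user.name=demo -c user.email=demo@example.com commit -q -am "v2"

for p in '*.gem' '*.jar'; do
    purge_pattern "$p"
done

# Any remaining .gem/.jar paths in history would be listed here (expect none).
git rev-list --objects --all | grep -E '\.(gem|jar)$' || echo "no .gem or .jar files left"
```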
Then verify:
git count-objects -v
Your size-pack should be a lot smaller now.
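To compare the before and after numbers without reading the whole count-objects output, the size-pack line can be pulled out on its own; git count-objects -v reports it in KiB. A small sketch, where size_pack_kib is a hypothetical helper name and the demo repo is a throwaway:

```shell
#!/bin/sh
# Sketch: print only the size-pack figure (KiB) from `git count-objects -v`.
set -eu

size_pack_kib() {
    git count-objects -v | awk '$1 == "size-pack:" { print $2 }'
}

# Throwaway demo repo with one packed commit.
demo=$(mktemp -d)
cd "$demo"
git init -q
echo 'puts "hello"' > app.rb
git add app.rb
git -c user.name=demo -c user.email=demo@example.com commit -q -m "init"
git gc -q

echo "size-pack: $(size_pack_kib) KiB"
```

Running the helper once before the filter-branch passes and once after gives the two figures to compare.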
I should also note that the bfg-repo-cleaner tool will clean out more than just .gem and .jar files. If you use the command listed in their Usage section (java -jar bfg.jar --strip-biggest-blobs 500 some-big-repo.git), it will clean out the 500 biggest files. When I looked through the log, there were a bunch of .yml and .rb files there as well, which are obviously not needed anymore.
Does GitHub actually count those 42MB? Is it unworkable for you to check out only the master branch?
Long term, we’d probably just trash the last-rails-version release, which shares no history with the current master.
Isn't there only one branch, though? I vote for pruning the ruby stuff using @monfresh's approach.
@migurski Yes, when you clone this repo (or any fork of it), git downloads all 43MB of it.
This is not that big a deal, since we have access to decent internet most of the time. For now, having to wait a little longer to clone the repo is better than having to use single-branch mode, IMO.
Consider this scenario:
… --single-branch option. This gives me the lean version.
… --single-branch option, in addition to specifying the branch name she wants to work on. That can get annoying.

I ran the two filter-branch commands, and the size looks to be 3.41MB.
Sounds about right to me. Thanks!
I keep trying to clone it and only get to 87% on this cafe wifi. I had to download just the master branch as a .zip file, without the history.
Since we've rewritten it anyway, can we separate it from the forked pivotal version?