Closed: p3p closed this issue 5 years ago
It is real good to know this. It gives us a number of options.
But my starting position is we keep everything until we are forced to prune the tree. (That doesn't mean we shouldn't fully discuss and debate this. My vote can be changed.)
Rewriting history will change all commit hashes, invalidating an awful lot of links that are floating around.
rewriting git history is bad
I could perhaps have been a little more specific with the details ^^. This was just an example of the tools available to reduce the download size, mentioning the more extreme options at the end. The majority of the size reduction and general cleanup is achieved without having to resort to removing the 'large' unused blobs from history, just by using git gc.
According to my understanding, GitHub automatically does pruning and garbage collection on the repositories on their servers. Thus you cannot reach orphan commits. They are gone.
@thinkyhead, this isn't (can't be) true: as you can see from my post above, running a normal git gc on a mirror of the Marlin repository halved its size, so there is a lot of redundant data.
There seems to be a way to add a GC button to settings. But I can't see how. https://github.com/gitbucket/gitbucket/issues/1212
As I said at the start of this, I'm no git / GitHub expert. I'm going to assume that GitHub does not allow a mirror clone to be garbage collected and then pushed back up (even though that doesn't change history), so I'll close this and hope that at some point GitHub's automated gc triggers. Even the git documentation recommends a git gc --aggressive every few hundred commits, and Marlin goes through that in a couple of weeks.
BFG looks great. If only it could update commit IDs in the GitHub metadata so all the old PR commit references would still work… then we could use it on the main repo. I've been searching GitHub and so far haven't found any notes on how often they clean repos and how deeply.
Marlin's git repository is getting a little on the beefy side. This is an example of how that could be mitigated, if @thinkyhead thinks it is worth it. (I'm not a git expert by any means though.)
This can be mitigated by using git's garbage collection tools:
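The commands that originally followed here did not survive in this copy of the thread. As a rough sketch of what a local cleanup with git gc looks like, the following runs it against a scratch repository so it is safe to try anywhere; the scratch repository, its file and the committer identity are assumptions made purely for illustration, not Marlin's actual contents.

```shell
# Scratch repository (hypothetical content) so the sketch runs anywhere.
repo=$(mktemp -d)
git init -q "$repo"
echo "hello" > "$repo/file.txt"
git -C "$repo" add file.txt
git -C "$repo" -c user.name=Example -c user.email=example@example.com \
    commit -q -m "initial commit"

# Object store before garbage collection (loose objects are counted in "count").
git -C "$repo" count-objects -vH

# Repack everything and drop unreachable objects immediately.
git -C "$repo" gc --aggressive --prune=now --quiet

# Object store afterwards: the objects now live in a pack ("in-pack").
git -C "$repo" count-objects -vH
```

On a real clone only the git gc line is needed; the count-objects calls are just there to show the before and after.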
This reduces the local repository by about 50%, but we really need to be cleaning the remote, so let's work on a mirror repo that, when pushed, will overwrite all references.
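The mirror workflow described above might look roughly like this. It is a sketch only: a local scratch repository stands in for the remote (an assumption, so the example works offline), and the push is left commented out because it rewrites every ref on the remote.

```shell
# Scratch "upstream" standing in for the real remote (hypothetical content).
upstream=$(mktemp -d)
git init -q "$upstream"
echo "hello" > "$upstream/file.txt"
git -C "$upstream" add file.txt
git -C "$upstream" -c user.name=Example -c user.email=example@example.com \
    commit -q -m "initial commit"

# A mirror clone is bare and copies every ref, not just the branches.
mirror=$(mktemp -d)/repo.git
git clone -q --mirror "$upstream" "$mirror"

# Size of the mirror before and after garbage collection.
du -sh "$mirror"
git -C "$mirror" gc --aggressive --prune=now --quiet
du -sh "$mirror"

# Pushing the mirror back overwrites all refs on the remote; use with care.
# git -C "$mirror" push --mirror
```

For the real thing, the repository's URL would take the place of "$upstream".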
So the starting point is 171MiB for a repository mirror.
This simple solution gets us down to 90MiB in the mirror or 50MiB in a normal clone from remote.
But we can go a bit further. A quick investigation of the git blobs shows:
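The blob listing from the original post is not preserved here. One common way to produce such a listing is to pipe git rev-list into git cat-file and sort by size. The sketch below does that on a scratch repository containing a 2MiB binary that is added and then deleted; the repository and the file names small.txt and big.bin are made up for the example.

```shell
# Scratch repository with one large binary that is later deleted (hypothetical).
repo=$(mktemp -d)
git init -q "$repo"
commit() {
  git -C "$repo" -c user.name=Example -c user.email=example@example.com \
      commit -q -m "$1"
}

echo "source" > "$repo/small.txt"
dd if=/dev/zero of="$repo/big.bin" bs=1024 count=2048 2>/dev/null
git -C "$repo" add small.txt big.bin
commit "add files"
git -C "$repo" rm -q big.bin
commit "remove binary"

# List every object reachable from any ref, keep the blobs, largest first.
largest=$(
  git -C "$repo" rev-list --objects --all |
  git -C "$repo" cat-file \
      --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
  awk '$1 == "blob"' |
  sort -k3,3nr |
  head -n 10
)
echo "$largest"
```

The deleted big.bin still shows up at the top of the list, which is exactly the kind of leftover the post is talking about.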
So we have a few old large binaries hanging around in the repository. Rewriting git history is bad, so it is up to @thinkyhead whether this is worth it. To clean these binaries from the repository, all references to them need to be removed, and to do that we will use bfg. The argument choice is just to brute-force remove all blobs above 1MiB but protect all current branches from modification, so only deleted files will be removed from history.
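The exact bfg invocation from the original post is missing in this copy. A hedged sketch of what it could look like is below, using BFG's --strip-blobs-bigger-than and --protect-blobs-from options followed by the reflog expiry and gc that BFG's own usage notes suggest. The jar path, the mirror path and the list of protected refs are placeholders (assumptions), and the cleanup only runs if the jar and the mirror actually exist.

```shell
# Placeholders (assumptions): adjust to the real jar, mirror and branch names.
BFG_JAR=${BFG_JAR:-bfg.jar}
MIRROR=${MIRROR:-repo.git}
PROTECTED_REFS=${PROTECTED_REFS:-HEAD}   # comma-separated, e.g. current branches

strip_large_blobs() {
  # Drop blobs above the 1M threshold from history, except blobs that are
  # still present in the tips of the protected refs.
  java -jar "$BFG_JAR" --strip-blobs-bigger-than 1M \
       --protect-blobs-from "$PROTECTED_REFS" "$MIRROR" &&
  # BFG rewrites the refs; the old objects only disappear after this.
  git -C "$MIRROR" reflog expire --expire=now --all &&
  git -C "$MIRROR" gc --prune=now --aggressive
}

# Only run when both the jar and the mirror are present.
if [ -f "$BFG_JAR" ] && [ -d "$MIRROR" ]; then
  strip_large_blobs
else
  echo "skipping: set BFG_JAR and MIRROR to run the cleanup"
fi
```

Because this changes commit hashes, it should only ever be run on a mirror, never on a working clone someone else depends on.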
So that gets the repository mirror down to 67MiB from 171MiB, and a normal remote clone down to 43MiB. There are a few more binaries that could be removed but are protected by the old version branches (as seen in the bfg output), so this could be reduced further. (If you don't protect the 1.0.x branch: 60MiB mirror and 35MiB remote clone.)