amotl opened 1 year ago
Using the steps outlined below, I've shrunk the repository and uploaded the result to https://github.com/amotl/herbie-without-docs to demonstrate the effect. Download time, bandwidth, and disk usage all decrease significantly.
git clone --mirror https://github.com/blaylockbk/Herbie.git
cd Herbie.git
bfg --delete-folders _build --no-blob-protection .
git reflog expire --expire=now --all && git gc --prune=now --aggressive
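Before force-pushing the rewritten mirror, it may be worth confirming that no `_build` content is still reachable from any branch or tag. The following is a minimal sketch that uses only plain Git. The helper name `list_build_paths` is hypothetical. It assumes the repository (for example the `Herbie.git` mirror above) is passed as the first argument or is the current directory.

```shell
# Hypothetical helper: list every path at or inside a `_build` directory that
# is still reachable from any ref (branches, tags, ...) of the repository
# given as "$1" (default: current directory). Uses only plain Git.
list_build_paths() {
  local repo="${1:-.}"
  # `rev-list --objects --all` prints "<sha> <path>" for reachable trees and
  # blobs; lines without a path (commits, root trees) are skipped by awk,
  # which strips the leading SHA from the others.
  git -C "$repo" rev-list --objects --all \
    | awk 'NF > 1 { sub(/^[^ ]+ /, ""); print }' \
    | grep -E '(^|/)_build(/|$)' \
    | sort -u
}

# Usage, e.g. inside the Herbie.git mirror after the gc step:
#   [ -z "$(list_build_paths .)" ] && echo "no _build paths left"
```

An empty result means nothing under a `_build` directory survives in the reachable history.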
time git clone https://github.com/blaylockbk/Herbie
real 0m21.635s
user 0m4.099s
sys 0m1.933s
du -sch Herbie
355M total
time git clone https://github.com/amotl/herbie-without-docs.git
real 0m9.420s
user 0m1.643s
sys 0m0.854s
du -sch herbie-without-docs/
84M total
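Note that `du -sch` counts the checked-out working tree as well as the `.git` directory. To estimate how much of the history itself the `_build` content accounts for, one can sum the sizes of all reachable blobs stored under a `_build` directory. The following sketch uses a hypothetical helper name, `build_blob_bytes`. It reports uncompressed sizes, so the figure will typically be larger than the actual on-disk saving. The `size-pack` line of `git count-objects -vH` shows that saving more directly.

```shell
# Hypothetical helper: sum the uncompressed sizes (in bytes) of all blobs
# reachable from any ref of the repository "$1" (default: current directory)
# whose path lies inside a `_build` directory. Each blob is counted once,
# even if it appears in many commits.
build_blob_bytes() {
  local repo="${1:-.}"
  # rev-list emits "<sha> <path>"; cat-file turns each line into
  # "<type> <size> <path>" via the %(rest) atom.
  git -C "$repo" rev-list --objects --all \
    | git -C "$repo" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' \
    | awk '/^blob [0-9]+ (.*\/)?_build\// { total += $2 } END { print total + 0 }'
}

# Usage, e.g.:
#   build_blob_bytes Herbie
#   git -C Herbie count-objects -vH    # see the size-pack line
```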
I like the idea of cleaning up the unnecessary _build directory; it sounds like if I keep committing it, the repo will just keep getting bigger.
I found this tool that does the same thing (it's installable with conda and has docs that show how to convert a command from BFG): https://github.com/newren/git-filter-repo/
conda install -c conda-forge git-filter-repo
git clone https://github.com/blaylockbk/Herbie.git
cd Herbie
git filter-repo --invert-paths --path-glob '*/_build'
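The pattern `'*/_build'` has a slash before `_build`, so it presumably does not select a `_build` folder at the repository root. Listing every distinct `_build` directory that appears anywhere in history makes it easy to check whether the glob covers all of them before running the rewrite. The following is a sketch in plain Git. `list_build_dirs` is a hypothetical helper name.

```shell
# Hypothetical helper: print each distinct "<prefix>_build/" directory that
# appears in any commit reachable from any ref of the repository "$1"
# (default: current directory).
list_build_dirs() {
  # `log --all --name-only --format=` prints the paths touched by every
  # commit (separated by blank lines); grep keeps the leading part of each
  # path up to and including "_build/".
  git -C "${1:-.}" log --all --name-only --format= \
    | grep -oE '^(.*/)?_build/' \
    | sort -u
}

# Usage, e.g. in the fresh clone before running git filter-repo:
#   list_build_dirs .
```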
Before doing this, I'd like to better understand the implications of rewriting the git history. In your example, it looks like the tags and releases are lost. What else is lost?
And to be clear, instead of keeping the rendered docs in the Herbie repo, the rendered docs will be stored on readthedocs servers. So I won't need to run make html manually; as long as the build works on readthedocs, they build the docs for each pull request and merge. Correct?
> The rendered docs will be stored on readthedocs servers. They make the docs for each pull request and merge. Correct?
Correct!
> Before doing this, I'd like to better understand the implications of rewriting the git history. In your example, it looks like the tags and releases are lost. What else is lost?
Oh, that might be the case. It would be a bit sad, but there is probably no way around it. Maybe we should research this detail a bit more beforehand?
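BFG's documentation says it updates all branches and tags, and git filter-repo rewrites all refs by default. The tags are therefore not necessarily lost; they may simply not have been pushed to the demo repository. What does change is the SHA of every rewritten commit, so any reference to an old commit hash will no longer match a commit in the rewritten history. The following is a sketch for checking the tag question on a mirror clone before touching the real repository. `snapshot_tags` and `missing_tags` are hypothetical helper names.

```shell
# Hypothetical helper: save the sorted tag names of repository "$1" to file "$2".
snapshot_tags() {
  git -C "$1" tag --list | sort > "$2"
}

# Hypothetical helper: print tag names listed in snapshot file "$1" that no
# longer exist in repository "$2". comm -23 keeps lines unique to the first
# input; both inputs are sorted the same way.
missing_tags() {
  git -C "$2" tag --list | sort | comm -23 "$1" -
}

# Usage, e.g.:
#   snapshot_tags Herbie.git tags-before.txt
#   ... run BFG or git filter-repo on the mirror ...
#   missing_tags tags-before.txt Herbie.git    # empty output: all tag names survived
```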
Hi again,
Related to GH-140: because the rendered HTML documentation has been committed to the repository itself ^1, the repository weighs in at an unusually large 114 MB, making Git operations take longer and use more bandwidth than necessary. While GH-140 would resolve the matter going forward, it does not shrink the repository retroactively.
So, while it will break any existing forks, I strongly recommend editing the repository history to remove this large chunk of content, using a tool like BFG Repo-Cleaner.
If you agree, I can help you implement the necessary steps.
With kind regards, Andreas.