Closed sbfnk closed 4 months ago
Yes, definitely agree. Certainly the main culprits (docs, deps, and src). Agree we could remove prior figures from the old readme as well.
Running
> git filter-repo \
--path src/ \
--path deps/ \
--path dev/ \
--path reference/ \
--path synthetic.rds \
--path data/example_regional_epinow.rda \
--path data/example_estimate_infections.rda \
--path-regex man/figures/unnamed-chunk-\[0-9\]+-1\\.png \
--path-regex inst/dev/figs/.\*scores\\.png \
--invert-paths
reduces the size of the repo from 1.1GB to 34MB. Any objections to going ahead with it? I could create a backup fork in my personal account first.
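A local mirror clone could sit alongside the backup fork as an extra safety net, and the pack size can be compared before and after the rewrite. This is only a minimal sketch assuming plain git: the function names and arguments are hypothetical, and the mirror clone is an addition to the fork mentioned above, not a description of what was actually done.

```shell
#!/bin/sh
# Hypothetical helpers; names and arguments are illustrative only.

# Make a full local backup (all branches and tags) of a repository.
# $1: URL or path of the repository, $2: destination directory.
backup_repo() {
    git clone --quiet --mirror "$1" "$2"
}

# Print the on-disk size of the packed objects of a repository.
# $1: path to the repository (working tree or bare).
repo_pack_size() {
    git -C "$1" count-objects -vH | sed -n 's/^size-pack: //p'
}
```

Calling `repo_pack_size` on a fresh clone before and after the rewrite should give a rough idea of what a new contributor would have to download.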
Given that this would require a force push, anyone who has the repo checked out locally will have to do a git reset at some point. I don't think there's a way around this - the alternative is to keep things as they are. On balance I'd think it's worth it, but if anyone disagrees please leave a comment.
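For anyone with a local checkout, the reset could look roughly like the sketch below. The function name is hypothetical and `main` as the default branch is an assumption. Note that it throws away any local commits and uncommitted changes on that branch, so unpushed work would need to be saved elsewhere first.

```shell
#!/bin/sh
# Hypothetical helper: bring a local clone in line with rewritten history.
# WARNING: discards local commits and uncommitted changes on the branch.
# $1: path to the local clone, $2: branch name (defaults to main).
resync_after_rewrite() {
    branch=${2:-main}
    git -C "$1" fetch --quiet --prune origin &&
    git -C "$1" checkout --quiet "$branch" &&
    git -C "$1" reset --quiet --hard "origin/$branch"
}
```

A plain fetch does not overwrite tags that already exist locally, so rewritten tags would probably need a separate `git fetch --tags --force`. Re-cloning from scratch is the simpler alternative.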
I'm not sure of the cons, so I'd say go ahead. It's good that you're keeping a backup just in case.
I agree this is necessary but highlighting some important caveats we discovered with @ntorresd when going through the same process with serofoi:
main
(at least for a brief moment in time). They can be opened as new PRs once you have force pushed all branches, but ongoing conversations may be interrupted / you will have to start a new thread.
@ntorresd, did I forget anything?
I would only add that you will not see the effects of the clean-up until the clean versions of the git tags have been pushed. When we did this with @Bisaloo for serofoi, we didn't see the change reflected on fresh copies of the repository until we ran git push origin v0.0.9 -f on my local cleaned copy.
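Rather than pushing tags one by one, all rewritten branches and tags could be force pushed in one go. A minimal sketch with a hypothetical function name, assuming the remote is called `origin`:

```shell
#!/bin/sh
# Hypothetical helper: publish rewritten branches and tags to a remote.
# $1: path to the cleaned local repository, $2: remote name (defaults to origin).
push_rewritten_refs() {
    remote=${2:-origin}
    # --all and --tags cannot be combined in a single push, so run two.
    git -C "$1" push --quiet --force --all "$remote" &&
    git -C "$1" push --quiet --force --tags "$remote"
}
```

By default git filter-repo removes the `origin` remote after rewriting, so it would have to be added back with `git remote add` before this can run.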
Thanks, I had forgotten about the tags.
I wonder about the impact of all of this on renv.lock lockfiles, since they store a hash of the source :thinking:
Thanks all for the helpful comments. To confirm I will:
git filter-repo command as above
filter-repo/commit-map
git push --tags --force
which should address all the points raised above, unless I've forgotten something.
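The commit map that git filter-repo writes (`.git/filter-repo/commit-map`, a header line followed by one `old new` pair per commit) can be used to translate an old commit hash, for example one referenced in an issue or a lockfile, into its rewritten counterpart. A minimal sketch; the function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: look up the rewritten hash of an old commit.
# $1: old (pre-rewrite) full commit hash
# $2: path to the commit map (defaults to .git/filter-repo/commit-map)
new_hash_for() {
    awk -v old="$1" '$1 == old { print $2 }' "${2:-.git/filter-repo/commit-map}"
}
```

The function prints nothing if the old hash does not appear in the map.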
Yes, this seems right.
To be 100% clear because a previous version of my message wasn't: from what we've seen in serofoi, I don't think you'll be able to reopen closed PRs. You will have to create new ones. No issues from a git point of view, but conversation will be spread across two PRs.
Ah ok probably worth waiting for currently open ones to be merged then.
To do before 1.5 release
~I've done the steps outlined above and the force push succeeded - old refs are still there and PRs still open though, so not sure if I'm missing a step or if it's a matter of waiting for repacking.~ see next comment
Upon closer inspection, the vast majority of the repo content was in the gh-pages branch, so I've done a big squash there, which has reduced the size to manageable levels (1.1 GB -> 100 MB).
The repo has grown fairly large (~1 GB), but the files currently in the repo are only 11 MB in size. It might be nice, particularly for those on low-bandwidth connections or paying by volume, to look at reducing the size without losing any relevant development history.
Using git filter-repo --analyze reveals a few potential easy gains:
At the very least this suggests to me that all the directories above, as well as all png files in man/figures (which, if I understand correctly, aren't used anywhere) could be purged. A line to exclude png files in man/figures could also be added to .gitignore. This could be followed by a deeper investigation of blob sizes for existing files.
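The two follow-ups could be sketched as below, using plain git rather than git filter-repo for the blob listing. The function names are hypothetical, and the ignore pattern assumes that every png directly under man/figures is generated and safe to exclude.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative only.

# Append an ignore rule for png files in man/figures.
# $1: repository root (defaults to the current directory).
ignore_figure_pngs() {
    printf '%s\n' 'man/figures/*.png' >> "${1:-.}/.gitignore"
}

# List the largest blobs anywhere in history: size in bytes, then path.
# $1: repository path, $2: number of rows to show (defaults to 20).
largest_blobs() {
    git -C "$1" rev-list --objects --all |
        git -C "$1" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        sed -n 's/^blob //p' |
        sort -n -r |
        head -n "${2:-20}"
}
```

The listing covers every blob reachable from any ref, including files that have since been deleted, which is what matters for clone size.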