StatisticalRethinkingJulia / StatisticalRethinking.jl

Julia package with selected functions in the R package `rethinking`. Used in the SR2... projects.
MIT License

'.git' folder is bloated with 1 GB of data! #145

Closed samusz closed 2 years ago

samusz commented 2 years ago

My branch is up to date.

    ➜  julia du -h StatisticalRethinking.jl
    28K     StatisticalRethinking.jl/.git/hooks
    0       StatisticalRethinking.jl/.git/info
    0       StatisticalRethinking.jl/.git/logs/refs/heads
    0       StatisticalRethinking.jl/.git/logs/refs/remotes/origin
    0       StatisticalRethinking.jl/.git/logs/refs/remotes
    0       StatisticalRethinking.jl/.git/logs/refs
    0       StatisticalRethinking.jl/.git/logs
    0       StatisticalRethinking.jl/.git/objects/info
    1.2G    StatisticalRethinking.jl/.git/objects/pack
    1.2G    StatisticalRethinking.jl/.git/objects
    0       StatisticalRethinking.jl/.git/refs/heads
    0       StatisticalRethinking.jl/.git/refs/remotes/origin
    0       StatisticalRethinking.jl/.git/refs/remotes
    0       StatisticalRethinking.jl/.git/refs/tags
    0       StatisticalRethinking.jl/.git/refs
    1.2G    StatisticalRethinking.jl/.git
    0       StatisticalRethinking.jl/chapters/00
    16K     StatisticalRethinking.jl/chapters/02
    16K     StatisticalRethinking.jl/chapters/03
    44K     StatisticalRethinking.jl/chapters/04
    8.0K    StatisticalRethinking.jl/chapters/05
    12K     StatisticalRethinking.jl/chapters/09
    96K     StatisticalRethinking.jl/chapters
    1.8M    StatisticalRethinking.jl/data
    16K     StatisticalRethinking.jl/docs/src
    24K     StatisticalRethinking.jl/docs
    140K    StatisticalRethinking.jl/notebooks/00
    1.5M    StatisticalRethinking.jl/notebooks/02
    3.1M    StatisticalRethinking.jl/notebooks/03
    12M     StatisticalRethinking.jl/notebooks/04
    960K    StatisticalRethinking.jl/notebooks/05
    128K    StatisticalRethinking.jl/notebooks/09
    18M     StatisticalRethinking.jl/notebooks
    0       StatisticalRethinking.jl/scripts/00
    16K     StatisticalRethinking.jl/scripts/02
    20K     StatisticalRethinking.jl/scripts/03
    96K     StatisticalRethinking.jl/scripts/04
    8.0K    StatisticalRethinking.jl/scripts/05
    16K     StatisticalRethinking.jl/scripts/09
    156K    StatisticalRethinking.jl/scripts
    0       StatisticalRethinking.jl/src/quaps
    16K     StatisticalRethinking.jl/src
    16K     StatisticalRethinking.jl/test
    1.2G    StatisticalRethinking.jl
    ➜  julia cd StatisticalRethinking.jl/

So for about 4 MB of data, the .git folder is 1 GB. Do some branches need squashing?

goedman commented 2 years ago

Thanks for pointing this out and I see that as well. What I don't know is how to safely fix it.

Maybe git rebase --autosquash master?

goedman commented 2 years ago

The bloating is due to a bunch of (pretty) old .ipynb files which I used in version 1. Why specifically 8 or 9 versions of one notebook and one or two other notebooks are still present is unclear to me.
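One way to check which files account for the size is to list the largest blobs anywhere in the history, including files that no longer exist in the working tree. A minimal sketch, assuming a POSIX shell with git on the PATH; the function name `largest_blobs` is made up for illustration:

```shell
# Print "<size in bytes> <path>" for the largest blobs reachable from any ref,
# biggest first. Usage: largest_blobs [repo_dir] [count]
# (largest_blobs is a hypothetical helper name, not a git command.)
largest_blobs() (
    repo="${1:-.}"
    count="${2:-10}"
    # rev-list emits "<sha> <path>"; cat-file resolves each sha to type and size,
    # passing the path through as %(rest).
    git -C "$repo" rev-list --objects --all |
        git -C "$repo" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk '$1 == "blob"' |   # keep file contents only (drop commits and trees)
        cut -d' ' -f2- |       # drop the type column
        sort -rn |             # largest size first
        head -n "$count"
)
```

Running `largest_blobs . 20` in the clone should show whether the old notebook versions dominate the pack. The sizes are uncompressed blob sizes, so they will not add up exactly to what `du` reports for `.git/objects/pack`.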

I've tried in many ways to remove these files and/or shorten the history to e.g. only contain v4+ related files, but to no avail.

Not sure what other options are available. At least one suggestion I came across is deleting the GitHub repo, creating a new repo with the same name, and committing the contents of the current master.
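A related approach that keeps the existing repo is to start a parentless ("orphan") branch from the current tree and let it take over the branch name. This is a different technique from deleting and recreating the GitHub repo, and it is only a sketch: the names `fresh_history` and `fresh-start` are hypothetical, and it should be tried on a disposable clone first because it discards the branch's history.

```shell
# Replace the current branch's history with one commit containing the current tree.
# DESTRUCTIVE: try it on a throwaway clone. Usage: fresh_history <repo_dir>
# (fresh_history and fresh-start are hypothetical names chosen for this sketch.)
fresh_history() (
    cd "$1" || exit 1
    branch=$(git symbolic-ref --short HEAD) || exit 1
    # New branch with no parent commits; the index and working tree are kept.
    git checkout --quiet --orphan fresh-start || exit 1
    git add -A
    git commit --quiet -m "Fresh start from the current contents of $branch" || exit 1
    # Drop the old branch and give its name to the new one.
    git branch -D "$branch" >/dev/null
    git branch -m "$branch"
)
```

Publishing the result would need a forced push, and every existing clone and fork would diverge from it. The old objects also stay in the local `.git` until git garbage-collects them, so the folder does not shrink immediately.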

samusz commented 2 years ago

I am neither a git nor a GitHub expert, so I can't really give safe advice. If I understand correctly what I've read, squashing commits, deleting branches, or rebasing could be a solution.

HTH

goedman commented 2 years ago

Thanks for your suggestions!

It might be difficult to go the route of rewriting history as Mosè pointed out. But nevertheless I'm glad you raised the issue! I pretty much have the package all the time in my dev directory so I hardly ever download it.

Anyway, what I think I'm going to do is to move the contents of StatisticalRethinking.jl to a new directory probably called StatisticalRethinkingBase.jl (or SRBase.jl) and use the current version more as a general intro to the StatisticalRethinking GitHub organization.

goedman commented 2 years ago

After asking for some feedback on Julia discourse I'll implement the above solution.

The large size issue only occurs if you dev the package; it does not happen if you add the package.
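For anyone who wants a local working copy without the old history, a shallow clone fetches only the most recent commit. A minimal sketch, assuming git is on the PATH; `shallow_clone` is a made-up helper name and the URL in the comment is inferred from the repository name in this thread:

```shell
# Clone only the latest commit of a repository, skipping older history.
# Usage: shallow_clone <url> <target_dir>
# (shallow_clone is a hypothetical helper name, not a git command.)
shallow_clone() {
    git clone --quiet --depth 1 "$1" "$2"
}

# Example (URL assumed, not taken from the thread):
# shallow_clone https://github.com/StatisticalRethinkingJulia/StatisticalRethinking.jl.git StatisticalRethinking.jl
# du -sh StatisticalRethinking.jl/.git
```

The resulting directory can then be handed to Pkg as a local path, for example with `Pkg.develop(path="StatisticalRethinking.jl")`. A shallow clone is fine for reading and running the code, but some history-dependent git operations will be limited until it is deepened with `git fetch --unshallow`.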

As soon as StatisticalRethinkingBase.jl is published I'll let you know and close this issue.

Thanks again for your help!

goedman commented 2 years ago

Hi @samusz,

Given that I now have a better understanding of the (somewhat limited, only on initial dev-ing) impact of this issue, I might hold off a little longer on creating StatisticalRethinkingBase.jl. Such a change will have consequences for many of the notebooks in the Turing and Stan Julia projects that use StatisticalRethinking.jl. But if it really becomes or remains a major issue for you or others, please ping me (@goedman). For that reason (and in case others run into it), we should probably leave this issue open.