epinowcast / epidist

An R package for estimating epidemiological delay distributions
http://epidist.epinowcast.org/

Large Git repository size #105

Open jamesmbaazam opened 3 months ago

jamesmbaazam commented 3 months ago

I just tried to clone this repository and noticed it is extremely large (1.78 GiB). Could you consider reducing the size?

Enumerating objects: 36546, done.
Counting objects: 100% (36546/36546), done.
Delta compression using up to 8 threads
Compressing objects: 100% (14063/14063), done.
Writing objects: 100% (36546/36546), done.
Total 36546 (delta 6517), reused 36546 (delta 6517), pack-reused 0
count: 0
size: 0 bytes
in-pack: 36546
packs: 1
+size-pack: 1.78 GiB
prune-packable: 0
garbage: 0
size-garbage: 0 bytes

athowes commented 3 months ago

Thanks @jamesmbaazam! I'll try to have a look now to see which are the large objects in the package. Let me know if you already know this.

sbfnk commented 3 months ago

This thread may be useful https://github.com/epiforecasts/EpiNow2/issues/538

athowes commented 3 months ago

Right yes, I was getting confused because I was just looking for large files and wasn't sure what it would be (most of the largest ones below are in my local version and aren't committed):

```
(base) adamhowes@Adams-MBP-3 epidist % find . -type f -exec du -h {} + | sort -rh | head -n 30
1.8G ./.git/objects/pack/pack-c0eadcec05734dbd7fbeea35b55b28e4d80163cf.pack
106M ./vignettes/approx-inference_cache/html/unnamed-chunk-3_2985e7a1c33b14dbee6c6ec2e582dd29.rdb
39M ./vignettes/epidist_cache/html/unnamed-chunk-10_5bc037bca68bb093cd52c5af19833258.rdb
39M ./vignettes/approx-inference_cache/html/unnamed-chunk-2_d897e4863777c9f2f93bedcef6242c9f.rdb
26M ./.Rproj.user/9EE3E129/ctx/ctx-11541/environment
4.1M ./.Rproj.user/9EE3E129/ctx/ctx-27423/environment
3.2M ./.Rproj.user/9EE3E129/ctx/ctx-27423/options
2.9M ./.git/objects/pack/pack-b817aae2005187439bb6ca20d19112f1e65bb811.pack
2.2M ./.git/objects/pack/pack-4612fe591aa5b4272815be1d44d4de620084dd3c.pack
1.8M ./.Rproj.user/9EE3E129/ctx/ctx-11541/options
1.3M ./doc/epidist.html
1.1M ./inst/gadm41_SLE_shp/gadm41_SLE_3.shp
964K ./.git/objects/pack/pack-c0eadcec05734dbd7fbeea35b55b28e4d80163cf.idx
780K ./inst/gadm41_SLE_shp/gadm41_SLE_2.shp
708K ./inst/gadm41_SLE_shp/gadm41_SLE_1.shp
684K ./inst/gadm41_SLE_shp/gadm41_SLE_0.shp
612K ./inst/README.html
496K ./data-raw/pnas.1518587113.sd02.xlsx
464K ./.git/objects/0b/d00b70191ce3449fb6e1cc9ace088f1dfc50c3
400K ./vignettes/approx-inference_cache/html/unnamed-chunk-5_f19631317140b7c91d2fad3fbc266f2b.rdb
396K ./vignettes/approx-inference_cache/html/unnamed-chunk-4_5ef2e049d3d4c08bf99ec1df29b55daa.rdb
384K ./docs/articles/epidist_files/bootstrap-3.3.5/css/bootstrap.css.map
328K ./docs/deps/bootstrap-5.3.1/bootstrap.bundle.min.js.map
316K ./.git/objects/53/6f70af7a33634640a582f839e43b6309e02d40
288K ./docs/deps/bootstrap-5.3.1/bootstrap.min.css
284K ./docs/deps/jquery-3.6.0/jquery-3.6.0.js
284K ./docs/articles/epidist_files/jquery-3.6.0/jquery-3.6.0.js
272K ./.Rproj.user/shared/notebooks/BFE9F241-epidist/1/s/cda40rp2ydjfw/000002.snapshot
260K ./vignettes/epidist_cache/html/unnamed-chunk-5_8603d489aac9e0331f40afed22f03136.rdb
208K ./vignettes/epidist_cache/html/unnamed-chunk-3_93116863c5d8a54d61bf388abb31230e.rdb
```

So the point is that the Git repository also stores the full history of every file, and it is that history (the 1.8G pack file) which ends up being large. Thanks for the link @sbfnk, will read and try to implement.
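Because `find` only sees files in the working tree, it cannot say which historical files make up the pack. A minimal sketch of one way to list the largest blobs across all of history, using only standard Git plumbing, is below. The function name `largest_blobs` is a hypothetical helper introduced here for illustration, and this is not necessarily the approach taken in the linked EpiNow2 thread.

```shell
#!/bin/sh
# Hypothetical helper (not from this thread): list the N largest blobs
# reachable from any ref in the current repository, including files that
# were deleted in later commits.
# Output columns: size in bytes, object id, path.
largest_blobs() {
  n="${1:-10}"
  # rev-list prints "<object id> <path>" for every reachable object;
  # cat-file adds the object type and size, with the path kept in %(rest).
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
    awk '$1 == "blob" { print substr($0, 6) }' |  # keep blobs, drop the "blob " prefix
    sort -k1,1 -n -r |
    head -n "$n"
}
```

Run from the repository root, `largest_blobs 30` would print the thirty largest blobs. Sizes are uncompressed, so they will not add up exactly to the size of the pack.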

Edit: after a skim, it seems that doing this is relatively intricate. As not many people are using the package currently, I wonder whether it would be best timed to coincide with a 0.1.0 release. Otherwise, I think it would probably need to be done again at that point anyway (likely development will continue to add file histories). Let me know if people disagree about this and think it's a priority to do sooner.
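Until the history is rewritten, anyone who only needs the current files might be able to avoid most of the download with a shallow clone. The sketch below wraps `git clone --depth 1`; the name `shallow_clone` is a hypothetical helper, and the GitHub URL in the usage note is assumed from the repository name rather than taken from this thread. It only helps if the large objects live in past commits rather than in the latest one.

```shell
#!/bin/sh
# Hypothetical helper (not from this thread): clone only the most recent
# commit of a repository, skipping earlier history.
# $1: repository URL, $2: target directory.
shallow_clone() {
  git clone --quiet --depth 1 "$1" "$2"
}
```

For example, `shallow_clone https://github.com/epinowcast/epidist.git epidist` (URL assumed), followed by `git count-objects -vH` inside the new directory to compare `size-pack` with the full clone.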

jamesmbaazam commented 3 months ago

> This thread may be useful epiforecasts/EpiNow2#538

Thanks, Seb. I was coming here to post this.

seabbs commented 3 months ago

> (likely development will continue to add file histories).

It shouldn't add large files. We do want to close out hanging PRs before we do this (and in general).

Thanks for the input, all. I think this is a legacy of pulling the package out of the analysis repo.