problem
Git is not made for storing large binary files: every time a binary file changes, the entire file is stored again. A git clone downloads the entire history, and with it every version of every binary file.
Small files that never change are an acceptable overhead, like the current files in this repository, but I think VIP_extras will grow over time. For example, the 4D IFS cube I have ready weighs 22 MB (cropped).
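To make the cost concrete, here is a rough sketch. It assumes a SHA-1 repository, where git identifies file content by hashing a short header plus the raw bytes, so even a one-byte change produces a new, full-size object. The helper names are made up for illustration, and the size estimate is an upper bound that ignores compression and delta packing.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object ID git assigns to a file's content (SHA-1 repositories)."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def history_size_mb(file_size_mb: float, n_versions: int) -> float:
    """Upper-bound size of the history if every version is stored in full."""
    return file_size_mb * n_versions


# Two versions of a stand-in "binary file" that differ in a single byte.
v1 = bytes(1000)
v2 = bytes(999) + b"\x01"

# Different IDs: git keeps both versions as separate objects.
print(git_blob_id(v1) != git_blob_id(v2))  # True

# Ten revisions of a 22 MB cube, stored in full each time.
print(history_size_mb(22, 10))  # 220
```

With ten revisions of a 22 MB cube, every clone could carry up to about 220 MB for that single file.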
alternatives
git-lfs
git-lfs is the "large file storage" extension for git, developed (and supported) by GitHub to address exactly that problem. Once git-lfs is set up for a repository (e.g. "track every .fits and .npz file"), one can use the regular git commands as before. Under the hood, the repository stores only a small reference (pointer) to each large file, while the file contents are uploaded to a separate LFS server.
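As an illustration of what "just their reference" means, the sketch below builds the two pieces of text involved: the .gitattributes lines that git lfs track writes for a pattern, and the small pointer file that is committed in place of the real content. The function names are hypothetical helpers, not part of git-lfs, and the byte string merely stands in for a real data cube.

```python
import hashlib


def gitattributes_lines(patterns):
    """Lines that `git lfs track <pattern>` adds to .gitattributes."""
    return [f"{p} filter=lfs diff=lfs merge=lfs -text" for p in patterns]


def lfs_pointer(content: bytes) -> str:
    """Text of the pointer file stored in the repository instead of `content`."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


# Track every .fits and .npz file, as in the example above.
for line in gitattributes_lines(["*.fits", "*.npz"]):
    print(line)

# Stand-in for a large binary file; the pointer stays around 130 bytes
# regardless of how large the real file is.
fake_cube = bytes(1_000_000)
print(lfs_pointer(fake_cube))
```

The pointer is all that lives in the git history, which is why the repository size stays flat when tracked files are added or changed.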
advantages
regular git commands keep working
as large files are not part of the repo, the repo size does not increase with new or changed files
disadvantages
slightly more complicated setup (moving existing files to lfs is difficult, since it requires rewriting history, etc.)
users need git-lfs installed to clone the repo
Binder does not seem to support lfs
bintray
advantages
free for open source, tightly integrated with GitHub (e.g. organizations)
simple to use (web interface for uploading, curl for downloading and astropy.utils.data.download_file for python)
keeps multiple file versions (like git or git-lfs)
disadvantages
none?
demo
I created a bintray project for VIP, and uploaded the IFS cube for testing.
Take a look at the project site: https://bintray.com/r4lv/vip/data-cubes
Using the files in python would be
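A minimal sketch of such a download, assuming the cube is reachable under a direct download URL. The URL below is a placeholder and not the actual file location, and fetch is a hypothetical helper. It uses only the standard library; astropy.utils.data.download_file(url, cache=True) would do the same job and add caching.

```python
import shutil
import urllib.parse
import urllib.request
from pathlib import Path

# Placeholder: replace with the direct download link of the file on the project site.
CUBE_URL = "https://example.org/path/to/ifs_cube.fits"


def fetch(url: str, dest_dir: str = ".") -> Path:
    """Download `url` into `dest_dir` and return the path of the local copy."""
    # Name the local file after the last component of the URL path.
    filename = Path(urllib.parse.urlparse(url).path).name
    target = Path(dest_dir) / filename
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target


# Usage (not executed here, since CUBE_URL is only a placeholder):
# path = fetch(CUBE_URL)
```

The returned path can then be opened like any local file, for example with astropy.io.fits for a .fits cube.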