srinify closed 11 months ago
We still intend to upload the data to Hugging Face. However, we will use separate repositories for the code and the data, since even with Git LFS, keeping the data in the same repo makes a full copy of the repo unwieldy (especially with our 40 TB dataset).
Hey folks! I work at XetHub and we scale Git to handle large files. We recently launched a GitHub integration that brings this into GitHub repos too and it's free forever for public repos.
As an example, we brought 100+ GB of ONNX model files into this repo: https://github.com/xetdata/onnx-models
After people have installed our tiny Git extension, whenever they clone from this repo all large files are downloaded. They can optionally ignore the large files as well.
If that sounds interesting, I'd love to collaborate to make this happen.