Large datasets, mostly CSV files, are currently fetched directly from Git LFS, which incurs significant Git LFS bandwidth costs.
Fetching these datasets as pre-compressed release assets will reduce download time and eliminate most GitHub Git LFS bandwidth costs. Thanks to @jvanulde for the idea and @DamonU2 for the pioneering work.
This, I think, is easier to implement and maintain, and thus more robust and less error-prone, than my previous unimplemented "XZ-compressed copies of repos" idea:
OpenDRR/opendrr-api#91
Data source repos:
OpenDRR/openquake-inputs
OpenDRR/model-inputs
OpenDRR/canada-srm2
OpenDRR/earthquake-scenarios
Scripts that fetch from these repos include (but may not be limited to):
Cf., for example, the commands found in add_data.sh:
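As a rough illustration of the two fetch patterns (the URLs, release tag, and filename below are placeholders, not the actual add_data.sh contents):

```bash
# Hypothetical sketch only -- placeholder URLs and filenames.

# Current pattern: fetch a CSV directly through Git LFS,
# which counts against the repository's Git LFS bandwidth quota.
curl -L -o example-dataset.csv \
  https://github.com/OpenDRR/model-inputs/raw/master/example/example-dataset.csv

# Proposed pattern: download a pre-compressed release asset instead
# (served from regular release storage, so no Git LFS bandwidth),
# then decompress it locally.
curl -L -O \
  https://github.com/OpenDRR/model-inputs/releases/download/v0.0.0/example-dataset.csv.xz
xz -d example-dataset.csv.xz   # yields example-dataset.csv
```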
XZ or Zstd compression? (compressed file sizes vs. decompression speed)
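One way to weigh this trade-off is to compress one of the large CSVs with both tools and time the decompression. A minimal sketch, assuming the xz and zstd CLIs are installed and using a placeholder filename:

```bash
# Comparison sketch -- "sample-dataset.csv" is a placeholder for any large CSV.
f=sample-dataset.csv

xz   -9  -k -T0 "$f"     # produces sample-dataset.csv.xz  (usually the smaller file)
zstd -19 -k -T0 "$f"     # produces sample-dataset.csv.zst (usually faster to decompress)

ls -l "$f.xz" "$f.zst"   # compare compressed sizes

time xz   -d -k -c "$f.xz"  > /dev/null   # XZ decompression time
time zstd -d -k -c "$f.zst" > /dev/null   # Zstandard decompression time
```

In general, XZ at high levels tends to produce somewhat smaller files, while Zstandard decompresses considerably faster; which matters more depends on how often the assets are downloaded versus rebuilt.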