OpenDRR / opendrr-api

REST API for OpenDRR data / API REST pour les données OpenDRR
MIT License

Fetch datasets as release assets (instead of Git LFS pull) #190

Open anthonyfok opened 2 years ago

anthonyfok commented 2 years ago

Large datasets, mostly CSV files, are currently fetched directly from Git LFS, which incurs significant Git LFS bandwidth costs.

Fetching these datasets as pre-compressed release assets will reduce download time and eliminate most GitHub Git LFS bandwidth costs. Thanks to @jvanulde for the idea and @DamonU2 for the pioneering work.

This, I think, is easier to implement and maintain, and thus more robust and less error-prone, than my previous unimplemented "XZ-compressed copies of repos" idea:

Data source repos:

Scripts that fetch from these repos include (but may not be limited to):

Cf. these commands found in add_data.sh, for example:

fetch_csv openquake-inputs ...
fetch_csv model-inputs ...
curl -L https://api.github.com/repos/OpenDRR/canada-srm2/contents/cDamage/output?ref=tieg_natmodel2021
curl -L https://api.github.com/repos/OpenDRR/earthquake-scenarios/contents/FINISHED
python3 DSRA_outputs2postgres_lfs.py --dsraModelDir=$DSRA_REPOSITORY --columnsINI=DSRA_outputs2postgres.ini --eqScenario="$eqscenario"
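
As a rough sketch of how a `fetch_csv`-style step might pull a pre-compressed release asset instead of going through Git LFS, the Python below builds a release download URL and stream-decompresses an XZ asset to disk. All function names here are hypothetical and not part of add_data.sh; the only thing taken as given is GitHub's usual `https://github.com/<owner>/<repo>/releases/download/<tag>/<asset>` URL pattern, and the sketch assumes the assets are public and XZ-compressed.

```python
import lzma
import shutil
import urllib.request
from pathlib import Path


def release_asset_url(owner: str, repo: str, tag: str, asset: str) -> str:
    """Build the public download URL of a GitHub release asset."""
    return f"https://github.com/{owner}/{repo}/releases/download/{tag}/{asset}"


def decompress_xz_stream(src, dest_path: Path) -> None:
    """Stream-decompress an XZ file object to dest_path.

    Copies in chunks, so a large CSV never has to fit in memory.
    """
    with lzma.open(src, "rb") as xz, open(dest_path, "wb") as out:
        shutil.copyfileobj(xz, out)


def fetch_csv_asset(owner: str, repo: str, tag: str, asset: str,
                    dest_dir: str = ".") -> Path:
    """Download a (hypothetical) `*.csv.xz` release asset and write the CSV."""
    # Strip the .xz suffix to get the name of the decompressed file.
    csv_name = asset[:-3] if asset.endswith(".xz") else asset
    dest = Path(dest_dir) / csv_name
    url = release_asset_url(owner, repo, tag, asset)
    # urlopen follows the redirect GitHub issues for release downloads.
    with urllib.request.urlopen(url) as resp:
        decompress_xz_stream(resp, dest)
    return dest
```

A call would look like `fetch_csv_asset("OpenDRR", "model-inputs", "v1.0.0", "example.csv.xz")`, where the tag and asset name are placeholders. Downloads of public release assets do not draw on the Git LFS bandwidth quota, which is the point of the proposal.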

XZ or Zstd compression? (compressed file sizes vs. decompression speed)
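
One way to settle this would be to measure both on a representative CSV. The sketch below is only a starting point: it times decompression and reports the compressed-size ratio for XZ (via the standard-library `lzma` module) and, if the third-party `zstandard` package happens to be installed, for Zstd as well. The synthetic sample rows, the compression levels, and the `benchmark` helper are all assumptions for illustration; real numbers should come from the actual datasets.

```python
import lzma
import time


def benchmark(name, compress, decompress, data: bytes) -> dict:
    """Compress once, then time a single decompression of the result."""
    packed = compress(data)
    start = time.perf_counter()
    restored = decompress(packed)
    elapsed = time.perf_counter() - start
    assert restored == data  # round-trip sanity check
    return {
        "codec": name,
        "ratio": len(packed) / len(data),  # smaller is better
        "decompress_s": elapsed,
    }


# XZ is always available through the standard library.
codecs = [("xz -6", lambda d: lzma.compress(d, preset=6), lzma.decompress)]

# Zstd needs the third-party `zstandard` package; skip it if missing.
try:
    import zstandard

    codecs.append((
        "zstd -19",
        zstandard.ZstdCompressor(level=19).compress,
        zstandard.ZstdDecompressor().decompress,
    ))
except ImportError:
    pass

# Synthetic CSV-like sample; replace with bytes read from a real dataset.
sample = b"".join(
    b"%d,site_%d,%.3f\n" % (i, i % 50, i * 0.001) for i in range(20000)
)

results = [benchmark(n, c, d, sample) for n, c, d in codecs]
for r in results:
    print(f"{r['codec']:>9}  ratio={r['ratio']:.3f}  "
          f"decompress={r['decompress_s'] * 1000:.1f} ms")
```

A single timed run on synthetic data is noisy, so repeating the measurement over the real CSV files would give a fairer picture of the size versus speed trade-off.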