OpenDRR / opendrr-api

REST API for OpenDRR data / API REST pour les données OpenDRR

Create XZ-compressed Git repos and download from them #91

Open · anthonyfok opened this issue 3 years ago

anthonyfok commented 3 years ago

Tasks

Description

Git LFS file download failure (Issue #90) might have been caused by us running out of our GitHub monthly bandwidth quota, especially with my frequent runs of docker-compose up --build and docker-compose down -v in recent days.

Create compressed equivalents of LFS repos, e.g. model-inputs → ~~model-inputs-gz or~~ model-inputs-xz, etc. (2021-05-10 update: xz is chosen for its SHA-256 sum feature, which matches the oid sha256 entries in Git LFS pointer files; see the compression sketch after this list.)

~~Or perhaps use our B2 or S3 bucket? (populate manually or using GitHub Actions)~~ ~~Or can some kind of HTTP proxy be used? Any way to use B2 or S3 for such a proxy?~~ 2021-05-10 update: Downloading directly from https://raw.githubusercontent.com/ seems fast enough, so the use of buckets might not be necessary.

And what about a local cache?
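
A minimal sketch of the compression step, assuming xz is invoked with --check=sha256 so that the integrity check embedded in each .xz file is the SHA-256 of the uncompressed data and therefore matches the oid sha256 entry in the corresponding Git LFS pointer (the file name below is simply one of the CSVs mentioned later in this issue):

# Compress one LFS-tracked file; --check=sha256 embeds the SHA-256 of the
# uncompressed data, and --keep leaves the original file in place.
xz -9 --check=sha256 --keep exposure/census-ref-sauid/census-attributes-2016.csv

# The resulting .csv.xz would then be committed to the companion repo
# (e.g. model-inputs-xz) as a plain Git file, so it can be downloaded via
# https://raw.githubusercontent.com/ without touching the Git LFS quota.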

anthonyfok commented 3 years ago

Notes

Repos to compress

(The figures in brackets are rough compression times with xz -9 on a 3rd-generation Intel Core i5.)
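
The timings could presumably be reproduced with a simple loop over each repo's LFS-tracked files; this is only a sketch, and the exact xz flags used for the measurements above are not recorded here:

# Inside a checked-out LFS repo: compress every LFS-tracked file and time the run.
time git lfs ls-files --name-only | while read -r f; do
  xz -9 --check=sha256 --keep "$f"
done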

Quickly verifying checksum

xz -lvv <xz-file> | grep -Eo '[0-9a-z]{64}'
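
To confirm that an .xz file matches a given Git LFS pointer, the two 64-character hashes can be compared directly. A hedged sketch, assuming census-attributes-2016.csv.xz was produced as in the earlier compression sketch and that the second command is run inside a model-inputs checkout (the commit hash is the one referenced in the corner cases below):

# SHA-256 of the uncompressed data, as recorded in the .xz integrity check
xz -lvv census-attributes-2016.csv.xz | grep -Eo '[0-9a-f]{64}'

# "oid sha256:..." line from the raw Git LFS pointer blob; git cat-file
# bypasses the LFS smudge filter, so if the file is LFS-tracked at that
# commit this prints the pointer text rather than the full CSV.
git cat-file -p ab1b2d58dcea80a960c079ad2aff337bc22487c5:exposure/census-ref-sauid/census-attributes-2016.csv \
  | grep -Eo 'sha256:[0-9a-f]{64}'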

Corner cases

OpenDRR/opendrr-api/python/add_data.sh currently fetches some historic CSV files that may have already been deleted in HEAD. grep -B1 '?ref' opendrr-api/python/add_data.sh gives a list of them (a sketch of how one such ref-pinned fetch might look follows the list):

fetch_csv model-inputs \
  exposure/census-ref-sauid/census-attributes-2016.csv?ref=ab1b2d58dcea80a960c079ad2aff337bc22487c5
--
fetch_csv model-inputs \
  exposure/general-building-stock/documentation/collapse_probability.csv?ref=73d15ca7e48291ee98d8a8dd7fb49ae30548f34e
--
fetch_csv model-inputs \
  exposure/general-building-stock/documentation/retrofit_costs.csv?ref=73d15ca7e48291ee98d8a8dd7fb49ae30548f34e
--
fetch_csv model-inputs \
  natural-hazards/mh-intensity-ghsl.csv?ref=ab1b2d58dcea80a960c079ad2aff337bc22487c5
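
The fetch_csv implementation itself is not shown in this issue, but the path?ref=<commit> form of these arguments matches the GitHub contents API, so a single entry could presumably be fetched with something like the following sketch (the Accept header asks for the raw file rather than the JSON wrapper; an Authorization header with a token may be needed depending on the repo's visibility and rate limits):

# Hypothetical sketch: fetch the first ref-pinned CSV above via the GitHub
# contents API, which returns the file as it existed at that commit even
# though it has since been deleted in HEAD.
curl -s -H "Accept: application/vnd.github.v3.raw" \
  -o census-attributes-2016.csv \
  "https://api.github.com/repos/OpenDRR/model-inputs/contents/exposure/census-ref-sauid/census-attributes-2016.csv?ref=ab1b2d58dcea80a960c079ad2aff337bc22487c5"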