OpenDRR / opendrr-api

REST API for OpenDRR data / API REST pour les données OpenDRR

Create XZ-compressed Git repos and download from them #91

Open · anthonyfok opened this issue 3 years ago

anthonyfok commented 3 years ago

Tasks

Description

Git LFS file download failure (Issue #90) might have been caused by us running out of our GitHub monthly bandwidth quota, especially with my frequent runs of docker-compose up --build and docker-compose down -v in recent days.

Create compressed equivalents of LFS repos, e.g. model-inputs → ~~model-inputs-gz or~~ model-inputs-xz, etc. (2021-05-10 update: xz is chosen for its SHA-256 sum feature, which matches the oid sha256 entries in Git LFS pointer files; see the compression sketch after this list.)

~~Or perhaps use our B2 or S3 bucket? (populate manually or using GitHub Actions)~~ ~~Or can some kind of HTTP proxy be used? Any way to use B2 or S3 for such a proxy?~~ 2021-05-10 update: Downloading directly from https://raw.githubusercontent.com/ seems fast enough, so the use of buckets might not be necessary.

And what about a local cache?
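
A minimal sketch of the compression step, assuming xz is invoked with --check=sha256 so that the integrity check embedded in each .xz file is the SHA-256 of the uncompressed data and therefore matches the oid sha256 entry in the corresponding Git LFS pointer (the file name below is simply one of the CSVs mentioned later in this issue):

# Compress one LFS-tracked file; --check=sha256 embeds the SHA-256 of the
# uncompressed data, and --keep leaves the original file in place.
xz -9 --check=sha256 --keep exposure/census-ref-sauid/census-attributes-2016.csv

# The resulting .csv.xz would then be committed to the companion repo
# (e.g. model-inputs-xz) as a plain Git file, so it can be downloaded via
# https://raw.githubusercontent.com/ without touching the Git LFS quota.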

anthonyfok commented 3 years ago

Notes

Repos to compress

(The figures in brackets are rough compression times with xz -9 on a 3rd-generation Intel Core i5.)
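
The timings could presumably be reproduced with a simple loop over each repo's LFS-tracked files; this is only a sketch, and the exact xz flags used for the measurements above are not recorded here:

# Inside a checked-out LFS repo: compress every LFS-tracked file and time the run.
time git lfs ls-files --name-only | while read -r f; do
  xz -9 --check=sha256 --keep "$f"
done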

Quickly verifying checksum

xz -lvv <xz-file> | grep -Eo '[0-9a-z]{64}'
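
To confirm that an .xz file matches a given Git LFS pointer, the two 64-character hashes can be compared directly. A hedged sketch, assuming census-attributes-2016.csv.xz was produced as in the earlier compression sketch and that the second command is run inside a model-inputs checkout (the commit hash is the one referenced in the corner cases below):

# SHA-256 of the uncompressed data, as recorded in the .xz integrity check
xz -lvv census-attributes-2016.csv.xz | grep -Eo '[0-9a-f]{64}'

# "oid sha256:..." line from the raw Git LFS pointer blob; git cat-file
# bypasses the LFS smudge filter, so if the file is LFS-tracked at that
# commit this prints the pointer text rather than the full CSV.
git cat-file -p ab1b2d58dcea80a960c079ad2aff337bc22487c5:exposure/census-ref-sauid/census-attributes-2016.csv \
  | grep -Eo 'sha256:[0-9a-f]{64}'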

Corner cases

OpenDRR/opendrr-api/python/add_data.sh currently fetches some historic CSV files that may have already been deleted in HEAD. grep -B1 '?ref' opendrr-api/python/add_data.sh gives a list of them (a sketch of how one such ref-pinned fetch might look follows the list):

fetch_csv model-inputs \
  exposure/census-ref-sauid/census-attributes-2016.csv?ref=ab1b2d58dcea80a960c079ad2aff337bc22487c5
--
fetch_csv model-inputs \
  exposure/general-building-stock/documentation/collapse_probability.csv?ref=73d15ca7e48291ee98d8a8dd7fb49ae30548f34e
--
fetch_csv model-inputs \
  exposure/general-building-stock/documentation/retrofit_costs.csv?ref=73d15ca7e48291ee98d8a8dd7fb49ae30548f34e
--
fetch_csv model-inputs \
  natural-hazards/mh-intensity-ghsl.csv?ref=ab1b2d58dcea80a960c079ad2aff337bc22487c5
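
The fetch_csv implementation itself is not shown in this issue, but the path?ref=<commit> form of these arguments matches the GitHub contents API, so a single entry could presumably be fetched with something like the following sketch (the Accept header asks for the raw file rather than the JSON wrapper; an Authorization header with a token may be needed depending on the repo's visibility and rate limits):

# Hypothetical sketch: fetch the first ref-pinned CSV above via the GitHub
# contents API, which returns the file as it existed at that commit even
# though it has since been deleted in HEAD.
curl -s -H "Accept: application/vnd.github.v3.raw" \
  -o census-attributes-2016.csv \
  "https://api.github.com/repos/OpenDRR/model-inputs/contents/exposure/census-ref-sauid/census-attributes-2016.csv?ref=ab1b2d58dcea80a960c079ad2aff337bc22487c5"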