alan-turing-institute / uatk-spc

Synthetic Population Catalyst
https://alan-turing-institute.github.io/uatk-spc/
MIT License

More robust download manager #30

Open dabreegster opened 2 years ago

dabreegster commented 2 years ago

Detect partially downloaded or partially unzipped files, and retry them. If Azure supports HTTP HEAD, we can check remote file sizes fairly cheaply. It still wouldn't be ideal to hit the network unnecessarily, so maybe we use data/manifest.csv as a local source of truth and compare each file's byte count or md5sum against it.
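A rough sketch of the manifest comparison, in Python just to keep it short. The column names `path`, `size_bytes` and `md5` are assumptions, since the actual layout of data/manifest.csv isn't described here, and `files_needing_download` is a hypothetical helper rather than anything in the codebase. It does the cheap size check first and only hashes a file when the size already matches.

```python
import csv
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large downloads are not loaded into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_needing_download(manifest_path, data_dir):
    """Return manifest paths that are missing, truncated, or corrupt locally.

    Assumes (hypothetically) that the manifest has the columns
    `path`, `size_bytes` and `md5`.
    """
    stale = []
    with open(manifest_path, newline="") as f:
        for row in csv.DictReader(f):
            local = os.path.join(data_dir, row["path"])
            if not os.path.isfile(local):
                # Never downloaded.
                stale.append(row["path"])
            elif os.path.getsize(local) != int(row["size_bytes"]):
                # Wrong byte count: most likely a partial download.
                stale.append(row["path"])
            elif row.get("md5") and md5_of_file(local) != row["md5"]:
                # Right size but wrong contents.
                stale.append(row["path"])
    return stale
```

Anything this returns would be deleted and fetched again. None of this touches the network, so it could run on every startup.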

dabreegster commented 2 years ago

We could also consider retry policies for transient failures like:

Error: error sending request for url (https://ramp0storage.blob.core.windows.net/countydata/tus_hse_durham.gz): error trying to connect: dns error: failed to lookup address information: Temporary failure in name resolution
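One possible policy for errors like this is a bounded retry with exponential backoff. Below is a minimal Python sketch; `retry_with_backoff` and its parameters are made up for illustration, and the defaults (4 attempts, 1 second base delay) are arbitrary starting points, not tuned values. It retries only exception types listed in `retryable`, so permanent failures still surface immediately.

```python
import time


def retry_with_backoff(operation, max_attempts=4, base_delay=1.0,
                       retryable=(OSError,), sleep=time.sleep):
    """Call `operation` until it succeeds or `max_attempts` is reached.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts.
    OSError is the default retryable type because Python's socket and
    urllib errors (including DNS lookup failures) derive from it.
    `sleep` is injectable so the policy can be tested without waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retryable:
            if attempt == max_attempts:
                # Out of attempts: let the last error propagate.
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

A download call would be wrapped as `retry_with_backoff(lambda: fetch(url))`, where `fetch` stands in for whatever function actually performs the request.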