azavea / pfb-network-connectivity

PFB Bicycle Network Connectivity

Cache downloaded census data files to S3 within analysis #812

Closed KlaasH closed 3 years ago

KlaasH commented 3 years ago

Overview

The analysis already looks on S3 for cached census data files (jobs and blocks), but there's no facility for actually filling that cache. This PR makes the analysis scripts that download and use those files also push them up to the S3 cache, so they'll be there for the next analysis run. The files don't change (new versions would be released as new files, with a newer year in the filenames), so we don't need to worry about rotating or invalidating the cache.

This is in place of PR #809, which has the advantage of providing a way to load the cache with all the files for all states, but has the disadvantage of adding a lot more code: two Python scripts that are similar but not the same, plus two driver scripts. This PR achieves the same end as far as what the analysis does (using cached files if they're there and uploading them if they're not) within the existing bash scripts.
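For illustration, the "use the cached file if it's there, upload it if it's not" flow could look roughly like the sketch below. This is not the code from this PR: the function name `fetch_with_cache`, its arguments, and the messages it prints are all hypothetical, and it assumes the AWS CLI and `curl` are available in the analysis environment.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper (not taken from the repository): fetch one census file,
# preferring the S3 cache and filling the cache on a miss.
# Arguments: file name, source URL, S3 cache prefix (s3://bucket/path), local dir.
fetch_with_cache() {
    local filename="$1" source_url="$2" cache_prefix="$3" dest_dir="$4"
    local dest="${dest_dir}/${filename}"
    local cached="${cache_prefix}/${filename}"

    # 1. Try the cache first. `aws s3 cp` exits non-zero if the object is missing.
    if aws s3 cp "${cached}" "${dest}" >/dev/null 2>&1; then
        echo "cache hit: ${filename}"
        return 0
    fi

    # 2. Cache miss: download from the original source, failing on HTTP errors.
    echo "cache miss: ${filename}"
    curl --fail --silent --show-error --location -o "${dest}" "${source_url}"

    # 3. Fill the cache for the next run. A failed upload only warns, so it
    #    cannot fail an analysis that already has the file it needs.
    aws s3 cp "${dest}" "${cached}" >/dev/null 2>&1 \
        || echo "warning: could not upload ${filename} to cache" >&2
}
```

Because the files never change once published, the sketch never checks whether a cached copy is stale; a hit is always used as-is.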

Just caching the files after first use will make a big difference to the rate of analysis failures from census.gov errors, but it leaves the door open for the first few runs in any given state to have problems. So there's value in preloading the cache, but not necessarily enough to justify writing another script to do it. I took what seemed like the path of least resistance, even though it's an odd road: I ran the scripts from PR #809 (scripts/run-blocks-cache-update and scripts/run-lodes-cache-update), which saved all the files to my development S3 bucket, then downloaded them from there and uploaded them to the production bucket. So the production cache is loaded, but if we want to do it again someday we'll have to either check out the branch with those scripts (https://github.com/azavea/pfb-network-connectivity/tree/feature/kak/cache-census-s3%23786) or write a new script to do it.
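As a rough sketch of the download-then-upload step between buckets, something like the following would work with the AWS CLI. The function name, the use of named CLI profiles, and the staging directory are assumptions for illustration; the actual bucket names and the exact commands used are not recorded here.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: copy a cache prefix from one bucket to another through
# a local staging directory, using one `aws s3 sync` call per direction.
# Arguments: source prefix, destination prefix, source profile, destination profile.
copy_cache_between_buckets() {
    local src_prefix="$1" dest_prefix="$2" src_profile="$3" dest_profile="$4"
    local staging
    staging="$(mktemp -d)"

    # Download everything under the source prefix.
    aws s3 sync "${src_prefix}" "${staging}" --profile "${src_profile}"

    # Upload it under the destination prefix, using the other credentials.
    aws s3 sync "${staging}" "${dest_prefix}" --profile "${dest_profile}"

    rm -rf "${staging}"
}
```

Going through a local directory avoids needing one set of credentials that can read the development bucket and write to the production one.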

Resolves #786

Testing Instructions

Checklist