azavea / pfb-network-connectivity

PFB Bicycle Network Connectivity

Cache Census LODES data on S3 #809

Closed flibbertigibbet closed 3 years ago

flibbertigibbet commented 3 years ago

Overview

Cache Census LODES data on S3. Only fetch files directly from the Census Bureau website if they aren't in the S3 cache, and then upload them to the S3 cache.
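A minimal sketch of that lookup order, not the code in this PR. It assumes the cache can be modeled as a mapping from key to bytes; a plain dict stands in for the S3 bucket here, and `fetch_with_cache` and `download` are hypothetical names used only for illustration.

```python
from typing import Callable, MutableMapping, Tuple


def fetch_with_cache(
    key: str,
    cache: MutableMapping[str, bytes],
    download: Callable[[str], bytes],
) -> Tuple[bytes, str]:
    """Return the file contents and where they came from.

    `cache` is a stand-in for the S3 bucket (assumption: any mapping works).
    `download` is a stand-in for fetching from the Census Bureau website.
    """
    if key in cache:
        # Cache hit: do not contact the origin at all.
        return cache[key], "cache"

    # Cache miss: fetch from the origin, then populate the cache so the
    # next run does not need to download the file again.
    data = download(key)
    cache[key] = data
    return data, "origin"
```

With a real bucket, the membership check and the assignment would become S3 calls, but the order of operations stays the same: check the cache, fall back to the origin, write back on a miss.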

This also adds a convenience script ./scripts/run-lodes-cache-update to populate the S3 cache for all available states and both data types.
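As a rough illustration of what "all available states and both data types" means for the cache, the sketch below lists the file names to cache and filters out those already present. The `main`/`aux` data types and the file name pattern are assumptions based on the public LODES origin-destination naming scheme, not taken from this PR, and the function names are hypothetical.

```python
from typing import Iterable, List, Set

# Assumption: the two data types are the "main" and "aux" OD files.
DATA_TYPES = ("main", "aux")


def lodes_filenames(
    states: Iterable[str], year: int, data_types: Iterable[str] = DATA_TYPES
) -> List[str]:
    """Build one file name per (state, data type) pair.

    Assumed pattern: <state>_od_<type>_JT00_<year>.csv.gz
    """
    types = tuple(data_types)
    return [
        f"{state.lower()}_od_{data_type}_JT00_{year}.csv.gz"
        for state in states
        for data_type in types
    ]


def missing_from_cache(filenames: Iterable[str], cached: Set[str]) -> List[str]:
    """Keep only the files that still need to be downloaded and uploaded."""
    return [name for name in filenames if name not in cached]
```

On a second run with a populated cache, `missing_from_cache` returns an empty list, so nothing needs to be fetched from the Census Bureau site.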

Demo

Running ./scripts/run-lodes-cache-update the first time: [image: pfb_cache_lodes_initial]

Running ./scripts/run-lodes-cache-update again (with a populated cache): [image: pfb_cache_lodes_secondary]

Analysis results, with the employment data set: [image: pfb_reston_results_with_employment]

Notes

I could run the cache-populating script for the staging and/or production environments, if someone could point me to the appropriate S3 bucket locations for those environments.

Previously, the import script checked S3 before downloading LODES data from the Census Bureau site, but it did not upload a file to S3 when the file was missing there. That makes me think a one-off script was run at some point to populate the S3 cache, and that it was not run again when the LODES years in use were updated. I didn't find any existing script in the repository that does this.

I ran into #808 while working on this.

Testing Instructions

Checklist

Resolves #786

flibbertigibbet commented 3 years ago

I've updated this to add caching scripts for the Census block data as well.

KlaasH commented 3 years ago

Closing in favor of PR #812