Overview
Cache Census LODES data on S3. Only fetch files directly from the Census Bureau website if they aren't in the S3 cache, and then upload them to the S3 cache.
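The check-S3-first, download-on-miss, upload-after-download flow described above is a read-through cache. Below is a minimal Python sketch of that flow. It is not the actual import script. The `ObjectCache` protocol, the `InMemoryCache` class, and the `fetch_lodes_file` helper are hypothetical stand-ins for the S3 bucket and the Census Bureau download.

```python
from typing import Callable, Dict, Optional, Protocol


class ObjectCache(Protocol):
    """Hypothetical stand-in for the S3 cache (e.g. a bucket prefix)."""

    def get(self, key: str) -> Optional[bytes]: ...

    def put(self, key: str, data: bytes) -> None: ...


class InMemoryCache:
    """Dict-backed cache used here in place of S3, for illustration only."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self.objects.get(key)

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data


def fetch_lodes_file(
    key: str,
    cache: ObjectCache,
    download: Callable[[str], bytes],
) -> bytes:
    """Return the file for `key`, preferring the cache over the Census Bureau site.

    On a cache miss the file is downloaded and then uploaded to the cache,
    so later runs are served from the cache instead of the Census site.
    """
    cached = cache.get(key)
    if cached is not None:
        return cached  # cache hit: no request to the Census Bureau
    data = download(key)  # cache miss: fetch from the source
    cache.put(key, data)  # populate the cache for next time
    return data
```

Calling `fetch_lodes_file` twice with the same key triggers only one download. The second call is answered from the cache, which is the behavior that was missing before this change.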
This also adds a convenience script ./scripts/run-lodes-cache-update to populate the S3 cache for all available states and both data types.
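Here is a sketch of what populating the cache for every state and data type amounts to. It assumes a hypothetical `ensure_cached(state, data_type)` callback that performs the check-then-upload step for one file and returns `True` if the file was already on S3. The `populate_cache` function and `PopulateReport` class are illustrative names, not taken from the script.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple


@dataclass
class PopulateReport:
    """Which (state, data type) pairs were already cached vs. newly uploaded."""

    cached: List[Tuple[str, str]] = field(default_factory=list)
    uploaded: List[Tuple[str, str]] = field(default_factory=list)


def populate_cache(
    states: Iterable[str],
    data_types: Iterable[str],
    ensure_cached: Callable[[str, str], bool],
) -> PopulateReport:
    """Visit every state/data-type combination once.

    `ensure_cached(state, data_type)` is a hypothetical callback that makes sure
    the file is on S3. It returns True if the file was already there, or False
    if it had to be downloaded from the Census Bureau and uploaded.
    """
    types = list(data_types)  # materialize so it can be reused for every state
    report = PopulateReport()
    for state in states:
        for data_type in types:
            if ensure_cached(state, data_type):
                report.cached.append((state, data_type))
            else:
                report.uploaded.append((state, data_type))
    return report
```

Running this twice against the same cache mirrors the first-run/second-run behavior described in the demo. The first pass uploads every file, and the second finds them all cached. The `main`/`aux` labels used in the test are placeholders, not necessarily the script's actual data types.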
Demo
Running ./scripts/run-lodes-cache-update the first time:
Running ./scripts/run-lodes-cache-update again (with a populated cache):
Analysis results, with employment data set:
Notes
I could run the cache-populating script for the staging and/or production environments, if someone could point me to the appropriate S3 bucket locations for those environments.
Previously, the import script checked S3 before downloading LODES data from the Census Bureau site, but it did not upload the data to S3 when it wasn't already there. That makes me think a one-off script was run at some point to populate the S3 cache, but it was not re-run when the LODES years in use were updated. I didn't find any such script in the repository.
I ran into #808 while working on this.
Testing Instructions
Set PFB_DEBUG in the VM if you'd like to see full debug output from the script
Run ./scripts/run-lodes-cache-update in the VM
On the first run, it should not find cached LODES files in your personal dev environment S3 bucket
Kill the script after it finishes at least one upload
Re-run the script. Files for previously processed states should be found on S3; the remaining states should be downloaded from the Census Bureau site and uploaded to S3
Run an analysis
Analysis results should still look as expected and contain an employment accessibility figure
Checklist
Resolves #786