azavea / pfb-network-connectivity

PFB Bicycle Network Connectivity
Other
40 stars 10 forks source link

Fix cleanup commands for non-US population files #905

Closed KlaasH closed 1 year ago

KlaasH commented 1 year ago

Overview

When downloading a population file from S3, the file gets copied to the temporary directory and needs to be cleaned up, but when using the default local file (/data/population.zip) it doesn't, so the rm command to remove it after extraction fails. Also, the command to rename the unzipped files to the default filename is necessary if they're not already named that (e.g. if the file is called population.zip but the files inside are called something else), but it crashes if they are. This changes both commands so they'll still do their thing but will be more tolerant of the files being missing or already named correctly.

Notes

This is an alternative to PR #902. I believe it addresses the issue raised there but preserves the cleanup that's needed when the population file is downloaded rather than loaded from the default local path.

Testing Instructions

rgreinho commented 1 year ago

I just tested your script, but there is still an issue:

+ echo 'DONE: Importing neighborhood_boundary'
+ '[' nonus == USA ']'
+ update_status IMPORTING 'Downloading census blocks'
+ echo 'Updating job status: IMPORTING' 'Downloading census blocks'
+ '[' -n '' ']'
+ '[' nonus == USA ']'
+ NB_BLOCK_FILENAME=population
+ S3_PATH=s3:///data/population.zip
+ '[' -f /data/population.zip ']'
+ echo 'Using local census blocks file'
+ BLOCK_DOWNLOAD=/data/population.zip
+ unzip /data/population.zip -d /tmp/tmp.gBCTxzkk8v/import_neighborhood
+ '[' nonus '!=' USA ']'
+ cd /tmp/tmp.gBCTxzkk8v/import_neighborhood
+ rm -f population.zip
+ for x in *
+ mv -u population.cpg population.cpg
mv: 'population.cpg' and 'population.cpg' are the same file
2022-08-23 20:59:46.788 UTC [14] LOG:  received fast shutdown request
2022-08-23 20:59:46.789 UTC [14] LOG:  aborting any active transactions
2022-08-23 20:59:46.796 UTC [14] LOG:  background worker "logical replication launcher" (PID 254) exited with exit code 1
2022-08-23 20:59:46.797 UTC [247] LOG:  shutting down
2022-08-23 20:59:46.867 UTC [14] LOG:  database system is shut down

You may want to use mv -f instead then.

KlaasH commented 1 year ago

Once again my test case wasn't as good as I thought it was--I was using population.zip, but still not a version with the files inside also named "population". When I try to mv -f a file to itself locally, I still get the "same file" error, but it looks like -n (no-clobber) will do the job. I pushed a fixup commit, and this time I definitely had a population.zip for testing that failed the old way but worked the new way, so hopefully this does the trick.

rgreinho commented 1 year ago

It worked like a charm 👍

+ update_status IMPORTING 'Downloading census blocks'
+ echo 'Updating job status: IMPORTING' 'Downloading census blocks'
+ '[' -n '' ']'
+ '[' nonus == USA ']'
+ NB_BLOCK_FILENAME=population
+ S3_PATH=s3:///data/population.zip
+ '[' -f /data/population.zip ']'
Using local census blocks file
+ echo 'Using local census blocks file'
+ BLOCK_DOWNLOAD=/data/population.zip
+ unzip /data/population.zip -d /tmp/tmp.TM85eALmZj/import_neighborhood
Archive:  /data/population.zip
 extracting: /tmp/tmp.TM85eALmZj/import_neighborhood/population.shx
 extracting: /tmp/tmp.TM85eALmZj/import_neighborhood/population.cpg
 extracting: /tmp/tmp.TM85eALmZj/import_neighborhood/population.shp
 extracting: /tmp/tmp.TM85eALmZj/import_neighborhood/population.dbf
 extracting: /tmp/tmp.TM85eALmZj/import_neighborhood/population.prj
+ '[' nonus '!=' USA ']'
+ cd /tmp/tmp.TM85eALmZj/import_neighborhood
+ rm -f population.zip
+ for x in *
+ mv -n population.cpg population.cpg
+ for x in *
+ mv -n population.dbf population.dbf
+ for x in *
+ mv -n population.prj population.prj
+ for x in *
+ mv -n population.shp population.shp
+ for x in *
+ mv -n population.shx population.shx
+ cd -
/opt/pfb/analysis/scripts
+ update_status IMPORTING 'Loading census blocks'