azavea / pfb-network-connectivity

PFB Bicycle Network Connectivity
Other
39 stars 10 forks source link

Fix import script #902

Closed rgreinho closed 2 years ago

rgreinho commented 2 years ago

The image is built from azavea/pfb-network-connectivity@0037a5c, and the command I ran was the following:

docker run --rm \
    -e PFB_SHPFILE="/data/metz-lorraine.shp" \
    -e PFB_OSM_FILE="/data/metz-lorraine.osm" \
    -e PFB_COUNTRY=nonus \
    -e PFB_STATE=zz \
    -e PFB_STATE_FIPS=0 \
    -e NB_OUTPUT_DIR=/data \
    -e RUN_IMPORT_JOBS=0 \
    -e PFB_DEBUG=1 \
    -v "PeopleForBikes/brokenspoke-analyzer/data":/data \
    azavea/pfb-network-connectivity:0.17

The goal is to run an analysis for the city of Metz in France, located in the Lorraine region.

In analysis/import/import_neighborhood.sh the block attempting to do some cleanup is erroneous and can be removed.

If left, it crashes with the following error:

+ echo 'Updating job status: IMPORTING' 'Downloading census blocks'
+ '[' -n '' ']'
+ '[' nonus == USA ']'
+ NB_BLOCK_FILENAME=population
+ S3_PATH=s3:///data/population.zip
+ '[' -f /data/population.zip ']'
+ echo 'Using local census blocks file'
+ BLOCK_DOWNLOAD=/data/population.zip
+ unzip /data/population.zip -d /tmp/tmp.jAv86ZgUP8/import_neighborhood
+ '[' nonus '!=' USA ']'
+ cd /tmp/tmp.jAv86ZgUP8/import_neighborhood
+ rm population.zip
rm: cannot remove 'population.zip': No such file or directory

This is because the population.zip file was never moved to the tmpdir, only unzipped there, therefore there is nothing to remove.

This error could be avoided by switching to rm -f population.zip, but then the process fails with the following error:

+ echo 'Updating job status: IMPORTING' 'Downloading census blocks'
+ '[' -n '' ']'
+ '[' nonus == USA ']'
+ NB_BLOCK_FILENAME=population
+ S3_PATH=s3:///data/population.zip
+ '[' -f /data/population.zip ']'
+ echo 'Using local census blocks file'
+ BLOCK_DOWNLOAD=/data/population.zip
+ unzip /data/population.zip -d /tmp/tmp.MeseJ3QXln/import_neighborhood
+ '[' nonus '!=' USA ']'
+ cd /tmp/tmp.MeseJ3QXln/import_neighborhood
+ rm -f population.zip
+ for x in *
+ mv population.cpg population.cpg
mv: 'population.cpg' and 'population.cpg' are the same file

Therfore I believe this block does not add value and can be safely removed. Once these lines where delete, the analysis completed successfully.

After that the analysis proceeds like a charm.

KlaasH commented 2 years ago

Thanks for writing this up. The variety of sources that the analysis can pull its input files from (URLs, S3 cache, local files) makes it hard to keep the processing steps unified/coherent and to validate all the cases and combinations. I think this was a case where we focused on one scenario (loading the population file from a URL) and neglected the one you're trying to use.

The rm command does do something in the case where the file is provided via PFB_POP_URL, and I think the mv commands are necessary in the situation where the zip filename itself doesn't match the names of the files that get extracted from it. But yeah, we can't have them crashing in the simple /data/population.zip case. I made PR #905 as an alternative fix that I think handles both situations. Let me know if you think it will serve the purpose.

rgreinho commented 2 years ago

@KlaasH Thanks for your answer.

I rebuilt my image with your branch (#905) and still got an issue. I commented on the PR with the output and a potential solution.

rgreinho commented 2 years ago

Resolved by #905.