Closed KlaasH closed 5 years ago
Per discussion yesterday, upgrading Python made the OSM extract caching setup start failing.
When there's no file present in the cache bucket, the analysis job writes a lockfile to the cache bucket, confirms it got the lock, then downloads the OSM extract from Geofabrik, uploads it to the bucket, and finally removes the lockfile.
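For anyone who hasn't looked at that code, here's a rough sketch of the flow described above. It is not the project's implementation: `InMemoryS3` is a made-up stand-in for the few boto3 S3 client calls involved (its `exists` helper is not a boto3 method), and `ensure_cached`, the `.lock` suffix, and the return values are all hypothetical names chosen for illustration. The lock check here is also simplified and would not be race-free against real S3.

```python
import io
import uuid


class InMemoryS3:
    """Hypothetical stand-in for the boto3 S3 client calls used below."""

    def __init__(self):
        self.objects = {}

    def put_object(self, Bucket, Key, Body):
        data = Body.encode('utf-8') if isinstance(Body, str) else Body
        self.objects[(Bucket, Key)] = data

    def get_object(self, Bucket, Key):
        # Like boto3, return a dict whose 'Body' has a read() that yields bytes.
        return {'Body': io.BytesIO(self.objects[(Bucket, Key)])}

    def delete_object(self, Bucket, Key):
        self.objects.pop((Bucket, Key), None)

    def exists(self, bucket, key):
        # Illustration-only helper; this is NOT part of the boto3 API.
        return (bucket, key) in self.objects


def ensure_cached(s3, bucket, key, download):
    """Return 'cached', 'uploaded', or 'locked_by_other' (hypothetical statuses)."""
    lock_key = key + '.lock'  # assumed naming, not taken from the project
    if s3.exists(bucket, key):
        return 'cached'

    my_id = None
    if not s3.exists(bucket, lock_key):
        # Write a lockfile containing an identifier unique to this job.
        my_id = str(uuid.uuid4())
        s3.put_object(Bucket=bucket, Key=lock_key, Body=my_id)

    # Confirm we hold the lock. read() returns bytes under Python 3,
    # so decode before comparing against the str identifier.
    holder = s3.get_object(Bucket=bucket, Key=lock_key)['Body'].read().decode('utf-8')
    if holder != my_id:
        return 'locked_by_other'

    try:
        # Fetch the extract (from Geofabrik in the real job) and cache it.
        s3.put_object(Bucket=bucket, Key=key, Body=download())
    finally:
        # Always release the lock, even if the download or upload fails.
        s3.delete_object(Bucket=bucket, Key=lock_key)
    return 'uploaded'


s3 = InMemoryS3()
first = ensure_cached(s3, 'cache-bucket', 'extract.osm', lambda: b'osm data')
second = ensure_cached(s3, 'cache-bucket', 'extract.osm', lambda: b'osm data')
print(first, second)
```

The first call takes the lock, uploads, and releases it; the second finds the cached file and returns right away.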
The step where it confirms it got the lock was failing, because it turns out the read() method on the object returned by boto3's S3 client returns a bytestring. So b'unique identifier' was being compared to 'unique identifier' and not matching, and the script concluded that some other job got the lock and it should wait for that one to finish.
Here's a snippet that shows the behavior (fill in your bucket name), which you can run anywhere you have a Python 3 environment with boto3 installed (e.g. ./scripts/console django):
    import boto3

    bucket = 'YOUR_STORAGE_BUCKET'
    s3_client = boto3.client('s3')
    key = 'test_file'
    content = "This is a file with words in it."

    # Upload a str, then read the same object back
    s3_client.put_object(Bucket=bucket, Key=key, Body=content)
    downloaded = s3_client.get_object(Bucket=bucket, Key=key)['Body'].read()

    # read() returns bytes, so comparing directly to the str fails
    if downloaded != content:
        print("downloaded text doesn't match uploaded")
        if downloaded.decode('utf-8') == content:
            print("but it does if you decode it")
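If you don't have a bucket handy, the same mismatch can be reproduced without S3 at all. This is just a sketch: `io.BytesIO` stands in for the response body here, on the assumption that all that matters is that `read()` hands back bytes.

```python
import io

expected = 'unique identifier'

# Stand-in for the S3 response body: read() returns bytes, not str.
body = io.BytesIO(expected.encode('utf-8'))
downloaded = body.read()

# In Python 3 a bytes object never compares equal to a str.
naive_match = (downloaded == expected)

# Decoding first gives the comparison the lock check actually wanted.
decoded_match = (downloaded.decode('utf-8') == expected)

print(type(downloaded).__name__, naive_match, decoded_match)
# prints: bytes False True
```

Under Python 2 the plain str type was itself a byte string, which is why the undecoded comparison used to pass.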
I just pushed a fix (575f975), decoding the downloaded string.
FWIW everything completed successfully and I was able to view the analysis results on the map.
Overview
Upgrades the django and django-q services to Python 3. Specifically 3.6, since that's the default on the new vagrant base image I picked and the highest available version for the base container we're using.

I used futurize to make most of the changes, but with a few tweaks to make the code cleaner at the cost of backward compatibility. Mainly I removed all the places where classes inherited from object, rather than adding new ones like futurize would do by default. Since this isn't a library, backward compatibility isn't a priority. There are some bits that aren't fully migrated, though, like the from past.builtins import basestring import in models.py, because they work fine, they don't really make the code any messier than it would be with a py3-only implementation, and it's easier to leave them than to refactor them.

There was one incompatibility (so far) that futurize didn't catch, a test of isinstance(filename, file), which I corrected by hand. There was also an issue that made most of the tests crash: we were failing to mark a file as binary when opening it, which resulted in encoding errors way down the line. It was no less a bug under Python 2, but apparently something in the py2 code was able to handle the inconsistency where py3 can't.

Demo
Here's some output from scripts/setup that shows some of the new bits:

There's no way to actually tell, but this was created with a Python 3 analysis and served with a Python 3 API:
Notes
The futurize command I ran was:
Pushed to a test/ branch to check that this would build and deploy properly, which it did.

Testing Instructions
Run scripts/setup to bring up and provision the VM, then build the containers.
Test with a shapefile such as src/django/pfb_analysis/tests/data/batch_create_shapefile/PFB_BigJumpCities_1.zip. It should create a bunch of neighborhoods. It's fine to use a fairly large file because the actual analysis jobs won't be triggered; they'll just be printed to the log as commands you can run by hand.

Checklist
Resolves #751