Closed danrademacher closed 3 years ago
We would need to go back and populate businessdistrict
in existing entries, since the ETL script does a "where intersects and is null" to do them brute force. So, if we don't do them today in a zillion small batches, it will crash tomorrow when the script tries to run all of them in one go.
[x] create new polygon dataset
nyc_businessdistrict
businessdistrict
column in crashes_all_prod DB table[x] populate existing rows
update_borough()
to update a few crashes, to prototype the code I will write[x] ETL script
Doing the bulk update of existing records was not difficult, and not overly time-consuming. I ran this 25 times, where X was 0 through 24, thereby doing 1/25th of the crashes at a time and not hitting the time limit. This wasn't even close to the 30-second limit, though, and could probably have been done in % 10
instead.
UPDATE crashes_all_prod
SET businessdistrict = a.bidistrict
FROM nyc_businessdistrict a
WHERE crashes_all_prod.the_geom IS NOT NULL AND ST_Within(crashes_all_prod.the_geom, a.the_geom)
AND crashes_all_prod.businessdistrict IS NULL
AND crashes_all_prod.cartodb_id % 25 = X
Result: 178,665 crashes had a BID assigned which is about 10% of crashes. Not surprising, since most of NYC is not in a B.I.D. This left 1,683,199 crashes not assigned to a BID.
Surprising, though: I just ran a test of all of it in one single query, and the run time was under 3 seconds.
UPDATE crashes_all_prod
SET businessdistrict = a.bidistrict
FROM nyc_businessdistrict a
WHERE crashes_all_prod.the_geom IS NOT NULL AND ST_Within(crashes_all_prod.the_geom, a.the_geom)
AND crashes_all_prod.businessdistrict IS NULL
This is well under the 30-second limit, so should work just fine as a new ETL addition without any special considerations.
The code modification was super simple, and is now in place and deployed to Heroku. it should run tomorrow.
[x] check heroku console for errors https://dashboard.heroku.com/apps/nyc-crash-mapper-etl/logs
[x] run SQL and see what BIDs were tagged (if any) in last night's haul
I expect that 90% of them will be null, but still would expect at least a few to have this new field assigned and not to see an error in the heroku console.
SELECT cartodb_id, businessdistrict
FROM crashes_all_prod
WHERE cartodb_id > 3151452
All good here. Last night's run did work, no errors, and with a few of the new crashes being assigned but most not, as expected. 👍
As a first step on this new Walk Mapper project, we need to add Business Improvement Districts to CARTO and then assign Business Improvement Districts to our data, similar to https://github.com/GreenInfo-Network/nyc-crash-mapper-etl-script/blob/f9785e323dd595808f3f89a0be217392c5af536f/main.py#L408-L422
The data is here: https://drive.google.com/open?id=1qoUAdSabPUMcoSmkeV9NWIYPORXc3kvA
This will be part of the new Walk Mapper data system, and I think it is prudent to add it to Crashmapper, though I'm not sure yet how or when it will get added to the front end there, so it might be best use of effort to add the new tagging going forward but not yet tackle trying to apply it backwards to ~2 million crashes, which seems like a rabbit hole.
Bills to
CHEKPEDS:Walk Mapper Phase 1