Closed danrademacher closed 1 year ago
Here's the current data on carto, https://chekpeds.carto.com/tables/nyc_assembly/map, clearly clipped to shoreline:
Looks like the data on Carto has a field with "Assembly District " concatenated onto the integer, so we'd need to do that when we make the new data.
Hmm, the current City Council data on Carto is a bit weird: https://chekpeds.carto.com/tables/nyc_city_council
Pretty weird ID here:
New data:
Actually, I think those other fields don't matter so long as identifier
exists as a bare integer:
https://github.com/GreenInfo-Network/nyc-crash-mapper/blob/26b97a276d2f207469f428739fac80416ea6bdd6/src/constants/sql_queries.js#L27-L34
https://github.com/GreenInfo-Network/nyc-crash-mapper/blob/26b97a276d2f207469f428739fac80416ea6bdd6/src/constants/sql_queries.js#L67-L75
then the labels happen here: https://github.com/GreenInfo-Network/nyc-crash-mapper/blob/26b97a276d2f207469f428739fac80416ea6bdd6/src/constants/app_config.js#L55-L62
So we should be ready for this hardest checkbox:
Since that could be a long running process and we don't want to feel under pressure to get it done while we're changing live data, here's my proposal:
_2023
appended to the namesassembly_2023
and city_council_2023
fields_2023
geom tables and assembly_2023
and city_council_2023
fields_2018
suffixesthat will leave things in a clear state, but also make it easy to roll back if there's some problem with the assignments.
At some later date when we're satisfied (or even next year if we want to leave it for a while), we could delete the old _2018 tables and fields.
DRAFT work in progress as I suss things out
New tables: https://chekpeds.carto.com/tables/nyc_assembly_2023/public https://chekpeds.carto.com/tables/nyc_city_council_2023/public
I recall too that the field name in crashes is hardcoded to the name of the area, e.g. the city_council
visualization/selection implicitly uses the city_council
field from crashes, so using the field name city_council_2023
means changing that name globally throughout. So there will likely be steps I don't know yet...
assemdist
to identifier
id
field with string name CONCAT('Assembly District ', identifier)
Not clear that this field is used, but makes for consistent and readable dataid
field with string name CONCAT('City Council District ', identifier)
Not clear that this field is used, but makes for consistent and readable datageos
cartoTables
isActive
showIdentifier
filterByAreaSQL
filterAreaBtnTableMap
, key but not valuecartoTables
back and forth, compare to map views abovefilterByAreaSQL()
and does not use crashes_all_prod.assembly
and crashes_all_prod.city_council
assembly
and city_council
et al, if they're aggregated and download using a runtime JOIN? maybe no reason (historical) but maybe functionality I don't know to test. I see 7 instances of ${nyc_crashes}
all in src/constants/sql_queries.js See if any are relevant here.
configureMapSQL()
filterByIdentifierWhereClause()
~ only used for borough, specific crash ID, or intersectionconfigureStatsSQL()
filterByIdentifierWhereClause()
~configureFactorsSQL()
filterByIdentifierWhereClause()
~configureDownloadDataSQL()
filterByIdentifierWhereClause()
~crashes_all_prod
create new fields
assembly_2023
city_council_2023
crashes_all_prod
tableassembly
to assembly_2015
city_council
to city_council_2015
assembly_2023
to assembly
city_council_2023
to city_council
nyc_assembly
to nyc_assembly_2015
nyc_city_council
to nyc_city_council_2015
nyc_assembly_2023
to nyc_assembly
nyc_city_council_2023
to nyc_city_council
Then give everything a test, and see what we think. We know that some counts already do not match due to NYC retroactively changing past data, so not sure how we'll QC this. Maybe just seeing that the same crashes still have assembly is null
(assumes districts have changed internally, but not expanded; though I do see a few new islands in the visible map) and that a few have changed assembly != assembly_2023
?
Invalid geometries in the city council districts were causing tallies to match. If we ever do this again, be sure to validate/fix the geometries.
Citywide view work, then I could bring up City Council District and see them. But upon clicking some of the districts no crash points would display. Upon clearing the specific area selection, the districts would appear back on the map BUT would show visible gaps where no crashes were present, but previously were.
-- 17 were bad
SELECT COUNT(*) FROM nyc_city_council_2023 WHERE NOT ST_ISVALID(the_geom)
-- fixed
UPDATE nyc_city_council_2023 SET the_geom = ST_MAKEVALID(the_geom) WHERE NOT ST_ISVALID(the_geom)
-- fixed! and working
SELECT COUNT(*) FROM nyc_city_council_2023 WHERE NOT ST_ISVALID(the_geom)
The nyc_assembly_2023
table did not have invalidities, and seemed to work fine all along.
Diagnosis for this bug having this behavior only AFTER selecting a specific area: Changing from Citywide to a district simply adds the boundaries, but will not change the crash filter to include a spatial join until a specific area is selected. De-selecting that specific area to show boundaries again, keeps the spatial join which was not working.
Code changes ultimately were confined to a single file
Ultimately, the only code changes required for the frontend to use the new districts, were the table definitions: src/constants/app_config.js cartoTables
Crashes do have fields for assembly
city_council
borough
et al, but in fact filterByIdentifierWhereClause()
is only used for intersections and boroughs. All other cases use a runtime spatial join, and there's a note that boroughs are an exception due to timing out.
So the frontend changes are super simple: a two-line patch. Arguably, the two new fields on crashes_all_prod
for assembly_2023
and city_council_2023 fields
is optional, since nothing is using them... except downloads for offline analysis.
I have started running the updates
Turns out the initialpolygonlinkage script really was made for exactly this, and I left good documentation. Doing it in blocks of 2000 crashes at a time, this is 102000 queries
SELECT COUNT(*) FROM crashes_all_prod WHERE city_council_2023 IS NULL AND city_council IS NOT NULL
shows that 454 crashes no longer fall into a city council district, but did when they were loadedSELECT COUNT(*) FROM crashes_all_prod WHERE assembly_2023 IS NULL AND assembly IS NOT NULL
shows that 449 crashes do not fall into an assembly district but did when they were loaded-- overall counts
-- about 205K crashes never fit into any Assembly nor City Council district
-- and that has not changed much
SELECT COUNT(*) FROM crashes_all_prod WHERE city_council_2023 IS NULL
207,512
SELECT COUNT(*) FROM crashes_all_prod WHERE city_council IS NULL
211,578
SELECT COUNT(*) FROM crashes_all_prod WHERE assembly_2023 IS NULL
205,954
SELECT COUNT(*) FROM crashes_all_prod WHERE assembly IS NULL
205,540
Ok, based on the above, I think we're ready to deploy these changes, though I just realized that we might have to make changes in https://github.com/GreenInfo-Network/nyc-crash-mapper-chart-view as well for the new tables. I am honestly not sure if it duplicates the spatial queries from the map application is https://github.com/GreenInfo-Network/nyc-crash-mapper, which would mean code changes in both places, or not
Digging into the 450ish crashes that don't fit. It seems to be edge effects, as expected. There are slight differences in the polygon datasets, as you can see below, which really do cause the crash-points to be excluded in 2023 when they were included in 2015. Judging by the Measure Line tool in QGIS, the difference between the datasets is typically under 1 meter, but this is enough when the crash was exactly on an edge.
-- exported these mismatches as GeoJSON
SELECT cartodb_id, ST_X(the_geom) AS lng, ST_Y(the_geom) AS lat FROM crashes_all_prod WHERE city_council_2023 IS NULL AND city_council IS NOT NULL
SELECT cartodb_id, ST_X(the_geom) AS lng, ST_Y(the_geom) AS lat FROM crashes_all_prod WHERE assembly_2023 IS NULL AND assembly IS NOT NULL
I could do a second pass to update any crashes where city_council_2023 IS NULL and city_council IS NOT NULL
but this time with a buffer of 3 meters (10 feet) on the crashes. This should mop up the majority of these 450 outliers.
Several hours later, running scripts to buffer polygons by 0.00002 degrees (about 2 meters) and having CARTO repeatedly time out, or falsely report 0 hits, I gave up on the old methods. I sat down with QGIS and made up lists of cartodb_id and City Council / Assembly identifier, then made up a list of SQL statements to run.
Eventually, we now have only 1 record where assembly_2023
and city_council_2023
are null but assembly
and city_council
are null. And that's because CARTO will time out updating this one specific record.
-- this one row will not let me update it; CARTO SQL API and web UI time out
UPDATE crashes_all_prod SET city_council_2023 = 32 WHERE cartodb_id=1209236
UPDATE crashes_all_prod SET assembly_2023 = 23 WHERE cartodb_id=1209236
Update: it eventually let me update 1209236 a half-hour later
So, ~not counting this 1 record~, the new assignments are done and the fields & tables could be swapped into place. See the "then some day, swap them" checklist item above.
All set! I have renamed the columns and the fields, and the mapper and chart view are using the new districts.
With #39 resolved this is done
Also for the record, I suggested we do Senate and she said she doesn't care about those. Also was reminded that 2022 NYS redistricting was a mess so possibly senate reverted to old lines.
┆Issue is synchronized with this Asana task