Closed danrademacher closed 6 years ago
And another piece of complexity I left out of this note -- she would rather OVERCOUNT crashes by assigning them to multiple nearby intersections vs undercount for any one intersection. But that seems to require a multi-key intersection
field in the crashes table that is different from all teh single key geographies currently in use (where a crash is in exactly 1 borough, council, neighborhood, etc)
So looks like SQL by geom is here: https://github.com/GreenInfo-Network/nyc-crash-mapper-chart-view/blob/master/src/common/sqlQueries.js#L23
and the options in Filter by Boundary are here: https://github.com/GreenInfo-Network/nyc-crash-mapper-chart-view/blob/dc9815fddf08a56e24814de13f628f33ef5554e4/src/containers/FilterByBoundary.jsx#L35
Feels like the hardest challenges here will be:
Scripts and documentation, as well as the dataset, are in the documentation/
folder.
Using the NYC LION data, I mapped their existing nodes point data to the street lines, in order to assign the nodes a pair of streets where possible. After de-duplication efforts, a few rounds of QC, etc. this comes up to 40,202 intersection points.
This has been loaded into CARTO as a table and also as a visualized map:
Table has a crashcount field, which was calculated from the crash data to date.
Results are quite good. Only outlier question at present, is that a physical intersection may have multiple intersection-circle records, where more than 2 streets meet, e.g. 7 AVENUE SOUTH & BARROW STREET, 7 AVENUE SOUTH & BLEECKER STREET, and BARROW STREET & BLEECKER STREET are three distinct combinations, though in physical space they overlap by 99%
I did come up with a technique for updating the crashcount field. The runtime of the update is about 45 seconds, so is broken up into chunks.
The code should be ready to run, when I get clearer instructions for how to merge it and deploy to Heroku. This update to the ETL process could happen independently of the other Crash Mapper UI requested, even if the other UI work proves infeasible, as backing data for some other application or simply reporting "the most dangerous intersections" etc.
37,057 intersections have crashes of the 40,202 records, This is 92%.
When taking the "top X intersections, by crashcount" as a potential mechanism to isolate only intersections which are of highest concern, there is a "very long tail" effect. The 4000 most-crashy intersections (top 10%) are MUCH more crash-heavy than the bottom 36000 (90%).
top 1000: 128 is the lowest crashcount
top 2000: 89 is the lowest crashcount
top 3000: 70 is the lowest crashcount
top 4000: 57
top 5000: 49
top 6000: 43
top 7000: 38
top 8000: 34
top 9000: 31
top 10000: 28
crashcount > 0
is 26Per discussion today:
On the UI side, the blocking problems are:
I have adapted my copy of the ETL script so that update_intersections_crashcount()
updates the crashcount field based on crashes of the last 36 months, rather than all crashes of all time
Further info from client:
The filter for the intersections should be those where there has been a crash with injuries or fatalities. This should reduce the number to less than 12, 000
Might be worth trying the Top X again by Injury+Fatality and see where that lands for top 250,500,1000
There are 22,934 intersections fitting the criteria:
number_of_persons_injured > 0 OR number_of_persons_killed > 0
That's not 12,000 but does narrow down to half of the universe of intersections.
The top X intersections by injury-or-fatality crashcount, and the number of crashes at that "break"
highest single intersection: 74 crashes top 10 intersections: 56 crashes top 50 intersections: 33 crashes top 100 intersections: 28 crashes top 200 intersections: 23 crashes top 500 intersections: 17 crashes top 1000 intersections: 12 crashes
Method was: SELECT MIN(crashcount) FROM (SELECT crashcount FROM nyc_intersections WHERE crashcount IS NOT NULL ORDER BY crashcount DESC LIMIT 100) ccs
Number of these intersections with only 1 or 2 crashes (last 36, inj/fat): 12,129
So again with a very long tail: It tapers from 74 and 56 at the top 10, then slowly into the 30s, 20s, and teens over the next 1000, then another 11,000 as we get down to the 1-crash and 2-crash intersections. This could give a narrative as to the 10 most dangerous intersections, but does not make it easy to pick a clear break where after the top X intersections we can meaningfully ignore further results.
Number of intersections with at least X crashes: at least 50 crashes: 12 intersections at least 30 crashes: 88 at least 20 crashes: 321 at least 15 crashes: 701 at least 10 crashes: 1708
There are 385 intersections fitting the criteria:
OR number_of_persons_killed > 0
Now we're getting into area which may be feasible, to render via a GeoJSON export and for a picklist that someone could use. But as pointed out, the client is interested in injuries since a) medical care means fewer fatalities, while b) injuries can be life-changing and therefore of significant interest.
Final approach:
Add intersections as a new kind of "BOUNDARY AREA" to Map and Chart views.
Limit INTERSECTIONS to anywhere a crash has ever occured.
Buffer the INTERSECTIONS to 95-foot radius, creating a polygon boundary.
Use either spatial query or join table to allow many-to-many relationship between intersections and crashes.
The available intersections will be the top 500 based on injuries+cashes over last 36 or 24 months (time period TBD).
Name the intersections as
https://github.com/GreenInfo-Network/nyc-crash-mapper-etl-script/pull/10
This PR adds to the ETL script, a new function to update the nyc_intersections table's crashcount field. This is the tally of crashes with an injury or fatality within the last 36 months.
The top 500 records here, will form the basis of the intersections as described above.
SELECT * FROM nyc_intersections
WHERE crashcount IS NOT NULL
ORDER BY crashcount DESC
Working A-OK:
Intersection is now presented as an area type, and are fetched accordingly
Tooltips have been modified with a custom condition for intersections, to tease out the ID# from the human-friendly name.
Clicking an intersection-circle on the map, filters to crashes within that area.
Interesting / Needs Work:
How many intersections to fetch? At present, the top 1000 intersections are fetched, as we are aiming towards the maximum that can occur without crashing the map.
Intersection polygons with 95-foot radius, are not visible at citywide zoom levels. The map zooms out and "nothing happens", as a user wouldn't know to zoom in to see discontiguous, smaller areas only visible when closer-in.
The circles come from CARTO in GeoJSON format, and "just look ugly" especially at some intermediate zoom levels. I've not seen that behavior before, and the circles look fine when CARTO renders them in its own map utility.
Intersection names don't fit well into the sidebar, which was designed for single-word names, e.g. "Bronx" or "117" Names such as BRUCKNER BOULDVARD & EAST 149 STREET" don't fit so well.
Not related but possible usability benefit:
This comment belongs more on the map repo but I'm adding it here as that seems to be where the intersections discussion is taking place.
It seems like Intersections are a much different geography type than contiguous polygon boundaries and that they warrant their own special handling.
Instead throwing a 1,000 tiny polygons on the map and cramming intersection names into the side of the UI next to the filter by area buttons, why not just make a separate UI for them that lists them so that they're readable and easier to interact with / find?
There could be a search input with a dropdown below it that allows for a user to type in the name of an intersection and/or select one from a drop down. When the app loads, the names of all the intersections could be fetched to create the dropdown list which would be hidden until a user starts typing in the input. A text search in the input could filter the names displayed in the dropdown as the user types. When an intersection name is selected from the dropdown or by hitting "enter" on the keyboard when the input is focused, a network request would be made for the crash data and the map would zoom to the area of interest.
The search input and dropdown could live in the upper left of the page, similar to a search input for geocoding an address.
A couple downsides with this approach I see are
It doesn't live in the filter by area panel and may be inconsistent with the map's filter UIs
It could be confused with an address search rather than intersection search
It might be difficult to know which intersection I'm selecting as intersection names are not unique
For #1
I think this is justified because intersections aren't really boundaries, so it's odd for them to live under filter by boundary. For #2
I think either having an explicit "search by intersection" or even a toggle that switches between searching by "intersection" and "address" if address geocoding is a desirable feature to have at some point in the future. For #3
the borough name and perhaps neighborhood name or zipcode could be listed next to the intersection name to help clarify.
Currently selecting intersections on the map is problematic. When I enable intersections by clicking the button in the UI and am at the default zoom level of the map (zoomed out pretty far), it's not apparent that anything happens / I can't see the intersections on the map. Only when I start zooming in do I actually see the intersections appear, so the "select by clicking" only method has some usability issues.
The best solution here would probably be to replace the boundary buttons on the map with the more flexible module we use in the chart UI. That would address UX issues you highlight, and which we’d already considered, but without introducing an entirely new element.
But client priority at present is seeing Intersections in chart views, so we needed to take a minimalist approach to adding them to the map. I expect we’ll get another bite or two at this apple.
Re-purposing the list UI from the chart view would be helpful, but how would that solve the "select by clicking polygon on the map" for intersections? For other geographies having the list and being able to click on the map would be a usability improvement for filtering by a geographic area, but with intersections the polygons would still not be visible on the map at the default zoom level, which would still leave the inconsistent UX with how other geographies can be selected. Furthermore many intersection polygons overlap with one another which makes selecting an intersection area confusing.
Listing the intersections without other information such as borough / neighborhood / zipcode would also make their selection via a list cumbersome and confusing as intersection names are not unique, and adding that extra info to only intersections would still mean cramming it into a small side bar.
Having a separate UI and UX for searching by intersection would be more "natural" to how one searches for an address in Google Maps and allow for more screen real estate to display other geographic info to an intersection such as the borough name.
f3855dc brings the basics of charting intersections.
name|id
as this new dataset doesn't have unique human-readable names (intersection names are known not to be unique) and present labels and IDs into the list as appropriate; see labelFormatters.js entityIdDisplay()
and entityNameDisplay()
Still to do:
entityNameDisplay()
for the label hereThen Map-Chart integration:
33e4324 brings to the Compare chart, a fix to the SVG height calculation, in the event that an area has 0 injuries/kills. The filtering and tally would return undefined
which would then form a NaN
instead of a number.
This happens only with very small areas such as Intersections, and so was easily overlooked: no precinct, neighborhood, etc. would have ever had 0 injures/kills in any single one-year period.
Note: 520e404 reduces the max intersection count from 1000 to 500. Due to the "cloudy" nature of CARTO, the LIMIT 1000 queries were timing out over the last hour, though never did earlier in the day. I will make a corresponding change on the Map as well.
Confirmed that the Chart system, when &geo=intersection
and a &primary=
is given in URL params, it works as expected:
&primary=
is updated when areas are selected, and page reload reloads the area as intended&primary=
is updated when areas are selected, and page reload reloads the area as intendedThis should round out everything that is expected of Intersections at this point in time. display on the map and can be charted, are labeled appropriately, etc. Let's get some preliminary approval on this from the client, then spawn some specific issues for specific improvements. You should be able to check out the gda-intersections
branch to test this locally.
this is SCOPING not DOING, though I have had a hard time doing the former without getting into the latter, as will become apparent below.
I've been talking with Christine and testing out various datasources for inclusion of "Intersections" in the application, on both map and chart sides, as a new "BOUNDARY AREA" like Neighborhood Tabulation Areas.
Roughly what the interface would look like: https://drive.google.com/open?id=1npLJt1F9XngHAoFmkKugQ_xWhnQp1u9-
The challenge is twofold:
Define Intersections
I have identified a datasource, represented, very slowly, in this CARTO map: https://chekpeds.carto.com/tables/lionendnodestyp0awdeduped/map
These circles are the results of the following:
Take the lines (not node) data from this NYC dataset
Filter to lines where
FeatureTyp in ('0','A','W')
in QGIS to avoid loading a ridiculously large SHP into CARTP.Create nodes from those lines using this command:
When we do for real, we'd likely preprocess out even more by rolling the filter in Step 5 into the query. This first pass has been pretty ad hoc to try get a handle on the problem.
Because this produced duplicate geometries wherever line starts overlapped, I used QGIS's
Delete duplicate geometries
function to get rid of duplicates.This has some drawbacks. First, it takes a really long time. But more importantly, it didn't allow me to record cross-streets in the node. So we have a bunch of
street=100th St
nodes and no way to differentiate other than ID fields. Ideally, we'd run a Postgres query that deduped AND added across_street
field. This will bring up an issue where multiple streets converge. Not sure what to do there.Client specified a 95-foot or 29-meter radius around intersections. Once we're confident that we have our final set of intersections, I assume we'd bake this into a polygon dataset.
It will still be huge and slow. The table above has about 85,000 points in it.
Also, I noted that some areas have clear noise caused by topology issues (cut up lines in teh source data), and client says this is acceptable. Feels like a risk.
Which brings us to:
Manage volume of data
Client suggests filtering the total world of intersections only to those that have ever had a crash in them. This would involve storing two tables on CARTO: One of all intersections and one of intersections with a crash. the latter would be used in the app. We would need to modify the ETL script to recreate the app dataset on each import, and then we'd need to see if the result is any more tractable.
Client also suggests limiting front-end app to show only top 250 intersections in any view, but if that's always 250 ORDER BY most crashes within user-selected date range, then is that really a savings?