Include Intersections as a new Boundary Type

GreenInfo-Network / nyc-crash-mapper-chart-view

Chart view for NYC Crash Mapper that allows for viewing Trends, Comparing, and Ranking of various NYC geographies

http://vis.crashmapper.org

MIT License

2 stars 1 forks source link

Include Intersections as a new Boundary Type #73

Closed danrademacher closed 6 years ago

danrademacher commented 6 years ago

this is SCOPING not DOING, though I have had a hard time doing the former without getting into the latter, as will become apparent below.

I've been talking with Christine and testing out various datasources for inclusion of "Intersections" in the application, on both map and chart sides, as a new "BOUNDARY AREA" like Neighborhood Tabulation Areas.

Roughly what the interface would look like: https://drive.google.com/open?id=1npLJt1F9XngHAoFmkKugQ_xWhnQp1u9-

The challenge is twofold:

Define intersections
Manage the volume of data

Define Intersections

I have identified a datasource, represented, very slowly, in this CARTO map: https://chekpeds.carto.com/tables/lionendnodestyp0awdeduped/map

These circles are the results of the following:

Take the lines (not node) data from this NYC dataset
Filter to lines where FeatureTyp in ('0','A','W') in QGIS to avoid loading a ridiculously large SHP into CARTP.

Create nodes from those lines using this command:

$ ogr2ogr -f "ESRI Shapefile" -dialect sqlite LionEndNodesTyp0AW.shp -sql "select ST_StartPoint(geometry), OBJECTID,Street,SAFStreetN,FeatureTyp,SegmentTyp,IncExFlag,RB_Layer,NonPed,TrafDir,TrafSrc,SpecAddr,FaceCode,SeqNum,StreetCode,SAFStreetC,LGC1,LGC2,LGC3,LGC4,LGC5,LGC6,LGC7,LGC8,LGC9,BOE_LGC,SegmentID,SegCount,LocStatus,LZip,RZip,LBoro,RBoro,L_CD,R_CD,LATOMICPOL,RATOMICPOL,LCT2010,LCT2010Suf,RCT2010,RCT2010Suf,LCB2010,LCB2010Suf,RCB2010,RCB2010Suf,LCT2000,LCT2000Suf,RCT2000,RCT2000Suf,LCB2000,LCB2000Suf,RCB2000,RCB2000Suf,LCT1990,LCT1990Suf,RCT1990,RCT1990Suf,LAssmDist,LElectDist,RAssmDist,RElectDist,SplitElect,LSchlDist,RSchlDist,SplitSchl,LSubSect,RSubSect,SanDistInd,MapFrom,MapTo,BoroBndry,MH_RI_Flag,XFrom,YFrom,XTo,YTo,ArcCenterX,ArcCenterY,CurveFlag,Radius,NodeIDFrom,NodeIDTo,NodeLevelF,NodeLevelT,ConParity,Twisted,RW_TYPE,PhysicalID,GenericID,NYPDID,FDNYID,LBlockFace,RBlockFace,LegacyID,Status,StreetWidt,StreetWi_1,StreetWi_2,BikeLane,BIKE_TRAFD,ACTIVE_FLA,POSTED_SPE,Snow_Prior,Number_Tra,Number_Par,Number_Tot,Carto_Disp,FCC,ROW_Type,LLo_Hyphen,LHi_Hyphen,RLo_Hyphen,RHi_Hyphen,FromLeft,ToLeft,FromRight,ToRight,Join_ID,SHAPE_Leng from LionLinesFeat0AW"  LionLinesFeat0AW.shp

When we do for real, we'd likely preprocess out even more by rolling the filter in Step 5 into the query. This first pass has been pretty ad hoc to try get a handle on the problem.

Because this produced duplicate geometries wherever line starts overlapped, I used QGIS's Delete duplicate geometries function to get rid of duplicates.

This has some drawbacks. First, it takes a really long time. But more importantly, it didn't allow me to record cross-streets in the node. So we have a bunch of street=100th St nodes and no way to differentiate other than ID fields. Ideally, we'd run a Postgres query that deduped AND added a cross_street field. This will bring up an issue where multiple streets converge. Not sure what to do there.

I then loaded the deduped nodes into CARTO, table here and realize that I also need to buffer AND filter out yet more with this SQL:
```
SELECT 
ST_Buffer(nodes.the_geom_webmercator,29) as the_geom_webmercator,
ST_Buffer(nodes.the_geom,29) as the_geom,
nodes.cartodb_id,
street,
streetcode,
streetwidt
FROM 
lionendnodestyp0awdeduped as nodes
where rw_type in ('1')
```
Client specified a 95-foot or 29-meter radius around intersections. Once we're confident that we have our final set of intersections, I assume we'd bake this into a polygon dataset.

It will still be huge and slow. The table above has about 85,000 points in it.

Also, I noted that some areas have clear noise caused by topology issues (cut up lines in teh source data), and client says this is acceptable. Feels like a risk.

Which brings us to:

Manage volume of data

Client suggests filtering the total world of intersections only to those that have ever had a crash in them. This would involve storing two tables on CARTO: One of all intersections and one of intersections with a crash. the latter would be used in the app. We would need to modify the ETL script to recreate the app dataset on each import, and then we'd need to see if the result is any more tractable.
Client also suggests limiting front-end app to show only top 250 intersections in any view, but if that's always 250 ORDER BY most crashes within user-selected date range, then is that really a savings?

danrademacher commented 6 years ago

And another piece of complexity I left out of this note -- she would rather OVERCOUNT crashes by assigning them to multiple nearby intersections vs undercount for any one intersection. But that seems to require a multi-key intersection field in the crashes table that is different from all teh single key geographies currently in use (where a crash is in exactly 1 borough, council, neighborhood, etc)

danrademacher commented 6 years ago

So looks like SQL by geom is here: https://github.com/GreenInfo-Network/nyc-crash-mapper-chart-view/blob/master/src/common/sqlQueries.js#L23

and the options in Filter by Boundary are here: https://github.com/GreenInfo-Network/nyc-crash-mapper-chart-view/blob/dc9815fddf08a56e24814de13f628f33ef5554e4/src/containers/FilterByBoundary.jsx#L35

danrademacher commented 6 years ago

Feels like the hardest challenges here will be:

Modifying ETL script to work with new Intersection geoms, but this is all Python that doesn't touch the React App.
Managing too-many-intersections problem. Obv much more enmeshed with the React app

gregallensworth commented 6 years ago

Preliminary results of investigation so far

Intersection Points

Scripts and documentation, as well as the dataset, are in the documentation/ folder.
Using the NYC LION data, I mapped their existing nodes point data to the street lines, in order to assign the nodes a pair of streets where possible. After de-duplication efforts, a few rounds of QC, etc. this comes up to 40,202 intersection points.
This has been loaded into CARTO as a table and also as a visualized map:
- https://chekpeds.carto.com/tables/nyc_intersections/table
- https://chekpeds.carto.com/viz/74d82233-4f8e-4540-b43b-83fffe6489a7/map
Table has a crashcount field, which was calculated from the crash data to date.
Results are quite good. Only outlier question at present, is that a physical intersection may have multiple intersection-circle records, where more than 2 streets meet, e.g. 7 AVENUE SOUTH & BARROW STREET, 7 AVENUE SOUTH & BLEECKER STREET, and BARROW STREET & BLEECKER STREET are three distinct combinations, though in physical space they overlap by 99%

ETL

I did come up with a technique for updating the crashcount field. The runtime of the update is about 45 seconds, so is broken up into chunks.
The code should be ready to run, when I get clearer instructions for how to merge it and deploy to Heroku. This update to the ETL process could happen independently of the other Crash Mapper UI requested, even if the other UI work proves infeasible, as backing data for some other application or simply reporting "the most dangerous intersections" etc.

Statistical Analysis

37,057 intersections have crashes of the 40,202 records, This is 92%.
- As such, omission of crash-free intersections from certain views or lists may remove some visual noise, but not on a very large scale.
When taking the "top X intersections, by crashcount" as a potential mechanism to isolate only intersections which are of highest concern, there is a "very long tail" effect. The 4000 most-crashy intersections (top 10%) are MUCH more crash-heavy than the bottom 36000 (90%).

  top 1000: 128 is the lowest crashcount
  top 2000: 89 is the lowest crashcount
  top 3000: 70 is the lowest crashcount
  top 4000: 57
  top 5000: 49
  top 6000: 43
  top 7000: 38
  top 8000: 34
  top 9000: 31
  top 10000: 28

Corroborating the above "long tail", the mean crashcount for all records with crashcount > 0 is 26

UX Possibilities in Application

The ETL development did give some running-speed metrics that finding the "most crashy" intersections for a given date range would only be feasible under infeasibly strict conditions, such as a single year and also a geographic area. Under broader conditions, runtimes reach 45 seconds which is an unacceptably long user experience as well as automatic termination by CARTO.

gregallensworth commented 6 years ago

Per discussion today:

On the UI side, the blocking problems are:
- Map: loading of a 40,000-circle GeoJSON file is beyond capability of browsers
- Charts: a select-which-district list with 40,000 entries will crash a browser
- The only way forward will be to limit the universe of intersections, e.g. the highest-crashcount intersections of the last 36 months, and to reasonable volumes no higher than the current maximum (250, maybe 300)
I have adapted my copy of the ETL script so that update_intersections_crashcount() updates the crashcount field based on crashes of the last 36 months, rather than all crashes of all time
- https://github.com/GreenInfo-Network/nyc-crash-mapper-etl-script/blob/master/main.py
- Note that this change has not been greenlit for merge, so won't be visible in the repo yet

danrademacher commented 6 years ago

Further info from client:

The filter for the intersections should be those where there has been a crash with injuries or fatalities. This should reduce the number to less than 12, 000

Might be worth trying the Top X again by Injury+Fatality and see where that lands for top 250,500,1000

gregallensworth commented 6 years ago

Intersections with Injuries or Fatalities

There are 22,934 intersections fitting the criteria:

Had a crash within the last 36 months from today,
where the crash has number_of_persons_injured > 0 OR number_of_persons_killed > 0

That's not 12,000 but does narrow down to half of the universe of intersections.

This "about half" will grow over time, as intersections gain new crashes with fatalities or injuries. This 23k could be 24k by next year, 25k the next year, etc.
22,000 (or even 12,000) is still too many to fit into a GeoJSON file for rendering, or to fit manageably into a pick-list.

Injuries & Fatalities, Tail Distribution

The top X intersections by injury-or-fatality crashcount, and the number of crashes at that "break"

highest single intersection: 74 crashes top 10 intersections: 56 crashes top 50 intersections: 33 crashes top 100 intersections: 28 crashes top 200 intersections: 23 crashes top 500 intersections: 17 crashes top 1000 intersections: 12 crashes

Method was: SELECT MIN(crashcount) FROM (SELECT crashcount FROM nyc_intersections WHERE crashcount IS NOT NULL ORDER BY crashcount DESC LIMIT 100) ccs

Number of these intersections with only 1 or 2 crashes (last 36, inj/fat): 12,129

So again with a very long tail: It tapers from 74 and 56 at the top 10, then slowly into the 30s, 20s, and teens over the next 1000, then another 11,000 as we get down to the 1-crash and 2-crash intersections. This could give a narrative as to the 10 most dangerous intersections, but does not make it easy to pick a clear break where after the top X intersections we can meaningfully ignore further results.

Injuries & Fatalities, Intersection Tally by Minimum Crashcount

Number of intersections with at least X crashes: at least 50 crashes: 12 intersections at least 30 crashes: 88 at least 20 crashes: 321 at least 15 crashes: 701 at least 10 crashes: 1708

Intersections with Fatalities

There are 385 intersections fitting the criteria:

Had a crash within the last 36 months from today,
where the crash has OR number_of_persons_killed > 0

Now we're getting into area which may be feasible, to render via a GeoJSON export and for a picklist that someone could use. But as pointed out, the client is interested in injuries since a) medical care means fewer fatalities, while b) injuries can be life-changing and therefore of significant interest.

danrademacher commented 6 years ago

Final approach:

Add intersections as a new kind of "BOUNDARY AREA" to Map and Chart views.

Limit INTERSECTIONS to anywhere a crash has ever occured.
Buffer the INTERSECTIONS to 95-foot radius, creating a polygon boundary.
Use either spatial query or join table to allow many-to-many relationship between intersections and crashes.
The available intersections will be the top 500 based on injuries+cashes over last 36 or 24 months (time period TBD).
Name the intersections as and

gregallensworth commented 6 years ago

https://github.com/GreenInfo-Network/nyc-crash-mapper-etl-script/pull/10

This PR adds to the ETL script, a new function to update the nyc_intersections table's crashcount field. This is the tally of crashes with an injury or fatality within the last 36 months.

The top 500 records here, will form the basis of the intersections as described above.

SELECT * FROM nyc_intersections
WHERE crashcount IS NOT NULL
ORDER BY crashcount DESC

gregallensworth commented 6 years ago

Map side

Working A-OK:

Intersection is now presented as an area type, and are fetched accordingly
Tooltips have been modified with a custom condition for intersections, to tease out the ID# from the human-friendly name.
- Intersection names are known not to be unique, therefore are not usable as identifiers.
Clicking an intersection-circle on the map, filters to crashes within that area.

Interesting / Needs Work:

How many intersections to fetch? At present, the top 1000 intersections are fetched, as we are aiming towards the maximum that can occur without crashing the map.
- This takes 2 seconds to perform the query (at CARTO) plus 2 seconds to download the content (676 KB over good cable Internet) The lag is noticeable, and on slower connections may not be acceptable.
Intersection polygons with 95-foot radius, are not visible at citywide zoom levels. The map zooms out and "nothing happens", as a user wouldn't know to zoom in to see discontiguous, smaller areas only visible when closer-in.
- To a small degree this could be ameliorated by disabling the zoom-to-citywide behavior when one clears a specific selected area for filtering.
The circles come from CARTO in GeoJSON format, and "just look ugly" especially at some intermediate zoom levels. I've not seen that behavior before, and the circles look fine when CARTO renders them in its own map utility.
Intersection names don't fit well into the sidebar, which was designed for single-word names, e.g. "Bronx" or "117" Names such as BRUCKNER BOULDVARD & EAST 149 STREET" don't fit so well.
- Cropping behavior is at 14 characters to prevent linewrap, but for intersection names that's not even readable...

Not related but possible usability benefit:

When i switch to another area type, or de-select a filtering area, the map is zoomed to citywide level. If I wanted to move to a neighboring precinct/neighborhood/intersection I have now lost my place.
- Could this behavior be disabled? That may be a usability gain, if this is a realistic use case.

clhenrick commented 6 years ago

This comment belongs more on the map repo but I'm adding it here as that seems to be where the intersections discussion is taking place.

It seems like Intersections are a much different geography type than contiguous polygon boundaries and that they warrant their own special handling.

Instead throwing a 1,000 tiny polygons on the map and cramming intersection names into the side of the UI next to the filter by area buttons, why not just make a separate UI for them that lists them so that they're readable and easier to interact with / find?

There could be a search input with a dropdown below it that allows for a user to type in the name of an intersection and/or select one from a drop down. When the app loads, the names of all the intersections could be fetched to create the dropdown list which would be hidden until a user starts typing in the input. A text search in the input could filter the names displayed in the dropdown as the user types. When an intersection name is selected from the dropdown or by hitting "enter" on the keyboard when the input is focused, a network request would be made for the crash data and the map would zoom to the area of interest.

The search input and dropdown could live in the upper left of the page, similar to a search input for geocoding an address.

A couple downsides with this approach I see are

It doesn't live in the filter by area panel and may be inconsistent with the map's filter UIs
It could be confused with an address search rather than intersection search
It might be difficult to know which intersection I'm selecting as intersection names are not unique

For #1 I think this is justified because intersections aren't really boundaries, so it's odd for them to live under filter by boundary. For #2 I think either having an explicit "search by intersection" or even a toggle that switches between searching by "intersection" and "address" if address geocoding is a desirable feature to have at some point in the future. For #3 the borough name and perhaps neighborhood name or zipcode could be listed next to the intersection name to help clarify.

Currently selecting intersections on the map is problematic. When I enable intersections by clicking the button in the UI and am at the default zoom level of the map (zoomed out pretty far), it's not apparent that anything happens / I can't see the intersections on the map. Only when I start zooming in do I actually see the intersections appear, so the "select by clicking" only method has some usability issues.

danrademacher commented 6 years ago

The best solution here would probably be to replace the boundary buttons on the map with the more flexible module we use in the chart UI. That would address UX issues you highlight, and which we’d already considered, but without introducing an entirely new element.

But client priority at present is seeing Intersections in chart views, so we needed to take a minimalist approach to adding them to the map. I expect we’ll get another bite or two at this apple.

clhenrick commented 6 years ago

Re-purposing the list UI from the chart view would be helpful, but how would that solve the "select by clicking polygon on the map" for intersections? For other geographies having the list and being able to click on the map would be a usability improvement for filtering by a geographic area, but with intersections the polygons would still not be visible on the map at the default zoom level, which would still leave the inconsistent UX with how other geographies can be selected. Furthermore many intersection polygons overlap with one another which makes selecting an intersection area confusing.

Listing the intersections without other information such as borough / neighborhood / zipcode would also make their selection via a list cumbersome and confusing as intersection names are not unique, and adding that extra info to only intersections would still mean cramming it into a small side bar.

Having a separate UI and UX for searching by intersection would be more "natural" to how one searches for an address in Google Maps and allow for more screen real estate to display other geographic info to an intersection such as the borough name.

gregallensworth commented 6 years ago

f3855dc brings the basics of charting intersections.

Intersection offered as a type of area
Top 1000 intersections by crashcount (ETL nightly, last 36 months fatality and/or injury) are fetched into the listing
New code to split the name|id as this new dataset doesn't have unique human-readable names (intersection names are known not to be unique) and present labels and IDs into the list as appropriate; see labelFormatters.js entityIdDisplay() and entityNameDisplay()

Still to do:

[x] Tooltips still use the ID as given, e.g. Intersection 1 AVE & B ST|12345 Try to use entityNameDisplay() for the label here
[x] Text search (area listing section, "Search a ..." box to filter list) not working for Intersection, but confirmed working OK for other types
[x] Test Compare chart view; so far focus has been Trend

Then Map-Chart integration:

[x] Coming from Map to Chart when showing an intersection, should have Intersection selected as a type
- confirm this behavior for other area types, when following from Chart view?
- should it also pre-select this specific area? confirm behavior for other area types.
[x] Following from Chart to Map when Intersection is the selected area type, should have this specific Intersection selected as the focus area
- confirm this behavior for other area types, when following from Chart view?

gregallensworth commented 6 years ago

33e4324 brings to the Compare chart, a fix to the SVG height calculation, in the event that an area has 0 injuries/kills. The filtering and tally would return undefined which would then form a NaN instead of a number.

This happens only with very small areas such as Intersections, and so was easily overlooked: no precinct, neighborhood, etc. would have ever had 0 injures/kills in any single one-year period.

gregallensworth commented 6 years ago

Note: 520e404 reduces the max intersection count from 1000 to 500. Due to the "cloudy" nature of CARTO, the LIMIT 1000 queries were timing out over the last hour, though never did earlier in the day. I will make a corresponding change on the Map as well.

gregallensworth commented 6 years ago

Confirmed that the Chart system, when &geo=intersection and a &primary= is given in URL params, it works as expected:

link to Compare from map, OK - area dot chart is displayed
link to Trend from map, OK - area linechart is displayed
within Trend Chart, &primary= is updated when areas are selected, and page reload reloads the area as intended
within Compare Chart, &primary= is updated when areas are selected, and page reload reloads the area as intended

This should round out everything that is expected of Intersections at this point in time. display on the map and can be charted, are labeled appropriately, etc. Let's get some preliminary approval on this from the client, then spawn some specific issues for specific improvements. You should be able to check out the gda-intersections branch to test this locally.