GreenInfo-Network / nyc-crash-mapper-chart-view

Chart view for NYC Crash Mapper that allows for viewing Trends, Comparing, and Ranking of various NYC geographies
http://vis.crashmapper.org
MIT License
2 stars 1 forks source link

Include Intersections as a new Boundary Type #73

Closed danrademacher closed 6 years ago

danrademacher commented 6 years ago

this is SCOPING not DOING, though I have had a hard time doing the former without getting into the latter, as will become apparent below.

I've been talking with Christine and testing out various datasources for inclusion of "Intersections" in the application, on both map and chart sides, as a new "BOUNDARY AREA" like Neighborhood Tabulation Areas.

Roughly what the interface would look like: https://drive.google.com/open?id=1npLJt1F9XngHAoFmkKugQ_xWhnQp1u9-

The challenge is twofold:

  1. Define intersections
  2. Manage the volume of data

Define Intersections

I have identified a datasource, represented, very slowly, in this CARTO map: https://chekpeds.carto.com/tables/lionendnodestyp0awdeduped/map

These circles are the results of the following:

  1. Take the lines (not node) data from this NYC dataset

  2. Filter to lines where FeatureTyp in ('0','A','W') in QGIS to avoid loading a ridiculously large SHP into CARTP.

  3. Create nodes from those lines using this command:

    $ ogr2ogr -f "ESRI Shapefile" -dialect sqlite LionEndNodesTyp0AW.shp -sql "select ST_StartPoint(geometry), OBJECTID,Street,SAFStreetN,FeatureTyp,SegmentTyp,IncExFlag,RB_Layer,NonPed,TrafDir,TrafSrc,SpecAddr,FaceCode,SeqNum,StreetCode,SAFStreetC,LGC1,LGC2,LGC3,LGC4,LGC5,LGC6,LGC7,LGC8,LGC9,BOE_LGC,SegmentID,SegCount,LocStatus,LZip,RZip,LBoro,RBoro,L_CD,R_CD,LATOMICPOL,RATOMICPOL,LCT2010,LCT2010Suf,RCT2010,RCT2010Suf,LCB2010,LCB2010Suf,RCB2010,RCB2010Suf,LCT2000,LCT2000Suf,RCT2000,RCT2000Suf,LCB2000,LCB2000Suf,RCB2000,RCB2000Suf,LCT1990,LCT1990Suf,RCT1990,RCT1990Suf,LAssmDist,LElectDist,RAssmDist,RElectDist,SplitElect,LSchlDist,RSchlDist,SplitSchl,LSubSect,RSubSect,SanDistInd,MapFrom,MapTo,BoroBndry,MH_RI_Flag,XFrom,YFrom,XTo,YTo,ArcCenterX,ArcCenterY,CurveFlag,Radius,NodeIDFrom,NodeIDTo,NodeLevelF,NodeLevelT,ConParity,Twisted,RW_TYPE,PhysicalID,GenericID,NYPDID,FDNYID,LBlockFace,RBlockFace,LegacyID,Status,StreetWidt,StreetWi_1,StreetWi_2,BikeLane,BIKE_TRAFD,ACTIVE_FLA,POSTED_SPE,Snow_Prior,Number_Tra,Number_Par,Number_Tot,Carto_Disp,FCC,ROW_Type,LLo_Hyphen,LHi_Hyphen,RLo_Hyphen,RHi_Hyphen,FromLeft,ToLeft,FromRight,ToRight,Join_ID,SHAPE_Leng from LionLinesFeat0AW"  LionLinesFeat0AW.shp

    When we do for real, we'd likely preprocess out even more by rolling the filter in Step 5 into the query. This first pass has been pretty ad hoc to try get a handle on the problem.

  4. Because this produced duplicate geometries wherever line starts overlapped, I used QGIS's Delete duplicate geometries function to get rid of duplicates.

This has some drawbacks. First, it takes a really long time. But more importantly, it didn't allow me to record cross-streets in the node. So we have a bunch of street=100th St nodes and no way to differentiate other than ID fields. Ideally, we'd run a Postgres query that deduped AND added a cross_street field. This will bring up an issue where multiple streets converge. Not sure what to do there.

  1. I then loaded the deduped nodes into CARTO, table here and realize that I also need to buffer AND filter out yet more with this SQL:
    SELECT 
    ST_Buffer(nodes.the_geom_webmercator,29) as the_geom_webmercator,
    ST_Buffer(nodes.the_geom,29) as the_geom,
    nodes.cartodb_id,
    street,
    streetcode,
    streetwidt
    FROM 
    lionendnodestyp0awdeduped as nodes
    where rw_type in ('1')

    Client specified a 95-foot or 29-meter radius around intersections. Once we're confident that we have our final set of intersections, I assume we'd bake this into a polygon dataset.

It will still be huge and slow. The table above has about 85,000 points in it.

Also, I noted that some areas have clear noise caused by topology issues (cut up lines in teh source data), and client says this is acceptable. Feels like a risk.

Which brings us to:

Manage volume of data

  1. Client suggests filtering the total world of intersections only to those that have ever had a crash in them. This would involve storing two tables on CARTO: One of all intersections and one of intersections with a crash. the latter would be used in the app. We would need to modify the ETL script to recreate the app dataset on each import, and then we'd need to see if the result is any more tractable.

  2. Client also suggests limiting front-end app to show only top 250 intersections in any view, but if that's always 250 ORDER BY most crashes within user-selected date range, then is that really a savings?

danrademacher commented 6 years ago

And another piece of complexity I left out of this note -- she would rather OVERCOUNT crashes by assigning them to multiple nearby intersections vs undercount for any one intersection. But that seems to require a multi-key intersection field in the crashes table that is different from all teh single key geographies currently in use (where a crash is in exactly 1 borough, council, neighborhood, etc)

danrademacher commented 6 years ago

So looks like SQL by geom is here: https://github.com/GreenInfo-Network/nyc-crash-mapper-chart-view/blob/master/src/common/sqlQueries.js#L23

and the options in Filter by Boundary are here: https://github.com/GreenInfo-Network/nyc-crash-mapper-chart-view/blob/dc9815fddf08a56e24814de13f628f33ef5554e4/src/containers/FilterByBoundary.jsx#L35

danrademacher commented 6 years ago

Feels like the hardest challenges here will be:

  1. Modifying ETL script to work with new Intersection geoms, but this is all Python that doesn't touch the React App.
  2. Managing too-many-intersections problem. Obv much more enmeshed with the React app
gregallensworth commented 6 years ago

Preliminary results of investigation so far

Intersection Points

ETL

Statistical Analysis

  top 1000: 128 is the lowest crashcount
  top 2000: 89 is the lowest crashcount
  top 3000: 70 is the lowest crashcount
  top 4000: 57
  top 5000: 49
  top 6000: 43
  top 7000: 38
  top 8000: 34
  top 9000: 31
  top 10000: 28

UX Possibilities in Application

gregallensworth commented 6 years ago

Per discussion today:

danrademacher commented 6 years ago

Further info from client:

The filter for the intersections should be those where there has been a crash with injuries or fatalities. This should reduce the number to less than 12, 000

Might be worth trying the Top X again by Injury+Fatality and see where that lands for top 250,500,1000

gregallensworth commented 6 years ago

Intersections with Injuries or Fatalities

There are 22,934 intersections fitting the criteria:

That's not 12,000 but does narrow down to half of the universe of intersections.

Injuries & Fatalities, Tail Distribution

The top X intersections by injury-or-fatality crashcount, and the number of crashes at that "break"

highest single intersection: 74 crashes top 10 intersections: 56 crashes top 50 intersections: 33 crashes top 100 intersections: 28 crashes top 200 intersections: 23 crashes top 500 intersections: 17 crashes top 1000 intersections: 12 crashes

Method was: SELECT MIN(crashcount) FROM (SELECT crashcount FROM nyc_intersections WHERE crashcount IS NOT NULL ORDER BY crashcount DESC LIMIT 100) ccs

Number of these intersections with only 1 or 2 crashes (last 36, inj/fat): 12,129

So again with a very long tail: It tapers from 74 and 56 at the top 10, then slowly into the 30s, 20s, and teens over the next 1000, then another 11,000 as we get down to the 1-crash and 2-crash intersections. This could give a narrative as to the 10 most dangerous intersections, but does not make it easy to pick a clear break where after the top X intersections we can meaningfully ignore further results.

Injuries & Fatalities, Intersection Tally by Minimum Crashcount

Number of intersections with at least X crashes: at least 50 crashes: 12 intersections at least 30 crashes: 88 at least 20 crashes: 321 at least 15 crashes: 701 at least 10 crashes: 1708

Intersections with Fatalities

There are 385 intersections fitting the criteria:

Now we're getting into area which may be feasible, to render via a GeoJSON export and for a picklist that someone could use. But as pointed out, the client is interested in injuries since a) medical care means fewer fatalities, while b) injuries can be life-changing and therefore of significant interest.

danrademacher commented 6 years ago

Final approach:

Add intersections as a new kind of "BOUNDARY AREA" to Map and Chart views.

  1. Limit INTERSECTIONS to anywhere a crash has ever occured.

  2. Buffer the INTERSECTIONS to 95-foot radius, creating a polygon boundary.

  3. Use either spatial query or join table to allow many-to-many relationship between intersections and crashes.

  4. The available intersections will be the top 500 based on injuries+cashes over last 36 or 24 months (time period TBD).

  5. Name the intersections as and

gregallensworth commented 6 years ago

https://github.com/GreenInfo-Network/nyc-crash-mapper-etl-script/pull/10

This PR adds to the ETL script, a new function to update the nyc_intersections table's crashcount field. This is the tally of crashes with an injury or fatality within the last 36 months.

The top 500 records here, will form the basis of the intersections as described above.

SELECT * FROM nyc_intersections
WHERE crashcount IS NOT NULL
ORDER BY crashcount DESC
gregallensworth commented 6 years ago

Map side

Working A-OK:

Interesting / Needs Work:

Not related but possible usability benefit:

clhenrick commented 6 years ago

This comment belongs more on the map repo but I'm adding it here as that seems to be where the intersections discussion is taking place.

It seems like Intersections are a much different geography type than contiguous polygon boundaries and that they warrant their own special handling.

Instead throwing a 1,000 tiny polygons on the map and cramming intersection names into the side of the UI next to the filter by area buttons, why not just make a separate UI for them that lists them so that they're readable and easier to interact with / find?

There could be a search input with a dropdown below it that allows for a user to type in the name of an intersection and/or select one from a drop down. When the app loads, the names of all the intersections could be fetched to create the dropdown list which would be hidden until a user starts typing in the input. A text search in the input could filter the names displayed in the dropdown as the user types. When an intersection name is selected from the dropdown or by hitting "enter" on the keyboard when the input is focused, a network request would be made for the crash data and the map would zoom to the area of interest.

The search input and dropdown could live in the upper left of the page, similar to a search input for geocoding an address.

A couple downsides with this approach I see are

  1. It doesn't live in the filter by area panel and may be inconsistent with the map's filter UIs

  2. It could be confused with an address search rather than intersection search

  3. It might be difficult to know which intersection I'm selecting as intersection names are not unique

For #1 I think this is justified because intersections aren't really boundaries, so it's odd for them to live under filter by boundary. For #2 I think either having an explicit "search by intersection" or even a toggle that switches between searching by "intersection" and "address" if address geocoding is a desirable feature to have at some point in the future. For #3 the borough name and perhaps neighborhood name or zipcode could be listed next to the intersection name to help clarify.

Currently selecting intersections on the map is problematic. When I enable intersections by clicking the button in the UI and am at the default zoom level of the map (zoomed out pretty far), it's not apparent that anything happens / I can't see the intersections on the map. Only when I start zooming in do I actually see the intersections appear, so the "select by clicking" only method has some usability issues.

danrademacher commented 6 years ago

The best solution here would probably be to replace the boundary buttons on the map with the more flexible module we use in the chart UI. That would address UX issues you highlight, and which we’d already considered, but without introducing an entirely new element.

But client priority at present is seeing Intersections in chart views, so we needed to take a minimalist approach to adding them to the map. I expect we’ll get another bite or two at this apple.

clhenrick commented 6 years ago

Re-purposing the list UI from the chart view would be helpful, but how would that solve the "select by clicking polygon on the map" for intersections? For other geographies having the list and being able to click on the map would be a usability improvement for filtering by a geographic area, but with intersections the polygons would still not be visible on the map at the default zoom level, which would still leave the inconsistent UX with how other geographies can be selected. Furthermore many intersection polygons overlap with one another which makes selecting an intersection area confusing.

Listing the intersections without other information such as borough / neighborhood / zipcode would also make their selection via a list cumbersome and confusing as intersection names are not unique, and adding that extra info to only intersections would still mean cramming it into a small side bar.

Having a separate UI and UX for searching by intersection would be more "natural" to how one searches for an address in Google Maps and allow for more screen real estate to display other geographic info to an intersection such as the borough name.

gregallensworth commented 6 years ago

f3855dc brings the basics of charting intersections.

image

image

Still to do:

Then Map-Chart integration:

gregallensworth commented 6 years ago

33e4324 brings to the Compare chart, a fix to the SVG height calculation, in the event that an area has 0 injuries/kills. The filtering and tally would return undefined which would then form a NaN instead of a number.

This happens only with very small areas such as Intersections, and so was easily overlooked: no precinct, neighborhood, etc. would have ever had 0 injures/kills in any single one-year period.

gregallensworth commented 6 years ago

Note: 520e404 reduces the max intersection count from 1000 to 500. Due to the "cloudy" nature of CARTO, the LIMIT 1000 queries were timing out over the last hour, though never did earlier in the day. I will make a corresponding change on the Map as well.

gregallensworth commented 6 years ago

Confirmed that the Chart system, when &geo=intersection and a &primary= is given in URL params, it works as expected:

This should round out everything that is expected of Intersections at this point in time. display on the map and can be charted, are labeled appropriately, etc. Let's get some preliminary approval on this from the client, then spawn some specific issues for specific improvements. You should be able to check out the gda-intersections branch to test this locally.