STATS19 layer

robinlovelace-ate commented 1 year ago

Similar to this:

dabreegster commented 1 year ago

References... https://maps.dft.gov.uk/road-casualties/index.html, https://bikedata.cyclestreets.net/collisions/

robinlovelace-ate commented 1 year ago

Happy to help on this.

dabreegster commented 1 year ago

Glad for help! I haven't started digging in at all yet. The simplest MVP is to get country-wide data into a GeoJSON file, which we can serve using PMTiles. Step one will just be drawing points (probably using https://maplibre.org/maplibre-gl-js-docs/example/cluster/ or similar for low zoom); step two can add user-controlled filters.
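A minimal sketch of the "data in a GeoJSON file" step, in plain Python. The field names (`accident_index`, `accident_year`, `longitude`, `latitude`) are assumptions about the input rows, not something confirmed in this thread, and the tiling step that would turn the file into PMTiles is not shown.

```python
import json

def to_feature_collection(rows):
    """Turn dict rows with longitude/latitude into a GeoJSON FeatureCollection."""
    features = []
    for row in rows:
        try:
            lon = float(row["longitude"])
            lat = float(row["latitude"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with no usable location
        # Everything except the coordinates becomes a feature property.
        props = {k: v for k, v in row.items() if k not in ("longitude", "latitude")}
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": props,
        })
    return {"type": "FeatureCollection", "features": features}

# Hypothetical rows, e.g. as read by csv.DictReader (all values are strings).
rows = [
    {"accident_index": "A1", "longitude": "-1.0815", "latitude": "53.9600", "accident_year": "2022"},
    {"accident_index": "A2", "longitude": "", "latitude": "", "accident_year": "2022"},
]
collection = to_feature_collection(rows)
geojson_text = json.dumps(collection)
```

Rows without coordinates are dropped silently here; a real import would probably want to count and report them.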

dabreegster commented 1 year ago

https://github.com/a-b-street/abstreet/blob/ffb1f3a8ccb47edd16f8ed3aac44345f96ff4dab/importer/src/uk.rs#L37 and related code for more reference

robinlovelace-ate commented 1 year ago

Key data science problem is to join crash level data with casualty level data. I've done that in the stats19 R package. Could be an opportunity to port that logic elsewhere, or do pre-processing R side, or something else?
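For illustration only, the crash-to-casualty join can be sketched as a one-to-many grouping on a shared key. This is not a port of the stats19 R package's logic; the key name `accident_index` and the record layout are assumptions.

```python
from collections import defaultdict

def join_casualties(collisions, casualties, key="accident_index"):
    """Attach to each collision the list of casualty records sharing its key."""
    by_collision = defaultdict(list)
    for casualty in casualties:
        by_collision[casualty[key]].append(casualty)
    # Collisions with no matching casualty rows get an empty list.
    return [{**col, "casualties": by_collision.get(col[key], [])} for col in collisions]

# Hypothetical records.
collisions = [
    {"accident_index": "A1", "accident_year": 2022},
    {"accident_index": "A2", "accident_year": 2022},
]
casualties = [
    {"accident_index": "A1", "casualty_severity": 3},
    {"accident_index": "A1", "casualty_severity": 2},
]
joined = join_casualties(collisions, casualties)
```

Keeping the casualties nested under each collision leaves the later choice (count them, take the worst severity, filter by type) open.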

dabreegster commented 1 year ago

If that problem is already solved, awesome! Can we just consume the result? Is there (or can we make) one output file England-wide?

robinlovelace-ate commented 1 year ago

> If that problem is already solved, awesome! Can we just consume the result? Is there (or can we make) one output file England-wide?

Yes, happy to provide the inputs.

dabreegster commented 1 year ago

Just noting that I tried out maplibre clusters on the cycle parking layer as an example, where seeing hundreds of points at low/moderate zoom isn't helpful. It doesn't work with pmtiles, only full geojson sources, so that's a non-starter for the amount of data we have. Will be the same for stats19. Looking into https://maplibre.org/maplibre-gl-js/docs/examples/heatmap-layer/ next.

Robinlovelace commented 1 year ago

In fact: I now think that showing aggregates of crashes on the road network may be more valuable. Various people, including @wengraf, have understandably railed against representing crashes as 'dots on the map' without context or denominators.

But dots can be a handy and quick way to start. Generating aggregate segment-level stats is a hard job that would take time, and perhaps PhD-level research and methods of the type undertaken by @agila5: https://onlinelibrary.wiley.com/doi/abs/10.1111/rssa.12823

Any thoughts on this Andrea from statistics/visualisation perspectives?

Robinlovelace commented 1 year ago

And cc @rogerbeecham who may have additional thoughts as a visualisation pro, what's your latest thinking on how to visualise STATS19 data on a map?

dabreegster commented 1 year ago

I haven't dived into the source data yet, but snapping points to the nearest road segment or intersection (because I recall the data distinguishes junctions vs mid-block) is not tough. Not sure if doing so makes interpretation any easier though.

We can certainly start by just rendering a bunch of dots. I quickly tried https://maplibre.org/maplibre-gl-js-docs/example/heatmap-layer/, and it works with pmtiles, at least at moderate zoom levels. (I think conceptually we need to do something smarter at low zoom levels: either sacrificing performance by listing all the points and feeding them into the heatmap layer, or maybe precalculating heatmap contour shapes for different zooms.)

robinlovelace-ate commented 1 year ago

Starting by showing the raw data as dots sounds like a good starting point. Just not sure that's the most policy relevant visualisation technique, but sounds like a good starting 'point'! Greetings from York office...

wengraf commented 1 year ago

> In fact: I now think that showing aggregates of crashes on the road network may be more valuable. Various people, including @wengraf, have understandably railed against representing crashes as 'dots on the map' without context or denominators.

I ought to be editing our response to the TSC session on transport data, but let me state exactly what my issues are, as it may help:

If a map is presented in a context that allows for or encourages people to think about risk, then it must be a map of real risk. The human mind is poor at interpreting point maps and risk.
There are contexts where point maps are OK to me, but that's when people are definitely looking for population (e.g., crash response) and not at all thinking about risk (e.g., engineering interventions).
STATS19 is not all crashes. Most notably, it doesn't have reliable slights, its self-reported crashes are likely to be skewed at the very least, it doesn't have super reliable serious, it doesn't necessarily have crashes where people have done a runner (e.g., people who would rather not have a chat with the police), and it doesn't have crashes on certain bits of the network that road users might not realise are excluded (e.g., private land, parks, etc.). If you then map "all the dots", certain gaps in that mapping can suggest wildly inaccurate things (like the survivorship bias Spitfire diagram).

On snapping points to links, the simple GIS part isn't too hard, but that snapping then needs to be sense-checked against the fields that describe the link, you need a road network of decent enough quality (e.g., Open Roads doesn't have some complex junctions), and you need a denominator applicable to that road link. The snapping is the (relatively) easy bit.
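The "(relatively) easy bit" can be sketched as a nearest-segment search. This is a brute-force illustration that assumes planar coordinates in metres (for example British National Grid); the link ids and the `max_distance` cut-off are hypothetical, and none of the sense-checking or denominator work described above is attempted.

```python
import math

def point_segment_distance(p, a, b):
    """Return (distance, nearest point) from p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        t = 0.0  # degenerate segment: a and b coincide
    else:
        # Projection of p onto the line, clamped to the segment ends.
        t = ((px - ax) * dx + (py - ay) * dy) / length_sq
        t = max(0.0, min(1.0, t))
    nx, ny = ax + t * dx, ay + t * dy
    return math.hypot(px - nx, py - ny), (nx, ny)

def snap_to_network(p, links, max_distance):
    """links maps link id -> list of (x, y) vertices.

    Returns (link_id, distance, snapped_point), or None if nothing is close enough.
    """
    best = None
    for link_id, coords in links.items():
        for a, b in zip(coords, coords[1:]):
            dist, nearest = point_segment_distance(p, a, b)
            if best is None or dist < best[1]:
                best = (link_id, dist, nearest)
    if best is None or best[1] > max_distance:
        return None
    return best

# Hypothetical two-link network meeting at a junction at (100, 0).
links = {
    "high_street": [(0.0, 0.0), (100.0, 0.0)],
    "station_road": [(100.0, 0.0), (100.0, 80.0)],
}
snapped = snap_to_network((40.0, 6.0), links, max_distance=20.0)
too_far = snap_to_network((40.0, 50.0), links, max_distance=20.0)
```

A point near the junction would be almost equidistant from both links, which is exactly where checking against the link-describing fields matters. A country-wide run would also need a spatial index rather than this linear scan.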

rogerbeecham commented 1 year ago

Apologies -- just wrote something and have now seen @wengraf's points. I'm also not sure of the genesis/background of the project.

Guess it depends on the use case. If the intention is to provide a front-end to the dataset, then zoom-dependent density-based clustering, a bit like that in the MapLibre earthquake example, is fine -- I imagine this would work in terms of the order of magnitude of stats19 crashes.

If it's to represent "risk" -- not just where there are roads and people (and so road crashes) -- then, as @wengraf says, it's really hard to come up with generalisable decisions on how to capture exposure/context. Don't think I've yet seen this done well in an interactive/exploratory tool. And it seems unlikely that one can capture the context identified by @wengraf above.

Sometimes it's useful to think about constraints, especially with these sorts of exploratory tools -- e.g. if we were to limit ourselves to stats19 and a road network dataset like OpenRoads (recognising the difficulty of layering lots of context).

So a starting point could be at least to generate zoom-dependent crash counts regularly sampled over the network:

sample points over network at regular spatial intervals -- this sampling increases/decreases (stepped intervals) on zoom level
generate crash frequencies within a given spatial bandwidth, again varying at stepped intervals on zoom level
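The two steps above can be sketched as follows, assuming planar coordinates in metres. The `ZOOM_STEPS` table is entirely made up for illustration, and the count uses straight-line distance rather than distance along the network, which is a simplification.

```python
import math

# Hypothetical stepped settings: (max zoom, sample spacing, bandwidth), in metres.
ZOOM_STEPS = [(10, 400.0, 400.0), (13, 100.0, 100.0), (16, 25.0, 20.0)]

def settings_for_zoom(zoom):
    """Pick (spacing, bandwidth) for the first step whose max zoom covers `zoom`."""
    for max_zoom, spacing, bandwidth in ZOOM_STEPS:
        if zoom <= max_zoom:
            return spacing, bandwidth
    return ZOOM_STEPS[-1][1:]

def sample_along(coords, spacing):
    """Points every `spacing` metres along a polyline, starting at its first vertex."""
    samples = [coords[0]]
    next_at = spacing   # distance along the line of the next sample
    travelled = 0.0     # length of the segments already walked
    for (ax, ay), (bx, by) in zip(coords, coords[1:]):
        seg_len = math.hypot(bx - ax, by - ay)
        while seg_len > 0 and next_at <= travelled + seg_len:
            t = (next_at - travelled) / seg_len
            samples.append((ax + t * (bx - ax), ay + t * (by - ay)))
            next_at += spacing
        travelled += seg_len
    return samples

def count_within(samples, crashes, bandwidth):
    """Number of crashes within `bandwidth` (straight-line) of each sample point."""
    return [
        sum(1 for cx, cy in crashes if math.hypot(cx - sx, cy - sy) <= bandwidth)
        for sx, sy in samples
    ]

road = [(0.0, 0.0), (100.0, 0.0)]
crashes = [(10.0, 5.0), (30.0, -5.0), (90.0, 0.0)]
spacing, bandwidth = settings_for_zoom(16)
samples = sample_along(road, spacing)
counts = count_within(samples, crashes, bandwidth)
```

These are still raw frequencies, so everything said above about missing exposure applies to them too.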

From here, possible to do things like set up some model about how the 1D distribution of these crash frequencies should look and encode deviation from this at regularly sampled points. Could then have users explore different data generating processes to emphasise different -- e.g. thinking of this Surprise Maps paper -- https://idl.cs.washington.edu/papers/surprise-maps, which is about exploring differences from expectation where no theoretically informed model exists.

robinlovelace-ate commented 1 year ago

Many thanks guys, with my ATE hat on!

agila5 commented 1 year ago

> Any thoughts on this Andrea from statistics/visualisation perspectives?

Hi everyone! I didn't check all the details regarding this repository (and I'm also not sure about the background of this project), but I just want to point out that there are a few techniques that can be used to display a heatmap regarding the density of points on a linear network (and everything is already implemented in R). For example:

```r
library(spatstat)

# simulate car crashes data on a toy network
crashes <- rpoislpp(
  lambda = \(x, y) 125 * x + 125 * y,
  L = simplenet
)

# Not really clear, it looks like there is a constant intensity
par(mar = rep(0, 4))
plot(crashes, pch = 20, main = "")

# Much better with a heatmap
par(mar = c(0, 0, 0, 3))
plot(density(crashes, dimyx = 128), main = "")
```

Created on 2023-08-23 with reprex v2.0.2

A couple of points:

You've already mentioned the possibility of using heatmaps to smooth point-level car crash data. I'm not sure if maplibre implements smoothing techniques on networks but I think a classical 2D approximation might be a good starting point. Happy to provide insights on the 1D readaptation (and, in fact, I just finished working on a paper on a similar topic: here)
As Ivo and Roger already pointed out, this approach has several drawbacks. In fact, the technique that I showed is just a non-parametric way to smooth the point-level data and it doesn't really display a "car crash risk" since we are missing an exposure. Unfortunately, it's quite difficult to provide general guidance on what might be treated as "exposure" in these models (population? traffic? speed?)
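As a rough illustration of the "classical 2D approximation" mentioned in the first point, here is a plain Gaussian kernel density on a grid. It ignores the network entirely (unlike the spatstat example above), the bandwidth and grid are arbitrary, and, as the second point says, the result is a smoothed crash density, not a risk.

```python
import math

def kernel_density(points, xs, ys, bandwidth):
    """Gaussian kernel density of 2D points, evaluated at every (x, y) on the grid.

    Returns a list of rows, one per value in ys; units are points per unit area.
    """
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    grid = []
    for y in ys:
        row = []
        for x in xs:
            total = 0.0
            for px, py in points:
                d_sq = (x - px) ** 2 + (y - py) ** 2
                total += math.exp(-d_sq / (2.0 * bandwidth ** 2))
            row.append(norm * total)
        grid.append(row)
    return grid

# Hypothetical crash locations: a small cluster near (1, 1) and one point at (4, 4).
points = [(1.0, 1.0), (1.2, 1.0), (4.0, 4.0)]
axis = [0.0, 1.0, 2.0, 3.0, 4.0]
grid = kernel_density(points, axis, axis, bandwidth=0.5)
```

Dividing such a surface by an equally smoothed exposure surface would be one way towards a rate, but choosing that exposure is the open question raised above.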

robinlovelace-ate commented 1 year ago

This is great stuff, many thanks Andrea, Ivo and Roger, v. helpful. We may be able to share more in due course, thanks to the amazing work of Dustin and Pete 🔥

dabreegster commented 7 months ago

Copying from chat with Robin:

Feature parity with bikedata, which is good at showing dots on the map at high zoom levels and has good filter options, but is bad at showing aggregated results
Aggregation to link level: a hard problem that requires pre-processing to account for the high % of crashes that are at junctions and therefore not clearly on any road. Good paper on this: https://onlinelibrary.wiley.com/doi/abs/10.1111/rssa.12823
Aggregation to area levels, e.g. OA / LSOA

Robinlovelace commented 7 months ago

This thread has so many good ideas: the makings of the best way to interactively explore STATS19 data in a publicly available map, IMO.

wengraf commented 7 months ago

I'm not entirely confident in the second bullet from @dabreegster. Isn't it the case that crashes at junctions still have the primary road field? If so, part of the link matching is already done... I guess it boils down to how you prioritise geolocation from the point data relative to geolocation from the location description variables...

Robinlovelace commented 7 months ago

And the purpose of the visualisation: if we want to highlight dangerous junctions vs dangerous roads it could be handy to have a separate (point based I guess) visualisation approach for junctions rather than snapping them all to linestrings.

dabreegster commented 7 months ago

@Robinlovelace, how much detail do we want per point, for the raw point case? I'm starting with the number of casualties, the worst severity of those, and the year. Do we also want to only filter for casualties to people outside of a motor vehicle? (That complicates some stuff -- if a driver and a pedestrian were both injured, do we report them both? And ignore a collision with 2 driver casualties?)

For the area-aggregations, similar questions -- what do we want to show and let people filter by? Total collisions, total casualties, only counting peds/cyclists? etc.

The more you can help me spec out what we want in the end, the faster this'll get done
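To make the per-point questions above concrete, here is a sketch of one possible answer, not a decision: summarise each collision by casualty count, worst severity and year, with an optional filter that counts only casualties outside a motor vehicle. The field names and the code values (severity 1 = fatal, 2 = serious, 3 = slight; casualty type 0 = pedestrian, 1 = cyclist, 9 = car occupant) are assumptions about the input data.

```python
OUTSIDE_VEHICLE_TYPES = {0, 1}  # assumed codes: 0 = pedestrian, 1 = cyclist

def summarise(collision, only_outside_vehicle=False):
    """Per-point summary of a collision that has a nested `casualties` list.

    Returns None when the filter leaves no casualties to report.
    """
    casualties = collision["casualties"]
    if only_outside_vehicle:
        casualties = [c for c in casualties if c["casualty_type"] in OUTSIDE_VEHICLE_TYPES]
    if not casualties:
        return None
    return {
        "accident_index": collision["accident_index"],
        "year": collision["accident_year"],
        "num_casualties": len(casualties),
        # Assumed coding: a lower severity code means a worse outcome.
        "worst_severity": min(c["casualty_severity"] for c in casualties),
    }

# Hypothetical collisions: driver plus pedestrian, and two drivers.
mixed = {
    "accident_index": "A1", "accident_year": 2022,
    "casualties": [
        {"casualty_type": 9, "casualty_severity": 2},
        {"casualty_type": 0, "casualty_severity": 3},
    ],
}
drivers_only = {
    "accident_index": "A2", "accident_year": 2022,
    "casualties": [
        {"casualty_type": 9, "casualty_severity": 3},
        {"casualty_type": 9, "casualty_severity": 3},
    ],
}
all_casualties = summarise(mixed)
filtered = summarise(mixed, only_outside_vehicle=True)
dropped = summarise(drivers_only, only_outside_vehicle=True)
```

Under this choice the mixed collision reports only the pedestrian when the filter is on, so its worst severity changes from serious to slight, and the two-driver collision disappears from the map. Whether that is the wanted behaviour is exactly the spec question being asked.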

acteng / atip

STATS19 layer #280