bicyclingplus / atp-bc-tool-analysis

Analyzing inputs/outputs from the CTC Active Transportation Benefits/Costs tool, to identify and investigate potential issues (analysis notebooks only - input/output data is not included)

Evaluate crash model - why is it estimating hundreds/thousands of crashes? #5

Open mRaffill opened 11 months ago

mRaffill commented 11 months ago

Originally posted by @mRaffill in https://github.com/mRaffill/atp-bc-tool-analysis/issues/2#issuecomment-1721765131

To-do list:

mRaffill commented 11 months ago

In 433385f14c6230da535079f25055ca166e1715e5 I added linear regressions for Project length and Project length + volume, and graphed some scatter plots comparing crashes to length and crashes to volume.

Next steps:

mRaffill commented 11 months ago

Maybe look directly at the $ECC_{mojvf}$ compared to $L_{jvf}$ and $V_{mj}$ - this might make it easier to compare across functional class/volume class

mRaffill commented 11 months ago

After trying a bunch of combinations, the most I've been able to find is that the ratio of crashes to number of intersections seems much higher than the ratio of crashes to number (or length) of segments. It almost seems like there is some constant multiplier somewhere that is making intersection crashes much higher than segment crashes?

(add examples/graphs later)

mRaffill commented 11 months ago

Per Dillon's suggestion, I started looking at one individual project and going through each step of the crash equation to see where it started seeming "off."

mRaffill commented 11 months ago

The total crashes after adding up crashes for each intersection and crashes for each segment are much lower than the crashes calculated using total lengths/counts/volumes and the functional/volume classes equation (what is used currently in the tool). They both use the same initial data from each segment and intersection - so there must be something different in the process to calculate crashes as values are added up across all of the segments/intersections.

I looked again at the tables used to calculate crashes to try and find differences between calculating for individual intersections/segments and calculating for functional/volume classes. The alpha constants are the same. Multiplying by the total length vs multiplying by the individual lengths and then finding the sum should be the same.

But it seems like there might be an issue with volume. Each segment/intersection has individual volume (pedestrian/bicycle exposure), but the aggregate version uses the total exposure for that mode - not separated by functional class and volume class. So the crashes split by volume and functional class will use the same volume over and over again... I think?

And then since that is then multiplied across all of the segments/intersections (eg. the total count or the total length), it will be like all of the segments/intersections had the total volume instead of their individual volume.

Looking at the equations, this does seem to be the case:

$EC_{cmoj} = \sum_{f}\sum_{v} ECC_{cmojvf}$

$ECC_{cmojvf} = e^{\alpha_{mojvf}} L_{jvf} (EV_{cmj})^{p}$

So:

$EC_{cmoj} = \sum_{f}\sum_{v} e^{\alpha_{mojvf}} L_{jvf} (EV_{cmj})^{p}$

But $(EV_{cmj})^{p}$ is not split by $f$ or $v$, so it is duplicated in every term of the sum.
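A toy numerical check of the duplication (all numbers are hypothetical, not from the tool's data): reusing the mode total $EV_{cmj}$ in every term inflates the estimate compared to using each class's own volume, because the class total is at least as large as any single class's volume.

```python
import math

# Toy check (hypothetical numbers): two (volume class, functional class)
# combinations for one mode/outcome/location type.
p = 0.5  # safety-in-numbers exponent currently used by the tool
alpha = {("v1", "f1"): -8.0, ("v2", "f2"): -8.0}     # hypothetical alpha constants
length = {("v1", "f1"): 2.0, ("v2", "f2"): 3.0}      # L_jvf (miles)
volume = {("v1", "f1"): 400.0, ("v2", "f2"): 600.0}  # exposure per class

total_volume = sum(volume.values())  # EV_cmj, not split by f or v

# Documented equation: the total volume is reused in every term of the sum.
ec_total_vol = sum(math.exp(a) * length[k] * total_volume**p
                   for k, a in alpha.items())

# Per-class volume: each term uses only its own class's exposure.
ec_class_vol = sum(math.exp(a) * length[k] * volume[k]**p
                   for k, a in alpha.items())

print(ec_total_vol, ec_class_vol)  # the total-volume version comes out larger
```

Whenever more than one class has positive volume, the total-volume version can only overestimate relative to the per-class version, since $(EV_{cmj})^{p} \ge (EV_{cmjvf})^{p}$ in every term.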

Two options I can think of:

1. Multiply volume separately:

   $EC_{cmoj} = (EV_{cmj})^{p} \sum_{f}\sum_{v} e^{\alpha_{mojvf}} L_{jvf}$

2. Split volume (since it comes from the segment/intersection properties, it can also be split by volume class and functional class):

   $EC_{cmoj} = \sum_{f}\sum_{v} e^{\alpha_{mojvf}} L_{jvf} (EV_{cmjvf})^{p}$

Or just use the approach of calculating crashes for each individual segment/intersection that I just tried.

My question is: is my approach valid and the existing approach wrong? Or is the inconsistency because my approach has something missing?

mRaffill commented 11 months ago

I also wonder if part of this has to do with the number of intersections selected - in the tool, multiple corners/parts of an intersection are often selected. For example, this one intersection in the individual project I was looking at has 8 sub-intersections selected, which will each be counted as separate intersections when the tool estimates crashes.

(screenshot, 2023-09-22: the intersection with 8 sub-intersections selected)

But I'm not sure how intersections were counted to get the average crashes/intersection for the state level metrics (for calculating the alpha constants). When those constants were calculated, did one intersection mean the entire intersection or just one side of the intersection?

mRaffill commented 11 months ago

> In 433385f I added linear regressions for Project length and Project length + volume, and graphed some scatter plots comparing crashes to length and crashes to volume.

Adding results (some subset of them which I think are the easiest to understand) here for reference:

Intersections

(scatter plot: combined across all modes and outcomes, only for network intersections)

| Mode / Outcome | bicycling | walking | combined |
| --- | --- | --- | --- |
| crash | (image) | (image) | (image) |
| injury | (image) | (image) | (image) |
| death | (image) | (image) | (image) |

Roadways

(scatter plot: combined across all modes and outcomes, only for network segments)

| Mode / Outcome | bicycling | walking | combined |
| --- | --- | --- | --- |
| crash | (image) | (image) | (image) |
| injury | (image) | (image) | (image) |
| death | (image) | (image) | (image) |
mRaffill commented 11 months ago

So using the approach calculating separately at each segment/intersection (https://github.com/mRaffill/atp-bc-tool-analysis/issues/5#issuecomment-1730729158, https://github.com/mRaffill/atp-bc-tool-analysis/issues/5#issuecomment-1730864833) across all projects:

| | Roadway | Intersection |
| --- | --- | --- |
| Before | (image) | (image) |
| After | (image) | (image) |

Sorry for the bad formatting with all of the weird x-axis labels! Also note that I don't think they're in the same order of Project IDs, so they don't necessarily line up. But I think it shows pretty clearly that after taking out the duplicated volume, the crashes become much closer to what is expected.

There are still some things which look a bit confusing, like bicycling change in crashes being greater than combined change in crashes (probably has to do with the crash reduction factors).

mRaffill commented 10 months ago

Uh oh... I tried calculating using the equation $EC_{cmoj} = (EV_{cmj})^{p} \sum_{f}\sum_{v} e^{\alpha_{mojvf}} L_{jvf}$ and either I've done something very wrong or this wasn't the issue to begin with, because I am still getting the same thousands of crashes (the results actually look almost identical)...

(image)

Compared to what is currently in the tool:

(image)

Did I implement something wrong? Is the equation itself wrong or is some other issue with the crash model that this change doesn't address?

mRaffill commented 10 months ago

After going around in circles trying to figure out what was wrong with either the equation or my code implementing it, I started to wonder whether it is really valid to move the volume outside of the summation, and I don't think it is.

Since the existing volume $EV_{cmj}$ is just the sum of the volumes from each volume/functional class:

$EC_{cmoj} = (EV_{cmj})^{p} \sum_{f}\sum_{v} e^{\alpha_{mojvf}} L_{jvf}$

$EC_{cmoj} = \left(\sum_{f}\sum_{v} EV_{cmjvf}\right)^{p} \sum_{f}\sum_{v} e^{\alpha_{mojvf}} L_{jvf}$

This has cross-multiplication between the volume terms and the $e^{\alpha_{mojvf}} L_{jvf}$ terms: volume from every volume/functional class gets multiplied by the length term for every volume/functional class, instead of only by the terms with the matching volume/functional class (a product of sums is not the same as a sum of products).
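The product-of-sums vs sum-of-products gap can be checked with toy numbers (hypothetical volumes and length terms, with alpha folded into the length values for simplicity):

```python
# Two volume/functional classes (hypothetical toy values):
vols = [400.0, 600.0]  # EV per class
lens = [2.0, 3.0]      # length-times-e^alpha term per class (toy scale)
p = 0.5

product_of_sums = sum(vols) ** p * sum(lens)                   # volume pulled outside the sum
sum_of_products = sum(v ** p * l for v, l in zip(vols, lens))  # volume kept inside each term

print(product_of_sums, sum_of_products)  # 158.11..., 113.48...
```

The two expressions only coincide in degenerate cases (e.g. a single nonzero class), so pulling the volume outside the summation changes the result.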

So I think only the second option for the equation would be valid:

$EC_{cmoj} = \sum_{f}\sum_{v} e^{\alpha_{mojvf}} L_{jvf} (EV_{cmjvf})^{p}$

dtfitch commented 10 months ago

Isn't that last equation what I originally wrote that was also producing wild results?

mRaffill commented 10 months ago

Do you mean the equation that the tool was originally using?

I think the equation in the documentation is

$EC_{cmoj} = \sum_{f}\sum_{v} e^{\alpha_{mojvf}} L_{jvf} (EV_{cmj})^{p}$

whereas this one has the volume also split by volume class ($v$) and functional class ($f$):

$EC_{cmoj} = \sum_{f}\sum_{v} e^{\alpha_{mojvf}} L_{jvf} (EV_{cmjvf})^{p}$

So the total volume (across all volume/functional classes) won't be used multiple times and duplicated. Volume comes from the individual segment/intersection properties, so it seems like the only real change would be making the process of adding up the volume numbers slightly different.

However: I tried calculating this way, but the results are still much larger than calculating for individual segments/intersections and then finding the total. They have a very similar pattern of results (shape of the graph), but this approach is scaled much larger. So this equation could have more issues, or the individual segment/intersection method might have issues, or I'm just missing a constant somewhere.

| | segments | intersections |
| --- | --- | --- |
| individual segments or intersections | (image) | (image) |
| by volume or functional class | (image) | (image) |

Anyways, I am still very confused about these equations, so I'll look at them again tomorrow/over the weekend and hopefully get to understand them better.

dtfitch commented 10 months ago

Okay, I see the difference. It does seem like there is some normalizing constant that is missing somehow. Thanks for continuing to dig!

mRaffill commented 10 months ago

I thought about this more and have a different idea. The equation for the individual segments/intersections, where $w$ is the individual intersection or segment number/ID, would be:

$EC_{cmoj} = \sum_{f}\sum_{v}\sum_{w} e^{\alpha_{mojvf}} L_{jvfw} (EV_{cmjvfw})^{p}$

Since alpha is constant across all segments/intersections in the same volume and functional class, this is equivalent to:

$EC_{cmoj} = \sum_{f}\sum_{v} e^{\alpha_{mojvf}} \left(\sum_{w} L_{jvfw} (EV_{cmjvfw})^{p}\right)$

However, the tool currently adds up the volume and length/count across all intersections/segments separately and then multiplies them together:

$EC_{cmoj} = \sum_{f}\sum_{v} e^{\alpha_{mojvf}} \left(\sum_{w} L_{jvfw}\right) \left(\sum_{w} EV_{cmjvfw}\right)^{p}$

It seems like these equations are not equivalent, again because of the "cross-multiplication"/distributive property when multiplying two sums. That might be why the results are so different when adding up the terms in different ways.
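A quick sketch with hypothetical per-segment numbers shows how far apart the two aggregations can land within a single (v, f) class:

```python
# Hypothetical per-segment data within one (volume class, functional class):
L_w = [0.5, 1.5, 1.0]         # segment lengths (miles)
EV_w = [100.0, 900.0, 250.0]  # per-segment exposure
p = 0.5

# Per-segment: sum of L_w * EV_w^p (equivalent to summing per-segment crashes,
# since e^alpha is a constant factor within the class).
per_segment = sum(l * ev ** p for l, ev in zip(L_w, EV_w))

# Tool's current aggregation: (sum of L_w) * (sum of EV_w)^p.
aggregated = sum(L_w) * sum(EV_w) ** p

print(per_segment, aggregated)  # 65.81..., 106.06...
```

The gap comes entirely from multiplying the two sums together, which introduces every length-times-volume cross term rather than only the matched per-segment products.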

I can't mentally process all of these summations to figure out the differences, so I tried writing out what terms would actually be added. For one volume/functional class combination (so crashes/intersection/person or crashes/mile/person is constant):

(crashes/intersection/person * 1 intersection * people) + (crashes/intersection/person * 1 intersection * people) + ...

- does **not** equal (crashes/intersection/person) * (people + people + people + ...) * (1 intersection + 1 intersection + 1 intersection + ...)
- **does** equal (crashes/intersection/person) * (1 intersection * people + 1 intersection * people + ...)

(crashes/mile/person * miles * people) + (crashes/mile/person * miles * people) + ...

- does **not** equal (crashes/mile/person) * (people + people + people + ...) * (miles + miles + miles + ...)
- **does** equal (crashes/mile/person) * (miles * people + miles * people + ...)

So basically, it looks to me like this is cross-multiplication again in the volume/functional class approach. But the alpha constants were initially made from volume/functional classes. So is it actually valid to use the alpha constants this way to calculate crashes at each individual segment/intersection? (ignoring the e^ and ln for now because those just cancel out)

(average crashes/miles or intersections)/(total volume)^p

(total crashes/total length or count)/(total volume)^p

((crashes + crashes + crashes + ...)/(1 intersection + 1 intersection + 1 intersection + ...))/(people + people + people + ...)^p

((crashes + crashes + crashes + ...)/(miles + miles + miles + ...))/(people + people + people + ...)^p

But these individual internal calculations don't matter because the whole point is to get the AVERAGE across the entire state. So I think it should be reasonable to calculate at individual segments or intersections:

(crashes/intersection/person) * (1 intersection * people + 1 intersection * people + ...)

((average crashes/miles or intersections)/(total volume)^p) * (1 intersection * people + 1 intersection * people + ...)

Now it seems like another issue is how to deal with the ^0.5 exponent for volume and how that distributes over a volume class vs an individual segment/intersection???
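The exponent question can be checked directly: a fractional power does not distribute over a sum of volumes, so applying it at the class level vs the segment level gives different numbers (hypothetical volumes below):

```python
# Hypothetical exposures for two segments in the same class:
a, b = 400.0, 600.0

print((a + b) ** 0.5)       # exponent applied once, to the class total
print(a ** 0.5 + b ** 0.5)  # exponent applied per segment, then summed
```

For any exponent strictly between 0 and 1, applying it per segment and then summing always gives the larger number, so where the safety-in-numbers constant is applied genuinely changes the estimate.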
mRaffill commented 10 months ago

Sorted min-max:

Roadways: (image)

Intersections: (image)

mRaffill commented 10 months ago

> Now it seems like another issue is how to deal with the ^0.5 exponent for volume and how that distributes over a volume class vs an individual segment/intersection???

We discussed this and thought it might be reasonable to apply the safety-in-numbers constant at the individual segment/intersection level, because the literature this comes from does have a "micro-scale" constant. Actually, the constant the tool is using may not even be correct, but before looking into what would be the correct constant to use, Dillon suggested graphing how much the crashes change when the constant changes.

This was pretty easy to graph, and it looks like the constant does have a big impact on crashes. Even a small change like increasing from 0.5 to 0.6 results in almost twice as many crashes.

| safety in numbers constant | segments | intersections |
| --- | --- | --- |
| 0.1 | (image) | (image) |
| 0.2 | (image) | (image) |
| 0.3 | (image) | (image) |
| 0.4 | (image) | (image) |
| 0.5 (current) | (image) | (image) |
| 0.6 | (image) | (image) |
| 0.7 | (image) | (image) |
| 0.8 | (image) | (image) |

I also notice that for very small constants, bicycling crashes are above combined crashes... Maybe this is because applying the constant after adding bicycle + pedestrian volume isn't the same as adding bicycle and pedestrian volumes that each already have the constant applied?

mRaffill commented 10 months ago

Oh, but changing the safety in numbers constant should also change the alpha constant - the equation to calculate alpha includes the safety in numbers constant: $\alpha = \ln(\frac{crashes}{V^{p}})$

So I probably have to go back to the Excel file where all of the alpha constants were calculated, change the safety in numbers constant there too, and then recalculate crashes with the new alpha constants.
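A minimal sketch of why adjusting alpha compensates, using the stated relation $\alpha = \ln(crashes / V^{p})$ with hypothetical calibration numbers: changing $p$ changes $\alpha$ in the opposite direction, so the predicted crashes at the calibration volume stay exactly the same.

```python
import math

# Hypothetical placeholders for the state-level averages behind one alpha constant:
avg_crashes = 0.05    # average crashes per mile (or per intersection)
total_volume = 1000.0 # statewide exposure for the class

def alpha_for(p):
    # alpha = ln(crashes / V^p), per the relation in the thread
    return math.log(avg_crashes / total_volume ** p)

alpha_old = alpha_for(0.5)
alpha_new = alpha_for(0.4)

# Lowering p shrinks V^p, so alpha rises to compensate; at the calibration
# volume the two parameterizations predict identical crashes.
print(math.exp(alpha_old) * total_volume ** 0.5)  # 0.05
print(math.exp(alpha_new) * total_volume ** 0.4)  # 0.05
```

Differences between the two parameterizations then only appear at volumes away from the calibration volume, which is consistent with the variation looking much less extreme after recomputing alpha.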

dtfitch commented 10 months ago

Good catch. This variation should be strong though - an exponent of 0.4 is a 60% reduction from safety in numbers alone.

mRaffill commented 10 months ago

Right, makes sense! I guess it was just surprising seeing that visually.

Taking into account the change in alpha constant, variation is less extreme:

| safety in numbers constant | segments | intersections |
| --- | --- | --- |
| 0.1 | (image) | (image) |
| 0.2 | (image) | (image) |
| 0.3 | (image) | (image) |
| 0.4 | (image) | (image) |
| 0.5 (current) | (image) | (image) |
| 0.6 | (image) | (image) |
| 0.7 | (image) | (image) |
| 0.8 | (image) | (image) |
dtfitch commented 10 months ago

Wonderful! This makes it seem like less of a scary decision to make. Okay, I think we go with the most recent citation and select 0.4. So is this what we should tell Matt to implement?

$EC_{cmoj} = \sum_{f}\sum_{v}\sum_{w} e^{\alpha_{mojvf}} L_{jvfw} (EV_{cmjvfw})^{0.4}$

Can you list out each subscript so it is clear for him? Also, we need to give him a new look up table for alpha constants, right? thanks!

mRaffill commented 10 months ago

> Can you list out each subscript so it is clear for him?

$EC_{cmoj} = \sum_{f}\sum_{v}\sum_{w} e^{\alpha_{mojvf}} L_{jvfw} (EV_{cmjvfw})^{0.4}$

- $c$ = column (safety, per capita, per jobs)
- $m$ = mode
- $o$ = outcome (crash, injury, death)
- $j$ = location type (segment or intersection)
- $f$ = functional class
- $v$ = volume class
- $w$ = project selected segments or intersections (based on what $j$ is)

All of these subscripts are the same as in the benefits calculation documentation, except $w$.
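A minimal sketch of how the recommended equation might be implemented, iterating over the selected segments/intersections directly (the field names, the alpha lookup keys, and all numbers below are hypothetical, not the tool's actual schema):

```python
import math

P = 0.4  # safety-in-numbers exponent chosen above

def estimated_crashes(features, alpha_lookup):
    """EC = sum over w of e^alpha_(v,f) * L_w * EV_w^P, for one mode/outcome/
    location type. `features` is an iterable of dicts with hypothetical keys
    'volume_class', 'functional_class', 'length' (miles, or 1 per intersection),
    and 'exposure' (EV for the segment/intersection)."""
    total = 0.0
    for w in features:
        alpha = alpha_lookup[(w["volume_class"], w["functional_class"])]
        total += math.exp(alpha) * w["length"] * w["exposure"] ** P
    return total

# Example with two hypothetical segments and made-up alpha constants:
alphas = {("low", "arterial"): -8.0, ("high", "local"): -7.5}
segs = [
    {"volume_class": "low", "functional_class": "arterial",
     "length": 2.0, "exposure": 400.0},
    {"volume_class": "high", "functional_class": "local",
     "length": 1.0, "exposure": 900.0},
]
print(estimated_crashes(segs, alphas))
```

Because the loop applies alpha and the exponent per feature, grouping by volume/functional class first is unnecessary; the class subscripts only matter for the alpha lookup.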

> Also, we need to give him a new look up table for alpha constants, right?

Yes - should I make a new one with Python and put it (the script and output) in GitHub/Box? Or modify the Excel file where the alpha constants were originally calculated? Or some other way?

dtfitch commented 10 months ago

Great, thanks! I think a new list of alpha constants in Python that pulls from the data Matt has is safer than the spreadsheet.