Figure out volume -> miles calculations

bicyclingplus / atp-bc-tool-analysis

Analyzing inputs/outputs from the CTC Active Transportation Benefits/Costs tool, to identify and investigate potential issues (analysis notebooks only - input/output data is not included)


Figure out volume -> miles calculations #6

Open mRaffill opened 1 year ago

mRaffill commented 1 year ago

Currently, the demand (number of people at each segment/intersection) is converted to miles traveled in the project using https://github.com/gautama-bharadwaj/volume_to_miles

I'm not really sure how this works and the explanation in the technical documentation is kind of confusing, so I'm going through the calculations slowly and trying to figure out what is going on.

Trying with some random numbers:

Say there is an average of 10 intersections per mile (0.1 miles per intersection) and the distribution is:
- 0.2 miles - 2 intersections - 10% of people
- 0.3 miles - 3 intersections - 10%
- 0.4 miles - 4 intersections - 50%
- 0.5 miles - 5 intersections - 20%
- 0.6 miles - 6 intersections - 10%

Then the sum of (percentage of people * number of intersections) would be 0.2 + 0.3 + 2 + 1 + 0.6 = 4.1. So this looks like it gets the weighted average number of intersections each person walks through, weighted by how likely they would be to walk that far.

Say there are 100 people total across all of the intersections: 100 people / 4.1 intersections ~ 25. Oh, so this means that because each person traveled through an average of about 4 intersections, they were counted at about 4 intersections in the total volume. That must be what the docs mean when they say there are really only 25 "unique" people, each counted an average of 4 times.

Then use this new volume and the distribution of travel to find the total miles traveled: (25 people * 10% of people * 0.2 miles traveled) + (25 * 10% * 0.3) + (25 * 50% * 0.4) + (25 * 20% * 0.5) + (25 * 10% * 0.6) = 10.25 miles
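
The three steps can be sketched in a few lines of Python. This is only an illustration with the made-up numbers from this comment, not the code in the volume_to_miles repo; the function name `volume_to_miles` and the constant are hypothetical. The distance for the 20% group is taken from the distribution (0.5 miles) and the unique-people count is left unrounded, so the total differs slightly from a hand calculation that rounds to 25 people.

```python
# Toy distance distribution from the example: (miles traveled, share of people).
# 0.1 miles per intersection is the assumed average spacing.
MILES_PER_INTERSECTION = 0.1
distribution = [(0.2, 0.10), (0.3, 0.10), (0.4, 0.50), (0.5, 0.20), (0.6, 0.10)]


def volume_to_miles(total_volume, distribution, miles_per_intersection):
    """Convert summed intersection counts into total miles traveled."""
    # Weighted average number of intersections each person is counted at.
    avg_intersections = sum(
        share * (miles / miles_per_intersection) for miles, share in distribution
    )
    # Remove double-counting: each person appears avg_intersections times.
    unique_people = total_volume / avg_intersections
    # Spread the unique people back over the distance distribution.
    total_miles = sum(unique_people * share * miles for miles, share in distribution)
    return avg_intersections, unique_people, total_miles


avg, unique, miles = volume_to_miles(100, distribution, MILES_PER_INTERSECTION)
print(round(avg, 2), round(unique, 2), round(miles, 2))  # 4.1 24.39 10.0
```

In this toy case (no capping of distances at the project length) the result is the same as total counts * miles per intersection, 100 * 0.1 = 10, because the weighted average cancels out.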

My questions:

Would the average miles per intersection really always be accurate/consistent? What about projects with weirdly distributed intersections?
Would it be possible to find "unique people" somehow using the distribution percentages directly without averages?
- I tried looking at the averages over the weekend to see how much they vary from the individual miles per intersection, I'll update github and add in some of the graphs this afternoon.
- I did see there are some projects with zero intersections -> infinite average length/intersection. This is probably the same issue that Peter didn't select some intersections in the project because they didn't have any new infrastructure. I think what we said to resolve this would be to just separately select all of the intersections adjacent to selected ways (and vice versa).
Does this take into account people who cross an intersection but then turn onto some roads not included in the project?

But overall, this seems like it actually does make sense, it just divides the total volume by how many times each person will be counted to remove double-counting, then uses this adjusted volume to find the total miles traveled. I'll try to write a better explanation to add to the technical documentation.

mRaffill commented 1 year ago

So the main thing I'm still not sure about is how this deals with people who cross through an intersection perpendicular to the project. I tried to make a diagram of this (each color is one unique person, traveling through multiple intersections). It seems like people going through only one intersection because they're "crossing" the project, not traveling along it, would be undercounted.

(image: Travel model)

It seems to work when there is only one direction

(image: Copy of Travel model)

It seems to work for any directions selected, but not for the directions left unselected

(image: multi-dimension travel model)

^ Each time, the number of people in the un-selected direction is divided by three from what it really is - because those people are actually each only showing up once in the selected intersections, but then they're divided by three to remove double-counting

Basically, it seems like if there are people going in a direction which isn't selected in the project, they aren't actually counted multiple times, so dividing by the average # of intersections traveled will result in them being undercounted.
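
A tiny made-up example of this undercounting, assuming a project with 3 selected intersections in a line. The traveler lists and the fixed average of 3 are hypothetical values chosen to mirror the "divided by three" case in the diagrams, not real tool output.

```python
AVG_INTERSECTIONS = 3  # assumed: everyone is treated as passing 3 project intersections

# Hypothetical travelers: number of *selected* intersections each one passes through.
along_project = [3, 3]     # two people walking the full length of the project
crossing_only = [1, 1, 1]  # three people crossing perpendicular to it

total_counts = sum(along_project) + sum(crossing_only)  # what the tool sees: 9
estimated_unique = total_counts / AVG_INTERSECTIONS     # 9 / 3 = 3.0
true_unique = len(along_project) + len(crossing_only)   # 5

# The crossers contribute 3 counts, which become 3 / 3 = 1 person after the
# division: one third of the 3 people who actually crossed.
crossers_estimated = sum(crossing_only) / AVG_INTERSECTIONS

print(estimated_unique, true_unique, crossers_estimated)  # 3.0 5 1.0
```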

But is the tool already doing something to take this into account? I seem to remember something about people crossing intersections in different directions or something. It doesn't seem to be in the volume-to-miles code, but maybe in the technical documentation somewhere??

mRaffill commented 1 year ago

So there is something which estimates some proportion of the volume goes through the intersection vs turns in different directions, which is used to add bicycle volumes to intersections and pedestrian volumes to segments:

Bicycle volumes on roadways (links) and pedestrian crossing volumes at intersections (nodes) are estimated directly from the models of existing active travel (see Section 4). Since bicycling volumes are predicted on roadways (links), bicycling volumes at intersections need to be interpolated. The tool assumes that each bicyclist travels through the adjoining intersections and since turn directions are unknown, it assumes half of the roadway volume is expected to cross through the adjoining intersections (i.e., each bicyclist passes through and is counted on two adjoining roadways).

Pedestrian volume is predicted at intersections (nodes). This prediction is of all intersection crossing volumes (but not right turns since pedestrians do not cross the intersection to turn right). Leaving out right turns at intersections may be appropriate for walking since right turning pedestrians have little traffic exposure risk. However, when interpolating pedestrian volume from intersections to adjoining roadways, the tool assumes all pedestrians use two adjoining roadways and so doubles the volume and distributes that volume equally across adjoining roadways.

(from the technical documentation)
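
A literal reading of the two quoted assumptions, as a hedged sketch. The function names are hypothetical and this is not taken from the tool's R scripts, which may implement the interpolation differently.

```python
def bicycle_volume_at_intersection(adjoining_link_volumes):
    """Half of the summed link volume, because each bicyclist is assumed to be
    counted on two adjoining roadways."""
    return 0.5 * sum(adjoining_link_volumes)


def pedestrian_volume_per_roadway(intersection_volume, n_adjoining_roadways):
    """Double the crossing volume (each pedestrian is assumed to use two
    adjoining roadways) and split it equally across the adjoining roadways."""
    if n_adjoining_roadways <= 0:
        raise ValueError("an intersection needs at least one adjoining roadway")
    return 2 * intersection_volume / n_adjoining_roadways


# Made-up volumes for a 4-way intersection.
print(bicycle_volume_at_intersection([10, 20, 30, 40]))  # 50.0
print(pedestrian_volume_per_roadway(60, 4))              # 30.0
```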

So it looks like there are already some assumptions about how people turn at intersections and what proportion of bicyclists/pedestrians travel in a certain direction. But I'm not sure whether this is incorporated into the inputs of the volume -> miles calculations?

The volume->miles uses demand, not exposure, which I think means it divides the pedestrian volumes to take into account people crossing an intersection multiple times. But even after that, it seems like there should still be different calculations for the proportion of people who are crossing the intersection and staying in the project vs the proportion who only cross through at that one intersection and then travel outside of the project. Maybe it would be possible to use a similar assumption that all pedestrians cross through two of the connected segments, and then split up the volume based on how many of the connected segments are selected in the project?

(I haven't looked at how this works for segments at all yet - come back to that later)

mRaffill commented 1 year ago

Ok, I found a document from 2022 in the Box folder "Final_Comments_BC_Tool.docx" which I think clarifies this a bit further

According to this document, the difference between demand and exposure is that demand uses SWITRS data in the prediction, exposure does not. So I guess it doesn't actually have to do with pedestrians crossing an intersection in different directions?
This text in the technical documentation refers to the exposure, not demand. Basically how the exposure is distributed through the network from the original predictions.
- Bicycle exposure is distributed to intersections by using half of the volume (? not sure I fully understood this part)
- Pedestrian exposure should be distributed to ways by first multiplying by 4/3 then doubling (slightly different from what is in the technical documentation - it looks like this is some kind of feedback/comments notes, so maybe not actually implemented yet?)

I'm struggling to find where exactly this calculation is done (so I can see what data field it outputs to) - the closest thing I could find was:

'Aggregate_crashes.R' for pedestrians
'Predictions_intersections.R' for bicyclists

These do look like they distribute exposure between links/intersections, but I don't see where the "inflate the exposure by 1/3 then double it" mentioned in the technical documentation and Final_Comments_BC_Tool is calculated. These scripts look like they just:
divide the intersection volume by the number of adjacent ways for pedestrians
add the total exposure on adjacent ways for bicyclists

I do seem to remember seeing this calculation somewhere either in the benefits documentation or the Box at some point. So maybe I just haven't looked in the right place yet.

mRaffill commented 1 year ago

Anyway, I can probably figure that out later. The issue I am thinking about more has to do with the miles distribution that this calculation uses to remove double-counted people. The distribution includes distances, but not directions. But it seems like it should only be used to divide the fraction of people who move along the same path as the project and are then counted multiple times.

So that would be something like

number of people in the project * (number of ways in project/total number of ways connected to project intersections) -> people counted multiple times
divide by average # of intersections traveled -> "unique people"
then add that to the remaining number of people in the project (who weren't double-counted) -> total number of "unique" people
then multiply this new total number of people * percentage of people * miles traveled to get total miles traveled
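
A minimal sketch of these four steps, with made-up inputs (100 counted people, 2 of 4 connected ways selected, an average of 4.1 intersections). The function names are hypothetical, and this only illustrates the idea as written here, not anything the tool currently does.

```python
def unique_people_split(total_volume, selected_ways, total_ways, avg_intersections):
    """Only de-duplicate the share of volume assumed to travel along the project."""
    along_share = selected_ways / total_ways
    multi_counted = total_volume * along_share     # counted at several intersections
    single_counted = total_volume - multi_counted  # assumed counted exactly once
    return multi_counted / avg_intersections + single_counted


def total_miles(unique_people, distribution):
    """distribution: iterable of (miles traveled, share of people)."""
    return sum(unique_people * share * miles for miles, share in distribution)


dist = [(0.2, 0.10), (0.3, 0.10), (0.4, 0.50), (0.5, 0.20), (0.6, 0.10)]
unique = unique_people_split(100, 2, 4, 4.1)
print(round(unique, 2), round(total_miles(unique, dist), 2))  # 62.2 25.5
```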

mRaffill commented 1 year ago

Concerns:

I realize we were going to select all of the adjacent intersections to selected roadways, and presumably also select all of the adjacent roadways to intersections. So then how could we tell whether the next intersection (past the auto-selected roadway) is or isn't selected? If it isn't selected, then some people may have gone in that direction and not been selected multiple times, if it is, they may actually be counted multiple times and should be divided.
- maybe there's a way to see whether the nodes connected to adjacent roadways are selected or not?
- but also would the miles traveled by those people somehow be affected by the project? like
How would this work with more than one intersection/intersections with something other than 4 sides? How does it work for people who are walking along the project most of the way but then turn off somewhere in the middle? How does it work for projects not in a straight line or in multiple fragments?
- The whole idea of just dividing up people evenly into directions and seeing whether they will stay in the project seems very hacky and probably inaccurate. And I still need to figure out if this is actually causing problems in the numbers or whether it is addressed somewhere else.
Individuals are going to be traveling through the intersection, so should take into account what direction they came from as well as where they are going?
How do the distance categories change for people changing direction? The code currently sets any distance category which is longer than the total project length to be equal to the project length, so that people walking farther than that will just be considered as walking through the entire project. But couldn't the project length vary by what direction it's in? It could be 5 miles along one road but then have a branch somewhere in the middle which goes only 1 mile. If someone walks a total of 5 miles but turns onto that branch halfway through, they might actually walk out of the project area in less than 5 miles. So the tool would assume that they were counted at more project intersections than they actually were.
What about segments/bike? Currently the tool doesn't even use this weighting for bicyclists?
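
To make the distance-capping concern concrete, here is a hedged sketch. It assumes the capping is a simple `min(distance, project_length)` per distance category, which is a guess based on the description above and not a copy of the actual code; the distribution and spacing are made up.

```python
def cap_distances(distribution, project_length):
    """Clamp each distance category (miles) to the project length."""
    return [(min(miles, project_length), share) for miles, share in distribution]


def avg_intersections(distribution, miles_per_intersection):
    """Weighted average number of intersections traveled."""
    return sum(share * miles / miles_per_intersection for miles, share in distribution)


# Hypothetical distribution: half of people walk 1 mile, half walk 5 miles.
dist = [(1.0, 0.5), (5.0, 0.5)]
MILES_PER_INTERSECTION = 0.5  # assumed spacing

# Capping at the 5-mile main corridor leaves the distribution unchanged ...
main = avg_intersections(cap_distances(dist, 5.0), MILES_PER_INTERSECTION)
# ... but someone on a 1-mile branch can cover at most 1 mile inside the project.
branch = avg_intersections(cap_distances(dist, 1.0), MILES_PER_INTERSECTION)
print(main, branch)  # 6.0 2.0
```

A single project-wide length would apply the first number to everyone, including people who turn onto the short branch.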

mRaffill commented 1 year ago

But I guess people who go in a different direction would also be walking less in the project (eg if you cross at one point on the project and don't travel along it at all, that would be only 1 intersection traveled in the project) so less miles traveled? Unless the miles traveled in that direction are also influenced by this new infrastructure and should be counted somehow?

Trying to plot this out, but I'm getting even more confused now. What if people start in the middle of the project? Wouldn't the number of intersections to travel through the entire project then be smaller? (even without looking at directions) It all makes sense until some people start going past the project boundaries in some way or another, and I'm not sure how to capture all the different ways that could happen.

mRaffill commented 1 year ago

Maybe use the double-and-distribute volume, then find what percentage of volume ends up on un-selected ways. That would give something like:
Intersection 1: 50% exit project
Intersection 2: 33% exit project
Intersection 3: 25% exit project

So this would give the percentage of people who might exit the project at one specific intersection. But there could still be people who travel multiple intersections in the project and then turn in a different direction - so they would still be counted multiple times, just less than expected. It seems like ideally there would be some kind of "exiting project" distribution of what percentage of people travel 1 intersection before turning, 2 intersections before turning, etc., which could then be combined with the overall intersections traveled distribution.

The percentage of people who travel 1 specific intersection and leave at the next specific intersection would be: percentage of people who don't turn at intersection 1 * percentage of people who do turn at intersection 2 ... But how could I get that for all of the intersections and then combine them to an overall percentage?
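
One possible way to chain these percentages, sketched under strong simplifying assumptions: everyone enters at the first intersection, walks in one direction, and the exit chance at each intersection is independent of how far they have come. The function name is hypothetical. Combining several entry points would additionally need a weight for how many people start at each intersection, which this does not cover.

```python
def intersections_traveled_distribution(exit_probs):
    """Return {k: probability of being counted at exactly k project intersections}.

    exit_probs[i] is the assumed chance of leaving the project at intersection i
    (in travel order). The last value is ignored because the project ends there.
    """
    dist = {}
    still_in_project = 1.0
    for k, p_exit in enumerate(exit_probs, start=1):
        is_last = k == len(exit_probs)
        # At the final intersection everyone remaining runs out of project.
        leave_here = still_in_project if is_last else still_in_project * p_exit
        dist[k] = leave_here
        still_in_project -= leave_here
    return dist


dist = intersections_traveled_distribution([0.5, 1 / 3, 0.25])
expected = sum(k * p for k, p in dist.items())
print({k: round(p, 3) for k, p in dist.items()}, round(expected, 3))
# {1: 0.5, 2: 0.167, 3: 0.333} 1.833
```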

There's probably some math or CS technique to solve this, I just don't know what it would be. Maybe I should look online a bit.

mRaffill commented 1 year ago

Potential solution:

Add a new weight: proportion of unselected ways/total ways
Multiply average intersections by the new weight and then do everything else the same
Compare long, linear projects (less double-counting) vs dense, connected projects (more double-counting)
Select all intersections/ways that touch the project for demand calculations only
Changes how these parameters are calculated in the backend, but it should hopefully not change the results too much

mRaffill commented 1 year ago

First, I want to experiment with some more hypothetical/made up situations, to see if this logically makes sense in a few cases where I know the real "answer." I tried this yesterday with two long, linear projects that I made up. I set it up such that everyone only traveled in a straight line and no one turned in a different direction, just to make things simpler to begin with (I do want to try this with people turning in different directions at some point but that starts getting very complicated for this very simple model).

(the first example project from above)

(image: Test new travel model adjustment (percentage selected ratio) - initial simple scenario(1))

Percentage of ways in vs out of project: 3/5
Weighted average intersections: 3 * 3/5 = 1.8
"Unique" people = 12/1.8 = 6.67 (close - there are actually 6 people!)

~~I also tried another method where the percentage of selected links at each intersection is multiplied by intersection volume and added up, then only that part of the volume is divided by intersections traveled while the remaining people are not divided by anything. People “in project”: (⅓)(5) + (1)(3) + (1)(2) + (1)(2) = 8.67. “Unique”: 8.67/3 intersections = 2.89. Other people not double-counted: 12 - 8.67 = 3.33. Total: 2.89 + 3.33 = 6.22 people~~ But this doesn’t make sense because then you would also need the different percentages of people who travel partially outside the project but then partially in the project for 1 intersection, 2 intersections, etc., which gets complicated.

More links added. I also made each intersection have equal proportions of people going in each direction (1/2 and 1/2 for the 4-way intersections, 1/3 and 2/3 for the 3-way intersections) because I think that is similar to the assumption the tool makes? (I think the exact process is that the volume is doubled and distributed equally to the connected links, which does assume people are equally as likely to travel in any of the directions.)

(image: Copy of Test new travel model adjustment (percentage selected ratio) - long_1 dimension, even distributed)

Percentage of ways in vs out of project: 4/14
Weighted average intersections: 4/14 * 20/7 = 0.816
"Unique" people: 24.5

This is obviously not right (there are actually 14 people total). It doesn’t make sense that the average intersections is less than 1, because no one counted in the project could travel less than 1 intersection in the project.

mRaffill commented 1 year ago

The real average intersections traveled within the project boundaries (number of intersections in the project * fraction of people): 1(11/14) + 4(1/14) + 3(1/14) + 2(1/14) = 10/7 = 1.428 intersections. So the “weight” multiplied by the original average should have really been (10/7)/(20/7) = 1/2 (seems like a very nice number - I wonder if that is something important)

mRaffill commented 1 year ago

Maybe this is because only using the percentage in the project treats the percentage out of the project as 0 intersections. But people crossing through will still travel at least 1 intersection in the project (or else they wouldn’t be counted at all). So maybe multiply that by 1 and add to the average.

Percentage of ways in vs out of project: 4/14
Weighted average intersections: 4/14 * 20/7 + 10/14 * 1 = 1.530 (real average in-project? 1(11/14) + 4(1/14) + 3(1/14) + 2(1/14) = 1.428 intersections)
Unique people: 13.066 ~ 14

This is closer!

The main issue I see with this is again, people may travel partially outside of the project but then also travel more than 1 intersection in the project... I need to test it with some examples where people do really turn and travel a combination of multiple intersections inside and outside of the project.

mRaffill commented 1 year ago

Maybe this is because only using the percentage in the project treats the percentage out of the project as 0 intersections. But people crossing through will still travel at least 1 intersection in the project (or else they wouldn’t be counted at all). So maybe multiply that by 1 and add to the average.

Trying for sample 1:

Percentage of ways in vs out of project: 3/5
Weighted average intersections: 3 * 3/5 + 1 * 2/5 = 2.2
"Unique" people = 12/2.2 = 5.45 (close - there are actually 6 people!)

The "real" average traveled inside the project in this case should be: 1(3/6) + 4(1/6) + 3(1/6) + 2(1/6) = 12/6 = 2 intersections.
So the “weight” multiplied by the original average should have been: 2/3 (again, very nice round number...?)

mRaffill commented 1 year ago

With people turning/changing directions (lines showing their paths)

(image: Test new travel model adjustment (percentage selected ratio) - long_1 dimension, even distributed, turns)

Existing method (direct average):

Weighted average intersections: 20/7
"Unique" people: 8.4

Method 1 (just multiply by the % of selected links):

Percentage of ways in vs out of project: 4/14
Weighted average intersections: 4/14 * 20/7 = 0.816
"Unique" people: 29.4

Method 2 (multiply average by the % of selected links + 1 * % of deselected links):

Percentage of ways in vs out of project: 4/14
Weighted average intersections: 4/14 * 20/7 + 1 * 10/14 = 1.530
"Unique" people = 24/1.530 = 15.68 (somewhat close - there are actually 14 people, but definitely much more accurate than 29 from the first approach, and the 8.4 originally)

The "real" average traveled inside the project in this case should be: 1(8/14) + 4(1/14)+3(1/14)+2(4/14) = 23/14 = 1.643 intersections. Seems fairly close to the result from method 2, just slightly larger because some people are traveling outside of the project but then traveling more than 1 intersection within the project. So the “weight” multiplied by the original average should have been: (23/14)/(20/7) = 0.575 (not as much of a nice round number this time)
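
The three methods can be compared side by side with a short script. It uses only the two made-up scenarios whose total volumes are stated explicitly in this thread (12 for sample 1, 24 for the scenario with turns); the function names are hypothetical labels for the methods described above.

```python
from fractions import Fraction


def existing(avg, ratio):
    # Current behaviour: use the weighted average as-is.
    return avg


def method_1(avg, ratio):
    # Scale the average by the share of adjacent ways that are selected.
    return avg * ratio


def method_2(avg, ratio):
    # Same, but people leaving via unselected ways still count at 1 intersection.
    return avg * ratio + 1 * (1 - ratio)


# (label, total counted volume, unadjusted average, selected/total ways, real people)
scenarios = [
    ("sample 1", 12, Fraction(3), Fraction(3, 5), 6),
    ("with turns", 24, Fraction(20, 7), Fraction(4, 14), 14),
]

results = {}
for label, volume, avg, ratio, real_people in scenarios:
    for method in (existing, method_1, method_2):
        unique = float(volume / method(avg, ratio))
        results[(label, method.__name__)] = unique
        print(f"{label:10s} {method.__name__:9s} unique={unique:6.2f} real={real_people}")
```

For the scenario with turns this reproduces 8.4, 29.4 and 15.68 from the lists above.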

mRaffill commented 1 year ago

For testing with the real project data:

This may be challenging because the data I have doesn't include "number of adjacent segments" or anything similar.
Also a lot of the intersections in the middle of projects aren't selected (so they will have very low volumes).

So to test this more effectively there would probably need to be some changes from Matt.

But I can start testing a few projects by manually counting the number of adjacent segments:

Long/linear projects:

1) 64b0406741e08c5dff327a1f
Adjacent ways selected: 79/106 = 0.745 (but not including ways adjacent to intersections that aren't selected)
Avg intersections: 5.128
Adjusted avg intersections: 4.077

2) 64921e2f1930d10600997fd9
This one actually doesn't have pedestrian volume for some reason, even though there are some intersections selected??? Anyway, I can still look at the average intersections.
Adjacent ways selected: 28/41 = 0.683
Avg intersections: 2.776
Adjusted avg intersections: 2.213

Connected/multi-dimensional projects:

1) 645582371c8b985d1be43a07
Adjacent ways selected: 19/41 = 0.463
Avg intersections: 3.546
Adjusted avg intersections: 2.180

2) 64dfb34bcb0d64389a5ea11e
Adjacent ways selected: 14/28 = 0.5
Avg intersections: 3.979
Adjusted avg intersections: 2.489

mRaffill commented 1 year ago

Weirdly, the long corridor projects I chose have a higher ratio of selected adjacent ways?

Maybe that's because they have fewer turns and dead ends? Or it could be because these are two-way roads so it is more likely to still be in the project when crossing (crossing to the other side of the road is included as a selected way)? Also these big corridors might have fewer full 4-way intersections, and just fewer intersections in general. Maybe it is more likely that people stay on them because they don't really have opportunities to turn anywhere else?

Maybe what I really need to compare is a similar kind of connected street grid with only one road selected vs an entire connected grid-like project. These differences could be affected by other factors.

mRaffill commented 1 year ago

Eg. there are many intersections like this:

Where the two-way road means there are 5/7 or 4/6 ways selected, while there would be 2/4 on a road represented by only 1 way. That might be causing the higher ratios for the linear projects (and might actually make sense in that case?).

mRaffill commented 1 year ago

More (better) projects to test: Some of these have 0 intersections selected (so 0 pedestrian volume) but I can at least test what the new adjustment factors would look like, assuming all of the intersections in the project were selected. (Which is more consistent anyway)

651b01899a0c762a2b50accf / 64addb0641e08c5dff327a16 (basically full grid) 49/67 = 0.73
65454061b5bfdcb11540ec4f (part of grid) 31/55 = 0.56

65284187772d22a2108ddc4c (only one road) 44/78 = 0.56
650c95319a0c762a2b50acbd /130
651714169a0c762a2b50acc7 /__

(Fill in these numbers as I count them)

mRaffill commented 1 year ago

So the main question this is trying to test is whether the simple ratio of selected ways to total ways is a good (or reasonable) estimate of how "connected" the project is, or how likely people are to stay in the project for most of their travel.

I think it does seem to work for the extreme cases (full grid vs only one road):

for 65284187772d22a2108ddc4c the ratio is around 1/2 which makes sense because about half of people could turn off at each intersection.
64addb0641e08c5dff327a16 the ratio is much higher around 3/4 which makes sense (at least that it is greater than 1/2) because most people will stay in the project except at the edges where they could leave.

I'm not sure how well it works for all the other types of project layouts in between. And there are definitely other issues like:
64de9d4fcb0d64389a5ea118 - disconnected/multiple pieces in the project. People might exit the project in one area and then enter again somewhere else in a different part.
65284648772d22a2108ddc4d - two parallel streets. People crossing through one side might also be likely to cross through the other side as well, but just taking the simple ratio of selected/unselected ways wouldn't capture that.

But it seems to make intuitive sense at least as a kind of very basic "chance of leaving the project"?? And it is not supposed to be a very thorough solution anyway so it seems decent given the simplicity.

mRaffill commented 12 months ago

Summary of requested change:

Currently, the code calculates a 'distribution_den' which is the weighted average number of intersections traveled (sum of number of intersections * percentage traveling that number of intersections)
We want the tool to calculate the percentage of ways adjacent to selected intersections that are selected.
Then multiply the average ('distribution_den') by this ratio
Then add the percentage of ways not selected

The idea is something like: $A \cdot \frac{I_s}{I} + \left(1 - \frac{I_s}{I}\right)$ where
$A$ is the current calculated average
$I_s$ is the number of selected adjacent ways
$I$ is the number of total adjacent ways

The rest of the calculations are all the same, just adjust the average intersections traveled based on the ratio of ways selected in the project.
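
A minimal sketch of the requested adjustment as a standalone function. The function name and argument names are hypothetical (only 'distribution_den' is a name from the existing code), and the inputs are the manually counted projects from earlier in this thread.

```python
def adjusted_average_intersections(distribution_den, selected_ways, total_ways):
    """Blend the current average with 1 intersection for people assumed to leave.

    distribution_den: current weighted average number of intersections traveled (A)
    selected_ways:    selected ways adjacent to selected intersections (I_s)
    total_ways:       all ways adjacent to selected intersections (I)
    """
    if total_ways <= 0:
        raise ValueError("total_ways must be positive")
    if not 0 <= selected_ways <= total_ways:
        raise ValueError("selected_ways must be between 0 and total_ways")
    ratio = selected_ways / total_ways
    return distribution_den * ratio + (1 - ratio)


# Manually counted projects: (average intersections, I_s, I).
projects = {
    "64b0406741e08c5dff327a1f": (5.128, 79, 106),
    "64921e2f1930d10600997fd9": (2.776, 28, 41),
    "645582371c8b985d1be43a07": (3.546, 19, 41),
    "64dfb34bcb0d64389a5ea11e": (3.979, 14, 28),
}
adjusted = {pid: adjusted_average_intersections(*vals) for pid, vals in projects.items()}
for pid, value in adjusted.items():
    print(pid, round(value, 3))
```

The guard on `total_ways` also covers the zero-intersection projects mentioned at the top of this issue, which would otherwise divide by zero.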

mRaffill commented 12 months ago

For bicyclists:

counted on segments instead of intersections
turning would be represented by being counted on adjacent ways to the adjacent intersections
so the same approximation of "percent leaving project" would apply
but it seems unlikely that people would turn on to the project for one segment, then immediately turn off. At least in the simplest case where people cross at only one intersection - those wouldn't be counted at all for bicycling.
however it does seem likely that a person would bike partway along the project and then turn and leave, based on how many adjacent segments are selected
so there does need to be some adjustment factor. Let's try using the same factor as pedestrian and see if it seems logical in a few example scenarios.

Other than that, does it make sense?

miles traveled -> segments traveled
Weighted average - Bicyclists counted on some average number of segments
Average adjusted based on people leaving the project/turning (this is the part I still need to figure out)
Total Bicyclists/average count per bicyclist => unique single counted bicyclists

This all makes sense!

mRaffill commented 12 months ago

Couldn't you also multiply length of segment * number of bicyclists? That would be the total miles traveled within the project? And the same thing for pedestrians after distributing pedestrian volume to segments? Oh but that wouldn't work because it wouldn't give the number of miles people travel in total (including outside of the project). So yes we do need to get unique people and then multiply by the original miles traveled distribution, which will include travel outside of the project.

mRaffill commented 11 months ago

Going into more detail about how this would work in combination with the tool automatically selecting the adjacent ways and intersections:

Pedestrian

When measuring the percentage of adjacent segments that are selected, if all of the segments adjacent to selected intersections are auto-selected, the percentage would just be 100%.
However, if adjacent segments aren't auto-selected, then some intersections with no segment improvements around them would have 0% adjacent segments selected
So it really seems to make more sense to look at adjacent intersections for pedestrians, because that won't change depending on what segments are auto-selected, and intersections are where pedestrian volumes are estimated so the next intersection a pedestrian travels to determines whether they stay in the project or not.
These intersections should include the auto-selected intersections adjacent to segments, because they are included in the total volume. Otherwise you could have projects with many segment improvements but no pedestrian estimates because there are no intersection improvements.
Including the auto-selected intersections shouldn't mess up the selected percentage of adjacent intersections, because the adjacent intersections to these auto-selected intersections won't change at all.

Bicycle

For bicycles, the percentage should still be adjacent ways, because they are only counted at ways.
The auto-selected ways includes all of the ways adjacent to the project, not just those clearly "between" selected intersections. But the total bicycle volume includes the volumes from all of these auto-selected ways anyway so it makes sense to include them in the percentage calculation.
So I guess then the tool should get the % of selected ways adjacent to any of the selected ways, both user selected and auto-selected.
Although how necessary is this adjustment factor if all of the turning directions off of intersections are already selected?

(image: adjacent intersections_ways(1)) A visual I made to mentally process this. Green is auto-selected ways, orange is auto-selected intersections, yellow is original user-selected.

mRaffill commented 10 months ago

Before going on vacation I had tested a few projects manually with this new adjustment factor. I just finished setting that up and pushing to github now.

Some quick notes from testing the first project (there may still be weirdnesses):
The adjustment factor here seems to have a huge effect on the average intersections traveled, which might be concerning...?
Also the average intersections traveled is ending up as less than 1, which means that the weighted volume is greater than the inputted volume. Is this ok? I need to think about this further.

Anyway, it seems like the volume -> miles calculation is not fixed yet.

This is currently adjacent segments to intersections. It may be different if I try adjacent intersections to intersections.
May also be that intersections are very far apart and so there are less than 1 intersection per mile (although that intuitively seems a bit unreasonable, even for somewhere with really few crossings)
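
One check that might help narrow this down, assuming the adjustment being tested is the formula from the summary of the requested change (with $r$ written here as shorthand for the selected ratio $\frac{I_s}{I}$):

$$
A' = A \cdot r + (1 - r) = 1 + r\,(A - 1), \qquad r = \frac{I_s}{I} \in [0, 1]
$$

So with this formula $A' < 1$ can only happen when the unadjusted average $A$ is itself below 1 (and $r > 0$), which would point to the intersections-per-mile input rather than the adjustment. If $A \ge 1$ for the project and the adjusted value is still below 1, that would suggest the test multiplied by the ratio without adding the $(1 - r)$ term, since $A \cdot r$ alone drops below 1 whenever $A < 1/r$.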