Actually, I misspoke. We can't compute an average access value unless we know all of the travel times, because of the partial accessibility problem. When a location is completely unreachable for some portion of the time window, we can't generate an average access value from the average travel time, because the average travel time can't include those completely unreachable minutes (if you include infinity in an average, you won't like the results).
Oh wait, though. I'll bet we can just store a separate array, parallel to the surface, recording what proportion of the time each destination is reachable, and downweight by that as needed.

And we need to do that anyway, because our averages are currently wrong when a destination is unreachable for part of the time window. We currently divide by surface.nMinutes, but that's only valid if a destination was reachable 100% of the time.
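To make that concrete, here's a minimal sketch of the corrected average plus the parallel reachability fraction (the names `travelTimes` and `UNREACHABLE` are hypothetical, not the actual surface code):

```java
// Minimal sketch of the proposed fix, not the actual Analyst surface code.
// Assumption: travelTimes[minute] holds the travel time to one destination at
// each departure minute of the window, with UNREACHABLE marking minutes where
// the destination cannot be reached at all.
public class PartialAccessibility {
    static final int UNREACHABLE = Integer.MAX_VALUE;

    public static void main(String[] args) {
        // 5-minute window: unreachable in 2 of 5 minutes.
        int[] travelTimes = { 50, UNREACHABLE, 55, UNREACHABLE, 45 };

        int sum = 0, reachableMinutes = 0;
        for (int t : travelTimes) {
            if (t != UNREACHABLE) { sum += t; reachableMinutes++; }
        }

        // Wrong: dividing by the full window length (surface.nMinutes) silently
        // pretends the unreachable minutes contributed travel time.
        // Right: divide only by the minutes the destination was reachable, and
        // keep the fraction reachable in a parallel array for downweighting.
        double avgTravelTime = (double) sum / reachableMinutes;                    // 50.0
        double fractionReachable = (double) reachableMinutes / travelTimes.length; // 0.6

        System.out.printf("average travel time %.1f min, reachable %.0f%% of the time%n",
                avgTravelTime, fractionReachable * 100);
    }
}
```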
Hmm, I'm actually not sure this solves the accessibility problem, though. Consider a case where the average travel time is 59 minutes and the destination is completely unreachable 20% of the time. Is 80% of the jobs at this destination the same as the average accessibility you'd get by computing it for every minute? It can't be, because some of the times when the destination was reachable are over 60 minutes, so they would contribute 0 to the true average accessibility calculation.

Of course the expectation may still be correct, because there is likely another destination that is sometimes reachable within 60 minutes but is on average reachable in 62, and that one won't be included at all if we compute average travel time and use it to derive average accessibility, weighting or no weighting.
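Here's a worked version of that 59-minute case with synthetic travel times, just as a sketch (a 10-minute window, one destination with 100,000 jobs):

```java
// Sketch of the 59-minute / 20%-unreachable counterexample with made-up
// numbers: the downweighted average-travel-time accessibility disagrees with
// the true minute-by-minute average accessibility.
public class DownweightCounterexample {
    static final int UNREACHABLE = Integer.MAX_VALUE;

    public static void main(String[] args) {
        int jobs = 100_000, cutoff = 60;
        // 10-minute window: unreachable 20% of the time, the reachable times
        // average exactly 59 minutes, but three of them exceed the cutoff.
        int[] tt = { 40, 45, 50, 55, 58, 62, 70, 92, UNREACHABLE, UNREACHABLE };

        // True average accessibility: apply the cutoff at every minute.
        int minutesWithin = 0;
        for (int t : tt) if (t <= cutoff) minutesWithin++;
        double trueAvg = (double) minutesWithin / tt.length * jobs;  // 50,000

        // Downweighted approach: apply the cutoff to the average travel time,
        // weighted by the fraction of time the destination is reachable at all.
        int sum = 0, reachable = 0;
        for (int t : tt) if (t != UNREACHABLE) { sum += t; reachable++; }
        double avgTT = (double) sum / reachable;                     // 59.0
        double weight = (double) reachable / tt.length;              // 0.8
        double downweighted = avgTT <= cutoff ? weight * jobs : 0;   // 80,000

        System.out.printf("true average: %.0f, downweighted: %.0f%n",
                trueAvg, downweighted);
    }
}
```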
One other point is more epistemological and relates to what we're trying to measure. Even though we're reporting the average number of jobs accessible, that doesn't really capture the true picture. No individual cares about how many total jobs they can reach; they care about whether they can reach their own job(s), which are few relative to the total number of jobs in the city. Consider these three scenarios, each with 100,000 jobs per destination and a 60-minute cutoff:

1. A destination that is reachable within 60 minutes at every departure minute.
2. Two destinations, each reachable within 60 minutes half the time, at complementary times, so that at any given minute exactly one of them is reachable.
3. A destination reachable in 45 minutes half the time and in 90 minutes the other half.
If we compute average accessibility by computing accessibility at every minute, the first two scenarios both show that 100,000 jobs are reachable (and the third shows 50,000, since the destination is within the cutoff half the time). If we compute it using the average travel time, downweighted by the percent of time the destination is accessible at all, the first is still 100,000, the second is still 100,000 (0.5 * 100,000 + 0.5 * 100,000), and the third is 0 (because the average travel time is now 67.5 minutes).
Well, that kind of calls into question the whole idea of using average travel time to derive accessibility (which is also what we've been doing in Analyst up to this point; see opentripplanner/OpenTripPlanner#2148). Adding lines should never cause decreases in accessibility, but here it can: in the third scenario, the slow 90-minute service is exactly what pushes the average past the cutoff and zeroes out the measured accessibility. So I guess that's out.
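To see that failure concretely, here's a sketch using the third scenario's numbers (synthetic values, hypothetical helper):

```java
// Sketch: adding service can *decrease* accessibility under the
// downweighted-average method. 60-minute cutoff, synthetic numbers.
public class MonotonicityFailure {
    // Downweighted accessibility for one destination: apply the cutoff to the
    // average reachable travel time, weighted by the fraction of time reachable.
    static double downweighted(double avgTravelTime, double fractionReachable,
                               int jobs, int cutoff) {
        return avgTravelTime <= cutoff ? fractionReachable * jobs : 0;
    }

    public static void main(String[] args) {
        int jobs = 100_000, cutoff = 60;

        // Before: reachable in 45 minutes half the time, unreachable otherwise.
        double before = downweighted(45.0, 0.5, jobs, cutoff);  // 50,000

        // After adding a slow line: also reachable in 90 minutes the other half,
        // so the average rises to (45 + 90) / 2 = 67.5 and the destination is
        // now nominally reachable 100% of the time.
        double after = downweighted(67.5, 1.0, jobs, cutoff);   // 0

        System.out.printf("before: %.0f, after adding service: %.0f%n",
                before, after);
    }
}
```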
I'll have to think about this more when I'm not on an airplane.
Ah, what we should do is just store a weight per pixel for a single cutoff (e.g. 60 minutes) and then apply that to the grids as needed. This means that changing the cutoff requires calculating a new surface to get accessibility numbers (though the isochrones don't need one, so those jaw-dropping smooth animations will still work). The complexity of adding additional grids is then roughly O(1).
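Roughly what I mean, as a sketch with hypothetical names (not the actual implementation):

```java
// Sketch of the weight-per-pixel idea. travelTimes[pixel][minute] holds the
// travel time to each pixel at each departure minute; the weights are computed
// once per cutoff, then applied to any number of opportunity grids.
public class PixelWeights {
    static final int UNREACHABLE = Integer.MAX_VALUE;

    // Fraction of departure minutes each pixel is reachable within the cutoff.
    // Changing the cutoff means recomputing this surface.
    static double[] computeWeights(int[][] travelTimes, int cutoff) {
        double[] weights = new double[travelTimes.length];
        for (int p = 0; p < travelTimes.length; p++) {
            int within = 0;
            for (int t : travelTimes[p]) if (t <= cutoff) within++;
            weights[p] = (double) within / travelTimes[p].length;
        }
        return weights;
    }

    // Each additional grid is just one dot product over the pixels, with no
    // per-minute recomputation; that's what makes adding grids so cheap.
    static double accessibility(double[] weights, int[] grid) {
        double total = 0;
        for (int p = 0; p < weights.length; p++) total += weights[p] * grid[p];
        return total;
    }

    public static void main(String[] args) {
        int[][] travelTimes = {
            { 30, 40, 50 },          // always reachable within 60 minutes
            { 45, 90, UNREACHABLE }, // within the cutoff 1/3 of the time
        };
        double[] weights = computeWeights(travelTimes, 60);
        int[] jobs = { 100_000, 100_000 };
        // 1.0 * 100,000 + 0.333 * 100,000 = about 133,333
        System.out.printf("accessibility: %.0f%n", accessibility(weights, jobs));
    }
}
```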
Fixed in a188788225ed8147ce47090701a8ceac1b59a555
So we can switch grids easily.
It should be really fast to compute access values; it's less math than doing the isochrones. Whatever cost remains is probably all the floating-point math.