Mean duration factor/offset with multiple stop times/zones

leonardehrenfried commented 8 months ago

I'm currently implementing the fields mean_duration_factor and mean_duration_offset and have a question about them being part of the stop time rather than the trip: What happens when you have multiple factors/offsets in a trip?

Let me illustrate this with a sketch:

[Image: duration-factors]

Does this mean that when the vehicle is inside zone 1 then the factor is 1.5 and when it enters zone 2 the factor then increases to 2.5 for the duration it is inside of it?

And a related question: since you need two lines in stop_times for a single flex zone, what happens when those two lines don't have the same duration/offset values?

cc @tsherlockcraig @westontrillium

leonardehrenfried commented 8 months ago

And what happens when you combine this with a fixed stop and time in between two flexible areas?

leonardehrenfried commented 8 months ago

Of course this also raises the question of what happens when you're outside any of the zones (the red bit).

[Image: duration-factors(1)]

tsherlockcraig commented 8 months ago

Does this mean that when the vehicle is inside zone 1 then the factor is 1.5 and when it enters zone 2 the factor then increases to 2.5 for the duration it is inside of it?

I think this is the intention, yes.

Of course this also raises the question of what happens when you're outside any of the zones (the red bit).

The red bit is theoretically covered by this comment in the spec:

While traveling through undefined space between GeoJSON locations or stop areas, it is assumed that:

MeanTravelDuration = DrivingDuration

I first came around to thinking about duration_factors being tied to polygons rather than trips when chatting with @t2gran in 2018, and it was Norwegian services like your second example with the red bit that we were thinking of to my recollection. They have demand response services that are demand response inside rural towns, but then travel along the highway at free-flow speed in between.

It occurs to me that 1) this is all a little complicated, and means that any consumer is going to need to be ready to apply multiple driving speeds to the same trip, and 2) this could be incredibly complicated (impossible to solve consistently?) if there were overlapping zones, which isn't explicitly disallowed. So, I'm interested in your feedback on whether these answers are implementable, and I'm happy to help think through further clarifications/constraints needed in the spec.

tsherlockcraig commented 8 months ago

And a related question: since you need two lines in stop_times for a single flex zone, what happens when those two lines don't have the same duration/offset values?

I think we'll need to mandate somehow that these not be different. E.g., if two stop times point to same area at same time, duration_factor et al must be the same across those stop times.

westontrillium commented 8 months ago

I think this uncovers a deeper issue at play which makes me think these fields need to be reworked.

As currently written, mean duration fields are intended to be an adjustment to the total trip time after the driving duration has been calculated by the routing algorithm (MeanTravelDuration = mean_duration_factor × DrivingDuration + mean_duration_offset). I think the assumption originally held was that these would be single values for a single "flexible trip", i.e., the stop_times forming each end of the trip both refer to a single factor/offset value for the trip (an offset of 2 from one and of 2 from the other = 2 for the trip, as opposed to 2 from one and 2 from the other = 4 for the trip).
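To make the formula concrete, here is a minimal Python sketch of the single-value-per-trip reading. The function name `mean_travel_duration`, the use of minutes, and the sample numbers are assumptions for illustration; only the formula itself comes from the spec text.

```python
def mean_travel_duration(driving_duration, factor=1.0, offset=0.0):
    """MeanTravelDuration = mean_duration_factor * DrivingDuration + mean_duration_offset.

    The defaults (factor 1, offset 0) leave the driving duration unchanged.
    """
    return factor * driving_duration + offset


# Hypothetical trip: 20 minutes of driving, factor 1.5, offset 2 minutes.
# Both stop_times carry offset 2, but it is applied once ("2 and 2 = 2").
total = mean_travel_duration(20.0, factor=1.5, offset=2.0)
print(total)
```

Under the cumulative reading discussed next, the same two stop_times would instead contribute an offset of 4.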

However, the existence of this feature in stop_times.txt to indicate varying trip time estimates within a single trip necessarily breaks the first assumption (that the stop_times forming each end of the trip refer to a single factor/offset value for the entire journey). Instead, the assumption now becomes that each stop_time must adjust the duration piece-by-piece (2 and 2 = 4). But what happens if you do want a universal adjustment? If you try to define one value with each stop_time record, you could only end up with an accumulation of values if the consuming app is interpreting each to only pertain to the single stop_time to which they are attached.

So we are left with two diametrically opposed scenarios:

1. The consuming app takes these values to represent the entire trip, meaning there's no way to add granularity per stop_time, and all values must be the same to avoid a conflict in logic.

If this is the case, we would need to establish that values for each stop_time in a trip constitute a single value (2 and 2 = 2) and introduce a conditional requirement that says "mean_duration_factor and mean_duration_offset values must be identical for stop_times between which travel is possible." However, this restriction is virtually the same as duration adjustment being defined at the trip level, so we'd just as well move them there, as was originally the case with Flex v1 (with this restriction in place, it'd be a logical impossibility to have any two stop_times with different mean duration values within the same trip anyway).

Or

2. The consuming app takes these values to represent each stop_time forming both halves of the trip cumulatively, meaning there's no way to assign a generic offset for the whole trip without reckoning values for each individual stop_time.

If this is the case, we would need to add clarification to the spec that each value is only representative of the stop_time to which it is assigned (2 and 2 = 4). We would also need to add guidance on how these values should be calculated, but in my opinion option 2 gets impractical pretty quickly. Is the total trip 1.5 times longer if pickup is in Zone 1, or is just the portion of the trip that is within Zone 1 1.5 times longer? If it's the former, producers will have to divvy up the factors/offsets amongst the stop_times so that they add up to the intended total when combined. If it's the latter, consumers will need to include in their routing algorithm a function that determines what share of the trip is spent in Zone 1 and applies the 1.5 multiplier only to that section. Producers will also have to factor this in when providing these values. I won't even begin to speculate how this would work with fixed on-demand stops.

This level of complexity seems inappropriate given that these are already static estimates. If there really is large variation in extra travel time between two ends of a trip depending on the location, then producers should just take the average of those variations and use that as the offset/factor for the entire trip (it is a mean value after all 🙂).
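The two readings of the 1.5 factor give different results, which a small Python sketch can show. The function names and durations are hypothetical, and the second function assumes that travel outside Zone 1 uses a factor of 1, in line with the spec note that MeanTravelDuration = DrivingDuration in undefined space.

```python
def whole_trip_reading(driving_duration, pickup_zone_factor):
    """The pickup zone's factor scales the entire trip."""
    return pickup_zone_factor * driving_duration


def portion_reading(duration_in_zone, duration_elsewhere, zone_factor):
    """The factor scales only the part of the trip driven inside the zone.

    Assumption: driving outside the zone is left unscaled (factor 1).
    """
    return zone_factor * duration_in_zone + duration_elsewhere


# Hypothetical trip: 30 minutes of driving, 10 of them inside Zone 1.
former = whole_trip_reading(30.0, 1.5)       # 45.0 minutes
latter = portion_reading(10.0, 20.0, 1.5)    # 35.0 minutes
print(former, latter)
```

The second reading requires the consumer to know how the driving duration splits across zones, which is the extra routing work described above.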

You could probably read my bias for Option 1 here, specifically for moving these fields to the trip level, but I'm interested in how others would approach these challenges. I've been working on this problem off and on all day, which has made my head spin a bit, so apologies if my comments are confusing. 😅

leonardehrenfried commented 8 months ago

It would make the implementer's life (mine!) a lot easier if it would apply uniformly to a trip.

In particular, the combination of these factors/offsets together with fixed times/stops would make this horribly complex to implement correctly and even slower to execute than it already is.

@tsherlockcraig's case of overlapping zones would also be very hard to decide, and we'd need to define some set of precedence rules.

However, making it easy to consume should not be a deciding factor in making a decision if we feel that we are gaining something worthwhile.

Maybe we should take a step back and think about what we are trying to achieve. Do we want to give producers precise control over what times are displayed in a travel planner in complex scenarios or do we simply want to give them an (admittedly blunt) tool to correct overoptimistic travel times? If it's the latter then trip level values would be enough.

leonardehrenfried commented 8 months ago

As always, it's probably a good idea to think about use cases.

The one use case I hear is that these trips pick up and drop off passengers along the way and therefore direct driving duration is overly optimistic.

tsherlockcraig commented 8 months ago

tl;dr I prefer option 3 (proposed below) but recognize that it is almost certainly not feasible at this time. I'd propose option 1, but that we think ahead a bit and leave room for option 3 in the future if it becomes feasible.

(MeanTravelDuration = mean_duration_factor × DrivingDuration + mean_duration_offset)

Agree that the description of these fields implies they're consistent for a trip.

However, the existence of this feature in stop_times.txt to indicate varying trip time estimates within a single trip necessarily breaks the first assumption (that the stop_times forming each end of the trip refer to a single factor/offset value for the entire journey). Instead, the assumption now becomes that each stop_time must adjust the duration piece-by-piece (2 and 2 = 4). But what happens if you do want a universal adjustment? If you try to define one value with each stop_time record, you could only end up with an accumulation of values if the consuming app is interpreting each to only pertain to the single stop_time to which they are attached.

Working through your process, I think these claims are overstated. I think it's possible for the spec to be clarified to help producers and consumers understand how to interpret differing values across stop times. We could describe an algorithm to turn different factors/offsets into a formula for a complete trip time.

(with this restriction in place, it'd be a logical impossibility to have any two stop_times with different mean duration values within the same trip anyway).

I agree that option 1 is a valid path forward to simplify this problem, but I don't think this statement is true and option 1 doesn't totally remove the problem--you could have two trips connected by a transfer_type 4 which had different values for these fields.

2. The consuming app takes these values to represent each stop_time forming both halves of the trip cumulatively,

This is actually somewhat different from, and simpler than, how I was conceptualizing the use of multiple factors/offsets. Every DRT "trip" represents a physical trip through time and space which the consuming app is necessarily aware of. Our spec already assumes this by suggesting that consumers separately know "DrivingDuration", which they calculate themselves based on data not provided by the producer. Presumably, this involves mapping a driving trip across the street grid, aggregating their own knowledge of distance and travel speeds along every way traversed in the trip.

The "Option 3" I would propose is that duration factors/offsets apply 'within the time and space indicated by the stop_time'. When you are 'inside' a stop_time, its factor/offsets prevail; when you are not inside a stop_time, factor and offset equal to 1 and 0 respectively by default. This gives consumers a process to break up a trip into infinitesimally small segments that have only one factor/offset applied to them, if we make the declaration that factor and offset can only be different if stop times do not overlap in either space or time.

I get that that's complicated, but it's not mathematically that complicated, and it opens up the opportunity to model reasonably precise trip times that are pretty faithful to some of the key use cases, such as the rural intercity service that deviates inside a town but otherwise speeds along the highway, or a service that slows down midday because there are many more shared trips during certain times.
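As a rough illustration of Option 3, the Python sketch below sums a trip that has already been split into segments, each lying inside at most one stop_time. Everything named here (`Adjustment`, `piecewise_mean_duration`, the segment tuples) is hypothetical. How an offset should be spread across segments is not settled in this thread; the sketch assumes it is added once per stop_time that the trip passes through.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Adjustment:
    factor: float = 1.0  # default when outside any stop_time
    offset: float = 0.0  # default when outside any stop_time


def piecewise_mean_duration(segments, adjustments):
    """Sum adjusted durations over pre-split trip segments.

    segments: list of (driving_duration, stop_time_key or None).
    adjustments: dict mapping stop_time_key to Adjustment.
    Assumption: each offset is counted once per stop_time entered.
    """
    total = 0.0
    seen = set()
    for driving_duration, key in segments:
        adj = adjustments.get(key, Adjustment())
        total += adj.factor * driving_duration
        if key is not None and key not in seen:
            total += adj.offset
            seen.add(key)
    return total


# Factors taken from the sketch in the first comment; durations are made up.
adjustments = {"zone_1": Adjustment(factor=1.5), "zone_2": Adjustment(factor=2.5)}
segments = [(10.0, "zone_1"), (8.0, None), (6.0, "zone_2")]
print(piecewise_mean_duration(segments, adjustments))  # 15 + 8 + 15 minutes
```

Splitting the driving path into per-zone segments is the part a routing engine would have to supply; the arithmetic afterwards is simple.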

I won't even begin to speculate how this would work with fixed on-demand stops.

I agree that this is a very tricky use case that would need its own caveat in any alteration of the spec that maintained factors/offsets at the stop_time level. We'd need to basically declare an exception that in the case of stop groups, the factor/offset refers to all travel including between geometries.

Another odd use case is deviated-fixed, where in between two drt legs, travel time should obey the fixed-route travel times, and for either drt leg, it's actually very reasonable and easy for the consumer to work with 2 different factor/offset pairs.

My suspicion is that Option 3 is not implementable in the budget that has been allocated to this project, and as for WSDOT's current interest in this development, I'm fine with us taking the course of Option 1. But I think we should ponder for a bit and think about 1) whether the option 3 I've proposed is feasible, and 2) if so, take any reasonable actions in terms of spec definition that we can now to allow for a future spec change that would allow for these values to be set at the stop_time level.

westontrillium commented 8 months ago

I agree that option 1 is a valid path forward to simplify this problem, but I don't think this statement is true and option 1 doesn't totally remove the problem--you could have two trips connected by a transfer_type 4 which had different values for these fields.

I could be mistaken, but I tried working out an example where you could have differing mean_duration_... values within a single trip under the restriction that any two stop_times within that trip between which travel is possible must have identical mean_duration_... values, but I couldn't do it. Even if travel between two given stop_times is not possible, they necessarily will share a relation to another stop_time in the trip. For example:

Zone A pickup
Zone B drop-off
Zone C drop-off
Zone D pickup
Zone D drop-off

Travel between Zone B and Zone C is impossible; however, travel from Zone A to both Zone B and C is possible, so whatever shared mean_duration_... Zone A and Zone B have must also be shared with Zone C. And this goes on ad infinitum; because travel from Zone A to Zone D is possible, Zone D's mean_duration_... would have to match Zone A's, which also matches Zone B and Zone C, etc....
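The chain of reasoning above is a connected-components argument, and a small union-find sketch in Python can check it. The function name and the list of travel pairs are assumptions: the pairs are one reading of the example (A to B, A to C, A to D, plus travel within Zone D), not data from any feed.

```python
def groups_forced_equal(stop_times, travel_pairs):
    """Group stop_times that must share values under the 'identical if
    travel is possible' rule, using a simple union-find."""
    parent = {s: s for s in stop_times}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for a, b in travel_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for s in stop_times:
        groups.setdefault(find(s), set()).add(s)
    return list(groups.values())


stop_times = ["A_pickup", "B_dropoff", "C_dropoff", "D_pickup", "D_dropoff"]
travel_pairs = [
    ("A_pickup", "B_dropoff"),
    ("A_pickup", "C_dropoff"),
    ("A_pickup", "D_dropoff"),
    ("D_pickup", "D_dropoff"),  # assumed: travel within Zone D
]
groups = groups_forced_equal(stop_times, travel_pairs)
print(len(groups))  # a single group means one shared value for the trip
```

With these pairs every stop_time lands in the same group, so the rule leaves no room for differing values inside the trip.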

Regarding two trips connected by a transfer, I don't think having different mean_duration_... values between the two is a problem since their travel time is already being calculated individually and then added. Each trip's DrivingDuration would be factored with its respective mean_duration_... values individually and then the total of the two would be added. Formula: MeanTravelDuration = (Trip1 mean_duration_factor × Trip1 DrivingDuration + Trip1 mean_duration_offset) + (Trip2 mean_duration_factor × Trip2 DrivingDuration + Trip2 mean_duration_offset)
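The per-trip summation can be sketched in a few lines of Python. The function name and the sample legs are hypothetical, and the sketch assumes the consumer really does compute each linked trip's driving duration separately.

```python
def linked_trips_mean_duration(legs):
    """Sum MeanTravelDuration over linked trips.

    legs: iterable of (driving_duration, factor, offset), one tuple per trip.
    """
    return sum(factor * driving + offset for driving, factor, offset in legs)


# Hypothetical: Trip1 drives 20 min (factor 1.5, offset 2),
# Trip2 drives 10 min (factor 2.0, offset 5).
total = linked_trips_mean_duration([(20.0, 1.5, 2.0), (10.0, 2.0, 5.0)])
print(total)  # 32 + 25 minutes
```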

The "Option 3" I would propose is that duration factors/offsets apply 'within the time and space indicated by the stop_time'. When you are 'inside' a stop_time, its factor/offsets prevail

OK I'm seeing the vision for this a little better. Minor but important detail: the spatiotemporal restriction would only need to be an "AND" requirement ("if two stop times have geographic AND temporal overlap, mean_duration_factor and mean_duration_offset must be the same across those stop times").
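A validator for that "AND" rule could look roughly like the Python sketch below. The record layout, the field names `start`/`end`, and the `areas_overlap` callback are all assumptions; a real check would need an actual polygon intersection test, which is replaced here by comparing area ids.

```python
from itertools import combinations


def windows_overlap(a, b):
    """True if two half-open time windows [start, end) intersect."""
    return a["start"] < b["end"] and b["start"] < a["end"]


def find_conflicts(stop_times, areas_overlap):
    """Return id pairs that overlap in space AND time but differ in values."""
    conflicts = []
    for a, b in combinations(stop_times, 2):
        if (areas_overlap(a["area"], b["area"])
                and windows_overlap(a, b)
                and (a["factor"], a["offset"]) != (b["factor"], b["offset"])):
            conflicts.append((a["id"], b["id"]))
    return conflicts


# Hypothetical records; times are seconds since midnight.
stop_times = [
    {"id": "st1", "area": "zone_1", "start": 28800, "end": 43200, "factor": 1.5, "offset": 0},
    {"id": "st2", "area": "zone_1", "start": 36000, "end": 50400, "factor": 2.5, "offset": 0},
    {"id": "st3", "area": "zone_1", "start": 50400, "end": 61200, "factor": 2.5, "offset": 0},
]


def same_area(x, y):
    """Stand-in for a geometry test: only identical area ids overlap."""
    return x == y


conflicts = find_conflicts(stop_times, same_area)
print(conflicts)
```

Here st1 and st2 share an area and overlap in time with different factors, so they are flagged; st3 shares the area but not the time window with st1, so it is allowed to differ.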

As long as we also allow mean_duration_... values to be defined at the trip level so as not to impose this much granularity wholesale, I could see this being a way forward.

But I think we should ponder for a bit and think about 1) whether the option 3 I've proposed is feasible, and 2) if so, take any reasonable actions in terms of spec definition that we can now to allow for a future spec change that would allow for these values to be set at the stop_time level.

I agree. I understand the usefulness for the added granularity, it's just a practicality issue for me for both producers and consumers. Perhaps at some point some testing can be done for an "Option 3" setup so we can find out how this works in practice.

As usual, more heads working through this problem would be nice, but I do think moving the mean_duration_... fields to trips.txt is a good bet right now, and it still leaves the possibility to add them to stop_times.txt later (featuring the hit classic "values in stop_times.txt take precedence").
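If stop_time-level values were added back later, the precedence rule could be as small as the following Python sketch. No such rule exists in the spec today; the function name and the `None`-means-missing convention are assumptions.

```python
def resolve_value(stop_time_value, trip_value, default):
    """Pick the most specific value that is set.

    Hypothetical rule: stop_times.txt beats trips.txt, which beats the default.
    None stands for an empty or missing field.
    """
    if stop_time_value is not None:
        return stop_time_value
    if trip_value is not None:
        return trip_value
    return default


# Factor: the trip says 1.5 and the stop_time is empty, so the trip value wins.
factor = resolve_value(None, 1.5, default=1.0)
# Offset: nothing is set anywhere, so the neutral default of 0 applies.
offset = resolve_value(None, None, default=0.0)
print(factor, offset)
```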

leonardehrenfried commented 8 months ago

I was a bit afraid to suggest what Weston does because it looks like I'm creating an easy way out but that was my first thought.

tsherlockcraig commented 8 months ago

Regarding two trips connected by a transfer, I don't think having different mean_duration_... values between the two is a problem since their travel time is already being calculated individually and then added.

I'm not sure it really matters based on where it looks like we're headed, but I don't think we know for sure that it's true that "travel time is already being calculated individually and then added". A producer could hypothetically patch together any trips that obey increasing departure/arrival times through transfer_type 4 arbitrarily, and a consumer could hypothetically be decomposing linked trips into larger trips or even entirely ignoring the concept of "trip" as it is in GTFS within their own data model. If we want to guarantee that the pickup and drop off have exactly the same offset/factor, I think we would technically need to define these values at the feed level.

I don't think we should do that. Rather, I think we should move the values to the trip level to remove the vast majority of the potential for conflict, and then add additional clarification by way of some assumptions about how these fields should be applied. We'll have to work through some issues (deviated fixed as discussed above) but those clarifications shouldn't need to be all that verbose.

(featuring the hit classic "values in stop_times.txt take precedence").

Yeah it likely won't be a big deal to amend the spec (if stop_time level values ever become feasible/necessary) for this reason.

I'm comfortable with us moving forward with the "option 1" direction, but would request that @leonardehrenfried consider during OTP implementation whether there are opportunities to plan the development in such a way that we allow that specific platform to be adapted if the spec changes in the future. But of course that's a conversation for a different repository.

tsherlockcraig commented 8 months ago

I can try to put together a PR next week.

leonardehrenfried commented 8 months ago

From the OTP side I can confirm that supporting what I've drawn in the two sketches would be possible, even in the budget.

However, this ignores quite a few problems and doesn't deal with a few cases (scheduled deviated for example) which we would need to work on further down the line both in terms of spec as well as OTP work.

eliasmbd commented 6 months ago

Check out this comment in the PR

MobilityData / gtfs-flex

Mean duration factor/offset with multiple stop times/zones #78