mean_duration_factor,mean_duration_offset,safe_duration_factor,safe_duration_offset conditionally forbidden-->conditionally required?

MobilityData / gtfs-flex

NOTICE: GTFS-Flex has been merged into GTFS. This repo is no longer up to date and will be deprecated. Consult the google/transit repo for up-to-date information.

https://github.com/google/transit

Apache License 2.0


mean_duration_factor,mean_duration_offset,safe_duration_factor,safe_duration_offset conditionally forbidden-->conditionally required? #73

Open tsherlockcraig opened 1 year ago

tsherlockcraig commented 1 year ago

While reviewing a feed and the specification today, I noticed that the fields mean_duration_factor, mean_duration_offset, safe_duration_factor, and safe_duration_offset are marked as conditionally forbidden, but optional otherwise.

Should these values be conditionally required in the case that the stop_time references a flex service location?

If not, we should state in the spec what the assumed default value is. The current spec mentions a default value between locations, but not a default value within locations.

I'm fine with either solution, though i lean towards making these conditionally required.

westontrillium commented 1 year ago

What would be the condition of their requirement? Because I'm not sure we would want to make these required fields. Could you also point me to the section of the spec referring to a "default value between locations?"

For what it's worth, these fields aren't included in the proposed base implementation, and while in some cases they may provide more accuracy to trip time estimates, I personally would not mourn their ultimate exclusion in the spec. These values tend to be fairly arbitrary. The only factor to consider as to why a trip may take longer than a private vehicle is if it is a shared ride service and there are deviations to be made before the user's destination. Such deviations usually vary greatly, as you have two factors: the number of deviations and the distance of those deviations. Perhaps trip time estimates should be left to the domain of real trip time estimates under GTFS-OnDemand...?

As a side note, looking at the reference.md, the current language actually looks incomplete: We now allow single points to have a start/end_pickup_drop_off_window, so the field(s) that triggers the allowance of offsets/factors (and indeed, the field that determines whether something is considered "Flex" at all) should be start/end_pickup_drop_off_window, not a stop_times.stop_id referencing a locations.id/areas.area_id.

tsherlockcraig commented 1 year ago

The current spec reads:

While traveling through undefined space between GeoJSON locations or stop areas, it is assumed that: MeanTravelDuration = DrivingDuration

which is the same thing as saying that the default values between locations are 1 for factor and 0 for offset.
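To make the implied defaults concrete, here is a minimal sketch. It assumes the linear form factor × DrivingDuration + offset suggested by the field names; the function name and the choice of minutes as the unit are illustrative assumptions, not spec text.

```python
def mean_travel_duration(driving_minutes, factor=1.0, offset=0.0):
    """Linear estimate: factor * driving duration + offset.

    The defaults (factor=1, offset=0) are the values implied by the
    "MeanTravelDuration = DrivingDuration" statement quoted above.
    """
    return factor * driving_minutes + offset


# With the implied defaults, the estimate is just the driving duration.
print(mean_travel_duration(12.0))
```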

These fields are important for sensible trip planning without realtime information, and should be included in the proposed base implementation. If it is true that 'These values tend to be fairly arbitrary.', that's only because producers haven't taken the time to thoughtfully populate them--which is indeed a problem we've seen in feeds as a consumer in the Hopelink project, although it's been a low priority issue relative to others that we've not contacted those producers about yet :). (Granted, I understand there's a communication problem with actual operators to determine values, but that's a surmountable problem we all just haven't gotten to solutions for yet.)

'Perhaps trip time estimates should be left to the domain of real trip time estimates under GTFS-OnDemand...?' I don't see how that's tenable, unless we're just not going to do trip planning for these services unless there's real-time data. Without this information, the only guess a consumer can make is that trip time = drive time as in the assumption above, and in a practical sense that means that demand response transit will always look faster than transit, which isn't the case.

As a side note, looking at the reference.md, the current language actually looks incomplete: We now allow single points to have a start/end_pickup_drop_off_window, so the field(s) that triggers the allowance of offsets/factors (and indeed, the field that determines whether something is considered "Flex" at all) should be start/end_pickup_drop_off_window, not a stop_times.stop_id referencing a locations.id/areas.area_id.

sounds fine to me!

westontrillium commented 1 year ago

Aha, I see what you mean. Yes, I agree the spec probably should include a statement about the default value being factor=1, offset=0 for any Flex service that doesn't have those fields defined.

If it is true that 'These values tend to be fairly arbitrary.', that's only because producers haven't taken the time to thoughtfully populate them

I guess I just don't see how any amount of thoughtfulness, short of basing these values on historical trip data provided by agencies, could produce anything better than subjective estimates (perhaps "arbitrary" is a stronger word than I intended). I suppose that is where my discomfort comes from: codifying such subjectivity into the spec. Maybe a solution is to include as a best practice that these fields should be based on actual trip time analysis?

The practice I have personally been involved with is to ask agencies to provide these values, and while representatives of those agencies may have a great deal of experiential knowledge of their operations, those estimates are still not justified with anything concrete. I guess the core question I'm asking here is what holds greater risk: including this kind of guesswork in a dataset or not including it at all? Because a trip time estimate without this data may just as likely as not match actual travel time the same as one augmented by subjective, static values may or may not match actual travel time.

the only guess a consumer can make is that trip time = drive time

I'm not convinced we can make that assumption yet. The very nature of these services (public transit, shared ride, etc.) tells the user, in an implicit way, that this trip behaves differently than a private vehicle trip. We do also have the utility of the message fields at our disposal where a producer could choose to include the disclaimer that it is a shared ride service and that actual trip time may vary from the estimate provided.

demand response transit will always look faster than transit, which isn't the case.

Maybe not always, but when you're looking in terms of raw trip time (actually, do we agree trip time should start when the vehicle starts moving? i.e., it should not include time between booking and/or arriving at a passenger's pickup location, nor boarding time, correct?), then many times it would and should look faster than fixed-route, depending on the mode.

In the end, I really don't have any sort of vendetta against the mean/factor offset fields. I am just pondering whether it really would be a great loss if these are excluded in a base implementation if it means there is an easier path to adoption (a scenario which does not preclude discussing their eventual inclusion at a later date).

But to the actual question at hand (😆), I am still curious as to what you envision would be the condition of their requirement, or is it a general requirement for Flex, i.e., a service using start/end_pickup_drop_off_windows is the condition?

tsherlockcraig commented 1 year ago

But to the actual question at hand (😆), I am still curious as to what you envision would be the condition of their requirement, or is it a general requirement for Flex, i.e., a service using start/end_pickup_drop_off_windows is the condition?

Not quite that general, since we allow those to reference a fixed stop now. Rather, any reference to stop_areas.area_id, or id from locations.geojson.
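The two candidate conditions discussed here can be sketched side by side. This is only an illustration: the helper names are hypothetical, stop_times records are assumed to be plain dicts, and the window fields are assumed to be named start_pickup_drop_off_window and end_pickup_drop_off_window.

```python
def uses_window(stop_time):
    """Broader condition: the record has a pickup/drop-off window."""
    return bool(stop_time.get("start_pickup_drop_off_window")) or bool(
        stop_time.get("end_pickup_drop_off_window")
    )


def references_area_or_location(stop_time, area_ids, location_ids):
    """Narrower condition: stop_id points at a stop area or a GeoJSON location."""
    stop_id = stop_time.get("stop_id")
    return stop_id in area_ids or stop_id in location_ids


# A fixed stop with a window satisfies the first condition but not the second.
fixed_stop_with_window = {
    "stop_id": "stop_1",  # hypothetical id of an ordinary stop
    "start_pickup_drop_off_window": "08:00:00",
    "end_pickup_drop_off_window": "10:00:00",
}
area_ids = {"area_1"}      # hypothetical stop_areas.area_id values
location_ids = {"zone_a"}  # hypothetical ids from locations.geojson

print(uses_window(fixed_stop_with_window))
print(references_area_or_location(fixed_stop_with_window, area_ids, location_ids))
```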

I agree that it's a messy situation, that's only really solved by real-time information and the estimation by the producer of a specific trip time window for a specific trip. But, we also have the need to provide the data consumer with a discrete way to estimate travel time, and we know that travel time is related to, but not the same as drive time. And we know that there are trips we want to provide in trip planners over the coming 5-10 years for which we just aren't going to have real time information. Ax+B is as I see it the least bad solution here. It's significantly more accurate than x (or log x) and easier to communicate than Ax^2 +Bx + C (to joke, kinda). I did have back and forth with Leo back in the day about whether it should just be x + B and, I didn't like it as much but that solution is also probably feasible if we wanted to simplify the communication/subjectivity.

We do also have the utility of the message fields at our disposal where a producer could choose to include the disclaimer that it is a shared ride service and that actual trip time may vary from the estimate provided.

Consumers would have the option (and should use it) to provide an interpretive interface. That's in addition to the need to make some type of assumption about travel time, if travel time will be represented in the system.

Maybe a solution is to include as a best practice that these fields should be based on actual trip time analysis?

I like this as a "best practice". Technically, i don't think it's inappropriate to guess. The subjectivity is a problem, but a relatively minor problem in most contexts. In 2030 or so, no applications will be willing to provide demand response information without a real-time API that internally calculates expected travel time based on complex business rules, then delivers a response to the consumer. But, in the meantime we're going to guesstimate based on a reasonably simple formula because we want to let people know that they should plan on that flex connector to pick them up 25 minutes before their train trip even though the station is only a 12 minute drive away, and scale that math to a variety of scenarios without getting too complicated.
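As a rough illustration of that flex-connector scenario: the factor and offset below are hypothetical values chosen only so that a 12 minute drive yields a 25 minute allowance, and the function name and departure time are made up for the sketch.

```python
from datetime import datetime, timedelta

# Hypothetical values: 1.5 * 12 + 7 = 25 minutes.
FACTOR = 1.5
OFFSET_MINUTES = 7.0


def latest_pickup(train_departure, driving_minutes,
                  factor=FACTOR, offset=OFFSET_MINUTES):
    """Latest pickup time that still allows factor * drive + offset minutes."""
    travel = timedelta(minutes=factor * driving_minutes + offset)
    return train_departure - travel


train = datetime(2024, 1, 1, 8, 0)
print(latest_pickup(train, 12))                   # planned with Ax+B
print(latest_pickup(train, 12, factor=1, offset=0))  # planned with drive time only
```

With drive time alone the planner would suggest a pickup 12 minutes before departure; with the linear estimate it suggests 25 minutes before.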

Yes, I agree the spec probably should include a statement about the default value being factor=1, offset=0 for any Flex service that doesn't have those fields defined.

I'm also perfectly happy with this solution. Seems easy enough to just conditionally require it so producers have to stop and think about it, so they know how travel times will be represented. But, stating the assumption would successfully do that in a different way.

tsherlockcraig commented 1 year ago

Noting here that when I submit a PR for this, I should also update the example file(s) to ensure they include data for these fields. (Maybe not strictly necessary if we choose to just define the "default" values in the spec.)

tzujenchanmbd commented 1 year ago

@westontrillium @tsherlockcraig thanks for your thorough discussion!

(maybe not strictly necessary if we choose to just define the "default" values in the spec)

I agree that we don't need to change presence to conditionally required if we specify empty values are considered as 1 & 0 in spec (since we still allow empty value here - not required).

Regarding whether it should be included in the first increment (Service Discovery) - MobilityData aims to establish a strong base for consensus and sustain community momentum through an incremental approach. Therefore, we prefer the first increment to be as simple as possible, only focusing on essential functionalities. Based on past experiences, including more fields in an increment can often complicate discussions. If we were to include these fields, the community might discuss the estimation approach (Ax+B, x+B, or others), making the adoption more time-consuming.

I agree that providing producers with an estimation option (not required) for duration would be beneficial, especially since the realtime portion may not be available any time soon. But if the goal of the first increment is "service discoverability", these 4 fields are probably not must-haves at the moment. In addition, even though these fields are not included in the first increment, producers and consumers can still use them as experimental fields in their implementations.

Based on the reasons mentioned above, we recommend separating these 4 fields and discussing their official adoption in the future increment, after the first increment is officially adopted.

tsherlockcraig commented 1 year ago

Thanks, Tzu-Jen. I see this conversation differently in two regards.

First, I think it's important that we come to this conversation recognizing that it has been going on for years and has gone through many increments, many of which but not all of which MobilityData has been a part of. This began as a spec proposal back in 2013. The first working app integrating "GTFS-flex" data in a trip planner was launched in 2018. GTFS-flex "v2" was developed in collaboration with MobilityData in 2019. The GOFS working group adopted GTFS-flex as the base of GOFS in 2021. We're not talking about a first increment, but rather, the first public vote on the spec. There are already at least 2 GTFS producing applications and at least 2 GTFS consuming applications, and at least 3 of those applications support these fields (I can't speak for Transit app, interested to hear from them). "the community" has provided all sorts of input over years and is still welcome to, but we're not just starting an incremental approach to sustain community momentum--the community has been doing that for a long time, though appreciates MobilityData's additional advocacy.

an estimation option (not required) for duration

Second, I'd like someone to walk me through how this isn't required for trip planning. Every trip planner today offers duration for trips, right? Is there any exception to this rule? If there is not, this or another approach to duration is required-->otherwise we would be intentionally leaving something fundamental to the trip planning experience ambiguous. If there are exceptions-->then let's look at them! What does trip planning without duration look like? I can't imagine it.

It sounds to me like MobilityData's approach is adding complexity here rather than decreasing it. The GTFS-flex spec proposal, while certainly imperfect, meets the producer and consumer requirements of the specification incorporation process, and should simply be brought to a vote as is. In that vote, or before then on this repo, anyone is welcome to propose changes to the spec, and should. If there are no specific proposed changes that need to be made, and the community votes for it, it's in! I don't see the strategic value in making this more complex. We've worked on this for a decade, and it's getting close-->let's go for it and then work on incremental improvement and the incorporation of real time.

tzujenchanmbd commented 1 year ago

Thanks for your feedback Thomas! Let me try clarifying the technical part here and we can discuss how to adopt the proposal (approach) in tomorrow's meeting :)

Second, I'd like someone to walk me through how this isn't required for trip planning. Every trip planner today offers duration for trips, right? Is there any exception to this rule? If there is not, this or another approach to duration is required-->otherwise we would be intentionally leaving something fundamental to the trip planning experience ambiguous. If there are exceptions-->then let's look at them! What does trip planning without duration look like? I can't imagine it.

I think we all agree including a statement about the default value being factor=1, offset=0 in spec is a good direction. From my understanding, the "default value" means if there is no valid value (i.e. empty value) on the record, it will be considered as 1 & 0 automatically.

100% agree that every trip needs duration, my thought is -

since we have specified default value in spec, there is always duration for every trip
since we probably want to allow empty value, then perhaps don't need to change presence to Conditionally Required

The current definition of Required presence is - the field or file must be included in the dataset and contain a valid value for each record. As mentioned in previous comment, we can add recommendation in Best Practice like "recommended to include these 4 fields and should be based on actual trip time analysis", but we probably still want to allow empty value for these fields when producers just want to use the default value. If we change it to Conditionally Required, does it mean producers need to explicitly include value 1&0? What is the condition for requirement here?
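For the "empty value means default" reading, a consumer-side sketch might look like the following. It assumes the defaults under discussion (factor 1, offset 0) apply to both the mean_ and safe_ fields; the sample rows, ids, and helper name are hypothetical.

```python
import csv
import io

# Assumed defaults from this discussion: factor = 1, offset = 0.
DEFAULTS = {
    "mean_duration_factor": 1.0,
    "mean_duration_offset": 0.0,
    "safe_duration_factor": 1.0,
    "safe_duration_offset": 0.0,
}


def duration_params(row):
    """Return the four duration fields, filling empty or absent ones with defaults."""
    params = {}
    for field, default in DEFAULTS.items():
        raw = (row.get(field) or "").strip()
        params[field] = float(raw) if raw else default
    return params


# Hypothetical stop_times.txt excerpt: t1 leaves the fields empty, t2 fills them.
SAMPLE = (
    "trip_id,stop_id,mean_duration_factor,mean_duration_offset\n"
    "t1,zone_a,,\n"
    "t2,zone_b,1.5,7\n"
)

rows = [duration_params(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
for params in rows:
    print(params)
```

Under this reading a producer never has to write 1 and 0 explicitly; an empty cell or a missing column yields the same estimate.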

tsherlockcraig commented 1 year ago

since we have specified default value in spec, there is always duration for every trip

Thanks! I think we're getting on the same page, although I'd clarify one element here. We haven't fully specified default values in the spec at this time. We've only defined that

While traveling through undefined space between GeoJSON locations or stop areas, it is assumed that: MeanTravelDuration = DrivingDuration

which only implies what the defaults are, and only in certain circumstances. I think we need to update that part of the field (also for the safe_ fields) to read something like

"The default values for mean_duration_factor and mean_duration_offset are 1 and 0, respectively. This applies both inside GeoJSON locations or stop areas and in between GeoJSON locations or stop areas. These default values imply that, when no values are declared for mean_duration_factor and mean_duration_offset, MeanTravelDuration = DrivingDuration"