MobilityData / gtfs-flex

NOTICE: GTFS-Flex has been merged into GTFS. This repo is no longer up to date and will be deprecated. Consult the google/transit repo for up-to-date info.
https://github.com/google/transit
Apache License 2.0

More than two flex zones in the same trip #76

Open leonardehrenfried opened 1 year ago

leonardehrenfried commented 1 year ago

While making the feed for the Catholic Community Services of Western Washington work in OTP, we hit upon an interesting edge case: what is the expected travel time for services that have more than 2 zones in their stop times which have a time window?

Let me illustrate what I mean with a sketch:

[Sketch: multiple-flex-zones]

Here we have three zones, all of which have pretty long time windows. In stop_times they are listed in order, and you can get on or off in all zones.

Now, if you're planning a trip from Zone 1 to Zone 3, is the expectation that the trip will always go via Zone 2? Or can the shortest path from 1 to 3 be used?

Since services which have more than one zone are pretty rare (I only know CCSWW) and right now the case with more than 2 zones is theoretical, I'm wondering what the correct router behavior should be.

Has anyone thought about this yet?

cc @t2gran @vpaturet @tsherlockcraig @jon-campbell-ibigroup @westontrillium

tsherlockcraig commented 1 year ago

It could be either, and we don't know based on the data itself.

Because flex allows these windows to overlap and doesn't have any other data element indicating the actual expected flow of operating vehicles, the consumer of static data can't know which zones will be visited during a trip in this use case.

I don't think this matters from a product perspective. To the user, we don't need to say anything that we don't already need to say ("we don't know the path of the vehicle" is already true when we're just talking about one zone). If the software, practically speaking, needs to answer this question, I think any answer is acceptable as long as it is documented and as long as we hide that information from the user.

westontrillium commented 1 year ago

Adding to what @tsherlockcraig said, since Flex is not in the business of prescriptively defining paths of travel (because, as he said, we cannot know what the actual trip will be), the data then can only define the boundaries of what origin/destination cases are possible with the service.

In any case, on an operational level I can't think of why the bus would always (i.e., prescriptively) go out of its way to Zone 2 before Zone 3 unless there was a fixed stop it always served, in which case that could and should be reflected in the data. If the bus did behave in such a way (going to Zone 2 just to fulfill that unknown requirement), the closest thing to accommodate such a behavior in the data is adding offsets for those trips so the estimated travel time is more reflective of reality.

As a side note, if I'm interpreting correctly that the service allows any direction of travel between these zones, and intrazone travel for each, I would probably just use a trip for each case instead of grouping multiple cases in one trip, just to be as explicit as possible:

  trip1. from Zone 1 to Zone 1
  trip2. from Zone 1 to Zone 2
  trip3. from Zone 1 to Zone 3
  trip4. from Zone 2 to Zone 2
  trip5. from Zone 2 to Zone 1
  trip6. from Zone 2 to Zone 3
  trip7. from Zone 3 to Zone 3
  trip8. from Zone 3 to Zone 1
  trip9. from Zone 3 to Zone 2
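Here is a minimal sketch of that one-trip-per-case layout. It assumes three hypothetical zone IDs and a single shared service window, neither of which comes from the CCSWW feed. Each generated trip has a pickup-only first row and a drop-off-only second row, so every trip encodes exactly one directed transition. The trip numbering follows itertools.product order rather than the order listed above.

```python
from itertools import product

# Hypothetical zone IDs and service window, for illustration only.
ZONES = ["Zone1", "Zone2", "Zone3"]
WINDOW = ("08:00:00", "18:00:00")


def one_trip_per_pair(zones, window):
    """Build stop_times rows with one two-row trip per ordered (origin, destination) pair.

    Row layout: (trip_id, stop_id, stop_sequence, pickup_type, drop_off_type,
    start_pickup_drop_off_window, end_pickup_drop_off_window).
    The first row is pickup-only (2, 1) and the second is drop-off-only (1, 2),
    so each trip allows exactly one directed transition, intrazone included.
    """
    rows = []
    for n, (origin, dest) in enumerate(product(zones, repeat=2), start=1):
        trip_id = f"trip{n}"
        rows.append((trip_id, origin, 1, 2, 1, *window))
        rows.append((trip_id, dest, 2, 1, 2, *window))
    return rows


for row in one_trip_per_pair(ZONES, WINDOW):
    print(" ".join(str(value) for value in row))
```

With three zones this yields nine trips and eighteen rows. The cost is more rows, but nothing about direction or intrazone travel is left implicit.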

Lastly:

Since services which have more than one zone are pretty rare (I only know CCSWW) and right now the case with more than 2 zones is theoretical I'm wondering what the correct router behavior should be.

We actually do encounter a fair amount of multi-zone services with all sorts of directionality restrictions in the U.S., including ones with more than 2 zones. I'd be happy to point you to some examples if you're curious!

uwtcat commented 1 year ago

@leonardehrenfried, @westontrillium @tsherlockcraig et al. I wanted to note that this problem surfaces a deeper problem with GTFS-flex, and is not just an edge case.

The summary of the problem with the current schema is that although transitions from zones to zones can be described via trips, the schema does not enforce having the data producer provide a "completely specified graph" of the possible transitions. However, a completely specified graph is what a routing algorithm needs in order to perform any kind of routing, and those are the gaps experienced by our OTP developing friends.

When @westontrillium says "just to be as explicit as possible," it is exactly that level of addressing every allowable transition among zones that is currently missing from GTFS-flex v2 best practices, along with the enforcement of some crucial attributes describing the time bounds on those allowable transitions. (Please note that I did not say you have to specify transitions that are not allowed. So long as we require that all allowable zone-to-zone transitions are specified, anything not explicitly named can then be assumed to be not allowed. The current issue is that when something is not specified, @leonardehrenfried does not know whether to interpret it as not allowed, or allowed by default. And since scenarios differ between providers, we should not make either one a default.)

Now I'd like to provide the more graph-specific explanation. It is squarely coming from the routing use-case perspective, which I believe to be aligned with the purpose of GTFS-flex v2 as a routing-first schema (or so I was told when I was trying to suggest some proposals that came from an accessibility-first lens)...

So-- on routing--

Routing algorithms (even for fuzzy routing like flex) fundamentally require a fully specified graph. With GTFS-static, what happens inside the "nodes" is precisely a time-bounded stop. With GTFS-flex, we can interpret zones as "nodes" in the graph, and then what happens inside a zone is governed by the type of flex service provided there. The routing algorithm can break routing down into two steps: (1) identify the routable shortest path among zones, and then (2) identify a flex-service-appropriate route within each zone.
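A minimal sketch of step (1), assuming the producer has supplied an explicit directed zone graph and that each edge carries a hypothetical estimated travel time in minutes (GTFS-Flex does not provide such weights directly). It runs Dijkstra's algorithm over zones only. Step (2), routing inside each zone, depends on the flex service type and is left out.

```python
import heapq


def zone_path(edges, origin, dest):
    """Dijkstra's shortest path over a directed graph whose nodes are zones.

    edges: dict mapping zone -> list of (next_zone, estimated_minutes).
    Returns (total_minutes, [zone, ...]) or None if dest is unreachable.
    """
    queue = [(0, origin, [origin])]
    best = {origin: 0}
    while queue:
        cost, zone, path = heapq.heappop(queue)
        if zone == dest:
            return cost, path
        if cost > best.get(zone, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for nxt, minutes in edges.get(zone, []):
            new_cost = cost + minutes
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(queue, (new_cost, nxt, path + [nxt]))
    return None


# Hypothetical weights: only the edges listed here exist.
EDGES = {
    "Zone1": [("Zone2", 15), ("Zone3", 25)],
    "Zone2": [("Zone3", 15)],
}
print(zone_path(EDGES, "Zone1", "Zone3"))
print(zone_path(EDGES, "Zone3", "Zone1"))
```

With these weights the direct Zone1 to Zone3 edge beats the detour through Zone2, and Zone3 to Zone1 is unreachable because no such edge is listed.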

What I'm suggesting is that GTFS-flex v2 will never, as @tsherlockcraig suggested, provide a fully specified graph within a zone. However, even given the way we look at flex services, GTFS-flex v2 should require the producer to provide a fully specified graph among the zones. That is, the specific proposal I'm suggesting is that trips should provide all the allowable transitions among zones and, further, provide all the associated conditions and attributes required for each transition to take place: time bounds, date bounds, eligibility requirements (I've seen eligibility requirements that apply only to travel from Zone 1 to Zone 2), etc.

You can read more below about why routing algorithms need this, from a graph-theory perspective. But you can skip everything below if you wish.

Thanks! Anat & the Taskar team.

In graph theory, a graph is considered "fully specified" when all of its essential characteristics and properties have been explicitly defined or provided. These essential characteristics typically include:

Nodes (Vertices): The graph should specify all the nodes (or vertices) that are part of it. Each node must have a unique identifier or label. In GTFS-static, nodes are simply stops. I argue that for GTFS-flex, we can consider zones or locations to be nodes. What happens inside the nodes is subject to whatever on-demand service is provided there and does not allow for a full graph specification within that zone, but among zones we can still hold on to a graph structure, because it is so useful in the routing context.

Edges (Links): The graph should define all the edges (or links) that connect the nodes. For each edge, it should specify the two nodes it connects and any associated attributes or weights. In GTFS-static, the links are sometimes implied by the order of the stops; trips and routes allow for some variation in the links. In GTFS-flex, there is an underlying assumption that the ordering of the zones builds some implicit links among those zones. However, we have seen cases where these links between ordered pairs of zones are only conditionally allowable (depending on time of day, eligibility, or something else). GTFS-flex v2 then sort of punts this to either booking rules or trips. This lack of clarity about where the transitions among zones are specified, and about the fact that they must be specified along with their associated conditions, creates room for error and misinterpretation between producer and consumer. This is why there should be no transitions or links between zones that are implicit by default, and why the recommendation above states: trips should provide all the allowable transitions among zones (in the way @westontrillium provided above), and further provide all the associated conditions and attributes required for that transition to take place: time bounds, date bounds, eligibility requirements, etc.
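One way to read this proposal as a data structure is to make every allowable transition an explicit edge that carries its own time bounds, date bounds and eligibility, and to make the lookup closed-world, so any pair that is not listed is refused. The ZoneTransition class and the is_allowed function are hypothetical illustrations, not part of GTFS-Flex.

```python
from dataclasses import dataclass
from datetime import date, time


@dataclass(frozen=True)
class ZoneTransition:
    """Hypothetical explicit, directed edge between two zones with its conditions."""
    from_zone: str
    to_zone: str
    start: time             # earliest allowed pickup time
    end: time               # latest allowed pickup time
    first_day: date         # date bounds of the transition
    last_day: date
    eligibility: frozenset  # rider groups allowed; empty means anyone


def is_allowed(transitions, from_zone, to_zone, on_day, at_time, rider_groups=frozenset()):
    """Closed-world check: allowed only if some listed edge matches every condition."""
    for t in transitions:
        if (t.from_zone, t.to_zone) != (from_zone, to_zone):
            continue
        if not (t.first_day <= on_day <= t.last_day):
            continue
        if not (t.start <= at_time <= t.end):
            continue
        if t.eligibility and not (t.eligibility & rider_groups):
            continue
        return True
    return False  # anything not explicitly listed is treated as not allowed


TRANSITIONS = [
    ZoneTransition("Zone1", "Zone2", time(8), time(12),
                   date(2023, 1, 1), date(2023, 12, 31), frozenset({"senior"})),
]
print(is_allowed(TRANSITIONS, "Zone1", "Zone2", date(2023, 6, 1), time(9), frozenset({"senior"})))
print(is_allowed(TRANSITIONS, "Zone2", "Zone1", date(2023, 6, 1), time(9), frozenset({"senior"})))
```

The point of the closed-world default is that a consumer never has to guess: Zone2 to Zone1 is refused simply because no edge says otherwise.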

Other things required for full graph specification:

Edge Directions: In the case of directed graphs (digraphs), the direction of each edge should be specified, indicating which node is the source and which is the destination. In GTFS-static this is described explicitly in the route. In GTFS-flex v2, this is implied by the zone named first and the zone named second. So we're good there.

Any Additional Attributes: Depending on the specific application, a fully specified graph may also include other attributes associated with nodes or edges, such as labels, colors, or metadata.

I would say that this would be the role of documentation and best practices, as I understood @tsherlockcraig to suggest in the MobilityData conversation.

Bottom line is this: at the least, a routing algorithm for GTFS-flex v2 requires a graph model of the possible transitions among zones described in a deterministic way, like a state machine. The current schema requires explicit representations of zones, but is not as stringent about its requirements for explicit representation of the edges or links between them. We ought to rectify this in order to reduce miscommunication between producers and consumers, and remove the current need for human interventions between producer/consumer to elucidate what the service actually offers.

Thanks for reading thus far.

tsherlockcraig commented 1 year ago

The current schema definitely needs further specification, but I think the needed explicit identification of edges between nodes can be attained more easily by adding a further description to stops.stop_id. I've suggested the edit on the GTFS-flex PR here: https://github.com/google/transit/pull/388/files/e359750cb2d4a496cf96bdbb1c6e30a73b3fb59f..2efafbfe2b91e0b99313df2391adb3fbc9121861#diff-3ecf0760eb54b4953728042a1e30586705dc2335807be94faae0de5829cd12a1

I'm very interested in @leonardehrenfried's take here. In previous discussion, it has seemed like we have bugs/unfinished logic in the current OTP implementation, but also that a path to resolving the issues we've run into so far is in the works. (Current PR for OTP linked in Leonard's first comment.)

leonardehrenfried commented 1 year ago

Yes, it would be desirable for consumers to have as much detailed information about the service as possible. However, a spec like flex is always a compromise between the needs of the consumer and the producer. Overall, GTFS is optimized for being easy to produce at the cost of some ambiguity for the consumers. This strikes me as the correct balance as the consumers tend to have higher technical skills than the producers and are happy to "suck up" the complexity if that means that they get any data at all.

If we put more burden on the producers that would probably mean that we get fewer data feeds.

If you want a richer set of tools for describing fixed or flexible transit, I encourage you to check out NeTEx. It definitely enables you to describe services in excruciating detail; however, the barriers to creating these feeds are very high, as the complexity is orders of magnitude greater than GTFS.

Personally, I prefer a well-done GTFS feed over a poorly implemented NeTEx one. Since NeTEx is a lot more complex, it's way easier to make a mess.

OTP implementation

About @tsherlockcraig's point about the implementation: yes, there are definitely gaps between what is possible in the spec and what OTP handles well; however, the majority of the current popular use cases are supported.

Nevertheless, I expect there to be a long tail of issues and edge cases that I will have to deal with. Given the complexity of the expected results, I think this is a normal process.

Regarding this particular issue, since we didn't have a service with more than one zone, OTP didn't have an implementation for it. With https://github.com/opentripplanner/OpenTripPlanner/pull/5376 this is about to change.

tzujenchanmbd commented 1 year ago

Thanks for your insightful discussion! Totally agree that we need to further clarify how to model different use cases appropriately.

Regarding where these clarifications should go: MobilityData is gradually incorporating best practices into the spec. In other words, we are gradually introducing more "best practice-like" descriptions into the spec, just like this suggestion! So if you have any other suggestions for clarifications in the spec, please let us know.

Additionally, we plan to create a flex data examples page in gtfs.org once Flex is adopted (similar to this Fares-v2 page). Please share any useful use cases/edge cases in MobilityData/gtfs.org issue#195, and we can add them on data examples page in the future.

uwtcat commented 1 year ago

Hi again.

TL;DR: There will always be tension between data producers & data consumers, but only the end-device-users pay the price for badly formed schemas. We need to produce excellent, well formed schemas, and pair them with human-usable tools to support producers with the full data life-cycle of their streams (including data collecting, vetting, validation, distribution, update and maintenance).

I'm hearing the need to "explicitly specify via a trip every allowable transition" (in the same way @westontrillium showed above) equated with "excruciating detail." That is an exaggeration. At some point, the producer needs to communicate what service they are provisioning, AND we need to give them easy-to-use, plain-language, human-usable tools to express that.

@leonardehrenfried says "Overall, GTFS is optimized for being easy to produce at the cost of some ambiguity for the consumers." My standpoint is that the burden and cost of this ambiguity end up being carried by the end-device-user: the traveler who ends up consuming trip information that is unreliable and difficult to verify, because the stream followed the schema (so validation passed). Conscientious data consumers (like those creating the flex trip planner for HopeLink in King County) might take the steps to contact each producer and clarify the intent. I do not imagine all consumers will act as responsibly.

We owe it to travelers to create well-specified schemas that effectively communicate precise information about flexible services. Keep in mind that on-demand transportation users (at least in the U.S.) are enriched for traveler populations that have been travel-marginalized for decades already. Beyond that, we should support those schemas with human-usable data tooling ecosystems (tools for data collection, validation, vetting, maintenance and public stream distribution) to support good Flex data production, even by unskilled or lower-skilled service providers.

tsherlockcraig commented 1 year ago

End-device-users also pay if data is unproduced because standards are overspecified and introduce complexity not needed to achieve the relevant degree of specificity.

There is no service that has been identified that cannot be explicitly defined within the current proposed specification (including this recent change). Is there some hypothetical service that might exist which cannot be specified, or is there an existing service which we have not considered?

leonardehrenfried commented 1 year ago

Maybe the "excruciating detail" line was a little harsh and I'm sorry if it came across as belittling your point, which I didn't want to.

However, I still stand by my comment that adding more features to the spec doesn't result in more details being provided.

westontrillium commented 1 year ago

I guess I don't see how you could avoid being explicit about what's allowable without either having invalid data or without producing an incomplete picture of the service, so I don't understand what issue is actually being raised (it's very possible it exists somewhere over my producer, non-developer head 🙂).

The current issue is that when something is not specified, @leonardehrenfried does not know whether to interpret it as not allowed, or allowed by default.

If it's not specified, it is not allowed. How could it be any other way? If the dataset says "Zone 1 to Zone 3" and nothing about a "Zone 2," as far as the data is concerned, Zone 2 doesn't exist. If the trip query falls within the parameters of what is specified, the trip is possible.

This has always been the case with Flex; it's just that as our understanding of the required logic improved, so did our understanding of how best to structure particular cases with the spec. The earliest version of Flex v2 said you could just have a single stop_time record referring to itself for both pickup and drop-off. Then Flex v2 underwent what is probably its most significant evolution, which was two-fold: the requirement for travel to be expressed in consecutive stop_times (i.e., no intrazone travel in a single stop_time record), and specifying that stop_sequence determines directionality (e.g., travel from stop_sequence=1 to stop_sequence=2 is allowed; travel from stop_sequence=2 to stop_sequence=1 is not). An earlier refinement also came out of the GOFS working group, which determined that it is invalid to have spatial, temporal and pickup/drop-off rule overlaps all at once within a single trip_id. For example, you could have a single trip with stop_times that both reference the same zone and the same time window, but as long as they don't both allow pickup, there's no redundancy, since one would be saying, e.g., "zone1, call to be picked up between 8 and 12" and the other "zone1, call to be dropped off between 8 and 12." A case where one stop_time said "zone1, call to be picked up between 8 and 10" and the other "zone1, call to be picked up between 9 and 12" would be invalid because of the spatial overlap (zone1), temporal overlap (the 9 o'clock hour), and pickup/drop-off_type overlap (call for pickup). As long as just one of these three elements does not overlap, there is no conflict when a consumer determines which stop_time to reference.
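The three-way overlap rule above can be sketched as a pairwise check between two records in the same trip. This sketch makes three assumptions: spatial overlap means the same stop_id (zones with different IDs can still intersect geometrically, which is ignored here), windows that merely touch at an endpoint do not overlap, and two records overlap on pickup/drop-off rules when both allow pickup or both allow drop-off.

```python
from dataclasses import dataclass


def to_seconds(hms):
    """Parse a GTFS HH:MM:SS time (hours may exceed 24) into seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


@dataclass
class FlexStopTime:
    stop_id: str
    window_start: str  # start_pickup_drop_off_window
    window_end: str    # end_pickup_drop_off_window
    pickup_type: int   # 1 = no pickup
    drop_off_type: int # 1 = no drop-off


def conflicts(a, b):
    """True only if the records overlap spatially, temporally AND in pickup/drop-off rules."""
    spatial = a.stop_id == b.stop_id  # simplification: same ID, not geometric intersection
    temporal = (to_seconds(a.window_start) < to_seconds(b.window_end)
                and to_seconds(b.window_start) < to_seconds(a.window_end))
    both_pickup = a.pickup_type != 1 and b.pickup_type != 1
    both_drop_off = a.drop_off_type != 1 and b.drop_off_type != 1
    return spatial and temporal and (both_pickup or both_drop_off)


# The two cases described above.
pickup_8_12 = FlexStopTime("zone1", "08:00:00", "12:00:00", 2, 1)
drop_off_8_12 = FlexStopTime("zone1", "08:00:00", "12:00:00", 1, 2)
pickup_8_10 = FlexStopTime("zone1", "08:00:00", "10:00:00", 2, 1)
pickup_9_12 = FlexStopTime("zone1", "09:00:00", "12:00:00", 2, 1)
print(conflicts(pickup_8_12, drop_off_8_12))
print(conflicts(pickup_8_10, pickup_9_12))
```

The first pair is valid because only the pickup/drop-off dimension differs; the second is invalid because all three dimensions overlap.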

Together, these new rules made ambiguity about what is allowed/not allowed impossible (at least as far as I can determine). Take the following example:

trip_id stop_id stop_sequence pickup_type drop_off_type start_pickup_drop_off_window end_pickup_drop_off_window
tripA Zone1 1 2 1 08:00:00 12:00:00
tripA Zone3 2 1 2 08:00:00 12:00:00
tripB Zone2 1 2 1 14:00:00 19:30:00
tripB Zone1 2 1 2 14:00:00 19:30:00
tripC Zone4 1 2 1 10:00:00 17:00:00
tripC Zone6 2 2 1 10:00:00 17:00:00
tripC Zone6 3 1 2 10:00:00 17:00:00

There is only one way to interpret this set of data because of the rules we have set for Flex. Allowed:

  1. Trips from Zone1 to Zone3 between 8:00 and 12:00 are allowed (row 1 to row 2)
  2. Trips from Zone2 to Zone1 are allowed between 14:00 and 19:30 (row 3 to row 4)
  3. Trips from Zone4 to Zone6 are allowed between 10:00 and 17:00 (row 5 to row 7)
  4. Trips within Zone6 (from Zone6 to Zone6) are allowed between 10:00 and 17:00* (row 6 to row 7)

Not allowed:

*We could alternatively express this scenario with its own trip_id but it should make no difference to the consumer.
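That reading can be reproduced mechanically. This sketch assumes only two rules from the thread: travel goes from a lower stop_sequence to a higher one within the same trip_id, and a pair needs pickup allowed at the origin row (pickup_type != 1) and drop-off allowed at the destination row (drop_off_type != 1). Intermediate rows may be skipped. Run on the example rows, it yields exactly the four allowed cases listed above.

```python
from collections import defaultdict

# The example rows: (trip_id, stop_id, stop_sequence, pickup_type,
# drop_off_type, start_pickup_drop_off_window, end_pickup_drop_off_window)
ROWS = [
    ("tripA", "Zone1", 1, 2, 1, "08:00:00", "12:00:00"),
    ("tripA", "Zone3", 2, 1, 2, "08:00:00", "12:00:00"),
    ("tripB", "Zone2", 1, 2, 1, "14:00:00", "19:30:00"),
    ("tripB", "Zone1", 2, 1, 2, "14:00:00", "19:30:00"),
    ("tripC", "Zone4", 1, 2, 1, "10:00:00", "17:00:00"),
    ("tripC", "Zone6", 2, 2, 1, "10:00:00", "17:00:00"),
    ("tripC", "Zone6", 3, 1, 2, "10:00:00", "17:00:00"),
]


def allowed_pairs(rows):
    """List (trip_id, origin, destination, pickup window, drop-off window) for each allowed O/D."""
    by_trip = defaultdict(list)
    for row in rows:
        by_trip[row[0]].append(row)
    result = []
    for trip_id, trip_rows in by_trip.items():
        trip_rows.sort(key=lambda r: r[2])  # order by stop_sequence
        for i, origin in enumerate(trip_rows):
            if origin[3] == 1:  # no pickup at this row
                continue
            for dest in trip_rows[i + 1:]:  # only later rows; in-between rows may be skipped
                if dest[4] != 1:  # drop-off allowed here
                    result.append((trip_id, origin[1], dest[1],
                                   (origin[5], origin[6]), (dest[5], dest[6])))
    return result


for pair in allowed_pairs(ROWS):
    print(pair)
```

Every pair not returned (for example Zone3 to Zone1, or travel within Zone4) is not allowed.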

uwtcat commented 1 year ago

@westontrillium Thank you for this explanation.

Here is the explicit concern: with the proposed extension to stop_times.txt, my understanding is that tripA is ambiguous but allowed (it is a riff on your example). There are three possible interpretations of tripA (tripB, tripC and tripD), but those three trips are not equivalent to each other.

Taking the following example:

trip_id stop_id stop_sequence pickup_type drop_off_type start_pickup_drop_off_window end_pickup_drop_off_window
tripA Zone1 1 2 1 08:00:00 9:00:00
tripA Zone2 2 2 2 09:00:00 11:00:00
tripA Zone3 3 1 2 09:00:00 12:00:00
tripB Zone1 1 2 1 08:00:00 9:00:00
tripB Zone2 2 1 2 09:00:00 11:00:00
tripB Zone3 3 1 2 09:00:00 12:00:00
tripC Zone1 1 2 1 08:00:00 9:00:00
tripC Zone2 2 2 1 09:00:00 11:00:00
tripC Zone3 3 1 2 09:00:00 12:00:00
tripD Zone1 1 2 1 08:00:00 12:00:00
tripD Zone2 2 2 1 09:00:00 11:00:00
tripD Zone2 3 1 2 09:00:00 11:00:00
tripD Zone3 4 1 2 09:00:00 12:00:00

The reason this is associated with this particular issue, opened by @leonardehrenfried, is the set of questions regarding tripB, tripC and tripD in my example:

1) For tripB and tripC: can stop 2 in stop_sequence be skipped by the router? (I believe the answer according to the spec is yes, but should be clarified.)
2) For tripD: can both stops 2 and 3 in stop_sequence be skipped by the router? (Same.)

I believe tripA is currently allowed, but should not be. The source of the problem is that the spec currently has no restriction requiring either pickup_type or drop_off_type to be 1 (no pickup/drop-off). Basically, by allowing both pickup and drop-off to happen in the same stop_times line, we are ambiguously implying that intrazone travel is allowed, even though it may not be. If each line in stop_times defines a node in an allowable transition in the directed graph, it should unambiguously be a start point (with drop_off_type=1) or an end point (with pickup_type=1); it cannot be both the end point of one transition AND the start point of another transition without making that explicit.
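A sketch of the validation the proposed restriction would imply. It encodes this proposal, not the current spec: any row that allows both pickup and drop-off is reported. On the example, only the Zone2 row of tripA (pickup_type=2, drop_off_type=2) is flagged; tripD, which splits Zone2 into a drop-off row and a pickup row, passes.

```python
# Row layout: (trip_id, stop_id, stop_sequence, pickup_type, drop_off_type,
#              start_pickup_drop_off_window, end_pickup_drop_off_window)
TRIP_A = [
    ("tripA", "Zone1", 1, 2, 1, "08:00:00", "09:00:00"),
    ("tripA", "Zone2", 2, 2, 2, "09:00:00", "11:00:00"),
    ("tripA", "Zone3", 3, 1, 2, "09:00:00", "12:00:00"),
]
TRIP_D = [
    ("tripD", "Zone1", 1, 2, 1, "08:00:00", "12:00:00"),
    ("tripD", "Zone2", 2, 2, 1, "09:00:00", "11:00:00"),
    ("tripD", "Zone2", 3, 1, 2, "09:00:00", "11:00:00"),
    ("tripD", "Zone3", 4, 1, 2, "09:00:00", "12:00:00"),
]


def dual_role_rows(rows):
    """Return rows that allow both pickup (pickup_type != 1) and drop-off (drop_off_type != 1).

    Under the proposed (not current) rule, each row must be either a start
    point (drop_off_type=1) or an end point (pickup_type=1), so every row
    returned here would be rejected.
    """
    return [row for row in rows if row[3] != 1 and row[4] != 1]


print(dual_role_rows(TRIP_A + TRIP_D))
```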

Bottom line: we should put appropriate restrictions in place in stop_times to ensure we are discretely describing zone transitions (edges in the graph) with no ambiguity.

westontrillium commented 1 year ago

can stop 2 in stop_sequence be skipped by the router

Yes, GTFS allows an origin/destination pair to skip stop_sequences. Your trip can start at stop_sequence=1 and end at stop_sequence=4, skipping 2 and 3. The only consideration for 2 and 3 is that a fixed route is technically able to describe the path of travel the vehicle takes between an o/d pair thanks to shapes.txt, but Flex is unable to do so (and should not, anyway). I've always understood that routing data for a Flexible trip comes from elsewhere, like the shortest path possible from the mapping data source.

I hadn't actually thought we needed to clarify skipping stop_sequences being allowable since it's a given in core GTFS, but perhaps not so with Flex?

Regarding your tripA, I would actually disagree that there are three other interpretations; in fact I think there can only be one (rules around stop_sequence indicating direction and the disallowance of intrazone travel helped make this clear):

Allowed

Not allowed

As far as I can tell, there is no other way to interpret the data if following the spec 100%.

The community has actually talked through this exact scenario before, and it turned out to be an important turning point on whether or not to allow intrazone travel in a single stop_time. Suffice it to say, this case actually exists in the real world, and the current rules surrounding sequencing and intra/interzone travel make it easier and more intuitive to model: you can have that Zone2 stop_time work both for a drop-off from Zone1 and a pickup going to Zone3 (pickup_type=2/drop_off_type=2) while still disallowing intrazone travel within Zone2.

tzujenchanmbd commented 1 year ago

for tripB and tripC- can stop 2 in stop_sequence be skipped by the router (I believe the answer according to the spec is yes, but should be clarified).

I hadn't actually thought we needed to clarify skipping stop_sequences being allowable since it's a given in core GTFS, but perhaps not so with Flex?

Trying to capture possible clarifications here.

trip_id stop_id stop_sequence pickup_type drop_off_type
1 zoneA 1 2 1
1 zoneB 5 1 2
1 zoneC 10 1 2

Based on previous discussions, it seems that when a trip planner provides an option from zoneA to zoneC, it would provide estimated travel times based on the direct way from zoneA to zoneC, i.e., "skipping" zoneB (stop_sequence = 5).

Would adding something to the spec like - "If a trip's stop_id consist only of area_id or id from locations.geojson for on-demand services, a data consumer should provide travel time based only on the origin and destination locations, without considering the locations in between" be helpful?

This previous comment also makes sense to me -

"If the bus did behave in such a way (going to Zone 2 just to fulfill that unknown requirement), the closest thing to accommodate such a behavior in the data is adding offsets for those trips so the estimated travel time is more reflective of reality."

Regarding the tripA in this comment, the current flex spec states "Travel within the same stop area or GeoJSON location requires two records in stop_times.txt with the same stop_id." I agree there doesn't seem to be ambiguity.

westontrillium commented 1 year ago

Would adding something to the spec like - "If a trip's stop_id consist only of area_id or id from locations.geojson for on-demand services, a data consumer should provide travel time based only on the origin and destination locations, without considering the locations in between" be helpful?

Adding more clarity is probably the right call. Perhaps something more like this that includes an example:

"When providing routing or travel time between the origin and destination, data consumers should ignore intermediate stop_times.txt records that have start_pickup_drop_off_window and end_pickup_drop_off_window defined. For example:

trip_id stop_id stop_sequence pickup_type drop_off_type start_pickup_drop_off_window end_pickup_drop_off_window
tripA Zone1 1 2 1 08:00:00 18:00:00
tripA Zone2 2 1 2 08:00:00 14:00:00
tripA Zone3 3 1 2 10:00:00 18:00:00

Consumers should not take Zone2 into consideration when providing routing or travel time for a trip from Zone1 to Zone3."

That way we're capturing cases with a stop_id referring to a regular stop serving in an on-demand capacity (i.e., you can be picked up/dropped off there on request within a given time window).
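Under that wording, a consumer decides per intermediate record whether it matters. A minimal sketch, assuming rows are dicts keyed by the stop_times.txt field names. Treating intermediate records without windows (here a hypothetical fixed stop StopX) as points the path must still pass through is an assumption, since the proposed text only says which records to ignore.

```python
def via_points(trip_rows, origin_seq, dest_seq):
    """Return intermediate stop_ids a router would still route through.

    Intermediate rows that define both start_pickup_drop_off_window and
    end_pickup_drop_off_window are ignored, per the proposed wording.
    Rows without windows are kept (an assumption, see the lead-in).
    """
    vias = []
    for row in sorted(trip_rows, key=lambda r: r["stop_sequence"]):
        if not (origin_seq < row["stop_sequence"] < dest_seq):
            continue  # not between origin and destination
        has_window = (row.get("start_pickup_drop_off_window")
                      and row.get("end_pickup_drop_off_window"))
        if not has_window:
            vias.append(row["stop_id"])
    return vias


TRIP_A = [
    {"stop_id": "Zone1", "stop_sequence": 1,
     "start_pickup_drop_off_window": "08:00:00", "end_pickup_drop_off_window": "18:00:00"},
    {"stop_id": "Zone2", "stop_sequence": 2,
     "start_pickup_drop_off_window": "08:00:00", "end_pickup_drop_off_window": "14:00:00"},
    {"stop_id": "Zone3", "stop_sequence": 3,
     "start_pickup_drop_off_window": "10:00:00", "end_pickup_drop_off_window": "18:00:00"},
]
# Hypothetical variant where the middle record is a fixed stop with arrival/departure times.
WITH_FIXED_STOP = [TRIP_A[0], {"stop_id": "StopX", "stop_sequence": 2,
                               "arrival_time": "11:00:00", "departure_time": "11:00:00"}, TRIP_A[2]]
print(via_points(TRIP_A, 1, 3))
print(via_points(WITH_FIXED_STOP, 1, 3))
```

For tripA the router plans Zone1 to Zone3 directly, which is the behavior the proposed wording asks for.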

leonardehrenfried commented 1 year ago

I would also welcome the clarification suggested by @westontrillium and @tzujenchanmbd.

uwtcat commented 1 year ago

I agree on that clarification. In addition, should the clarification state what happens if the requested trip from Zone 1 to Zone 3 falls between 8:00 and 10:00? I did not see an explicit mention of that.

More explicitly, this allowable "skip" is also bound to a service time. So the transition in the graph from Zone 1 to Zone 3 is actually bound by service times that are different from the original ones mentioned for Zone 1:

tripA Zone1 1 3 1 10:00:00 18:00:00
tripA Zone3 3 1 2 10:00:00 18:00:00

Thanks.

westontrillium commented 1 year ago

I don't think the "skip" is bound in that way. The windows should only apply to their own rows. You could be picked up at 09:45 in Zone1 and dropped off in Zone3 at 10:05 because each record's window corresponds to its own pickup/drop-off rules, and the travel time of this hypothetical trip is such that the origin and destination each fall within a valid stop_time record. Example:

trip_id stop_id stop_sequence pickup_type drop_off_type start_pickup_drop_off_window end_pickup_drop_off_window
tripA Zone1 1 2 1 08:00:00 18:00:00
...
tripA Zone3 3 1 2 10:00:00 18:00:00

stop_sequence=1 is pickup-only, so its window only refers to when the rider can be picked up, not the window within which the entire trip time must be contained. Here's what I would expect the trip planning flow is based on this behavior (using a user-defined pickup time):

  1. The rider submits a query with a desired pickup time of 09:45 somewhere within Zone1 and a destination somewhere within Zone3.
  2. stop_sequence=1 is flagged because it has the appropriate pickup_type (2), the origin location falls within Zone1, and 09:45 falls within its pickup/drop-off window.
  3. stop_sequence=3 is flagged because it shares its trip_id with stop_sequence=1 and comes after it, it has the appropriate drop_off_type (2), the destination location falls within Zone3, and the destination location's arrival time of 10:05 (determined by a calculation of the travel time from the origin at the desired departure time of 09:45) falls within its pickup/drop-off window.
  4. The rider is presented with a trip plan result with a pickup in Zone1 at 09:45 and a drop-off in Zone3 at 10:05.

Assuming the travel time is 20 minutes, if the rider were to submit the same query but with a desired departure time of 09:30, no result would return because 09:50 (20 minutes after the departure time) does not fall within Zone3's drop-off window.*

*Or, depending on the app, it may have the capability to show the next closest possible trip departing at 09:40 instead.
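The flow above, including the footnote, can be sketched with plain minute arithmetic. The 20-minute travel time is the assumed figure from the example; where a real consumer gets it (base map, traffic, routing engine) is outside the sketch, and the function names are hypothetical.

```python
def to_minutes(hms):
    """Parse HH:MM or HH:MM:SS into minutes after midnight (seconds ignored)."""
    h, m = (int(x) for x in hms.split(":")[:2])
    return h * 60 + m


def fmt(minutes):
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


def plan(pickup, travel_minutes, pickup_window, drop_off_window):
    """Return (pickup, drop_off) as HH:MM if each falls in its own record's window, else None."""
    p = to_minutes(pickup)
    d = p + travel_minutes
    p_ok = to_minutes(pickup_window[0]) <= p <= to_minutes(pickup_window[1])
    d_ok = to_minutes(drop_off_window[0]) <= d <= to_minutes(drop_off_window[1])
    return (fmt(p), fmt(d)) if p_ok and d_ok else None


def earliest_feasible_pickup(pickup, travel_minutes, pickup_window, drop_off_window):
    """Footnote behavior: move the pickup later until the drop-off lands in its window."""
    p = max(to_minutes(pickup), to_minutes(drop_off_window[0]) - travel_minutes)
    return plan(fmt(p), travel_minutes, pickup_window, drop_off_window)


ZONE1 = ("08:00:00", "18:00:00")  # stop_sequence=1, pickup-only
ZONE3 = ("10:00:00", "18:00:00")  # stop_sequence=3, drop-off-only
print(plan("09:45", 20, ZONE1, ZONE3))
print(plan("09:30", 20, ZONE1, ZONE3))
print(earliest_feasible_pickup("09:30", 20, ZONE1, ZONE3))
```

Each window is only ever checked against its own event (pickup or drop-off), which is the point being made: the windows do not bound the whole trip.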

uwtcat commented 1 year ago

I find it unsettling that having a consistent "read" on the flex schema is dependent on the (possibly wildly) different time estimates downstream consuming routers would make regarding traversals from Zone 1 to Zone 3 at different times of day.

Completely by coincidence, one of my teams was recently using the Google and HERE APIs to get (and compare) car traversal time estimates from different start points to healthcare facilities in WA, and got very different results. It seems to me that either we require the service providers to clarify what their buffer is (meaning that they are then really on the hook for provisioning that pickup), or we interpret the times indicated in the straightforward manner with no estimates. What if the OTP router estimates 20 minutes, but Google, with better transit data in hand, estimates it's 40 minutes at that time of day? Should the response I get back to a request for trip Zone 1-> Zone 3 at 9:30 depend on whether I'm using Google Directions or some OTP variant?

In the absence of consensus on this, at the least there ought to be a clarification that "a reasonable attempt at estimates should be made to buffer the time traversals between pickups and drop-offs in order to fall within the indicated stop_times."

leonardehrenfried commented 1 year ago

I understand the discomfort but isn't it the nature of this spec, which is called "flex" after all, that you cannot exactly predict how the trip is going to happen? In fact we can't even say if it's going to happen at all, just that, according to the data provided to us, it could happen.

If we knew the exact route and times we wouldn't need the spec and could use static GTFS, couldn't we?

leonardehrenfried commented 1 year ago

In the absence of consensus on this, at the least there ought to be a clarification that "a reasonable attempt at estimates should be made to buffer the time traversals between pickups and drop-offs in order to fall within the indicated stop_times."

If a service has these rules (buffer time between trips), shouldn't they put them in the data? There is nothing stopping a producer from using only short windows, several small areas or several trips to get the routing software to return the result they want. Or am I overlooking something?

uwtcat commented 1 year ago

There's a difference between uncertainty and ambiguity. Flex services have a natural uncertainty. However, a schema should, to the best of our ability as its designers, remove ambiguity. There are plenty of standards that embed uncertainty but do so unambiguously. I realize we don't want to go there, but some OGC standards even embed the code snippet you should use to calculate that time buffer, so that everyone does the same thing. As for @leonardehrenfried's comment, "If a service has these rules (buffer time between trips), shouldn't they put them in the data?": ABSOLUTELY, but nothing about this is explained or made transparent in the current proposal.

leonardehrenfried commented 1 year ago

If I had to model such a service I would create several trips with the appropriately short windows at each zone rather than a single trip with very long windows.
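A minimal sketch of that modeling approach, assuming windows are handled as minutes after midnight and that each zone gets an equal slice of a repeating cycle. The helper names `split_into_trips` and `hhmmss`, the `flex_trip_N` trip IDs and the `zone_*` location IDs are hypothetical; only `trip_id`, `stop_sequence`, `location_id` and `start/end_pickup_drop_off_window` are stop_times field names. Because the windows within each trip only touch at their boundaries, the zone order is encoded in the data instead of being left to the router.

```python
def hhmmss(minutes: int) -> str:
    """Format minutes after midnight as a GTFS-style HH:MM:SS time."""
    return f"{minutes // 60:02d}:{minutes % 60:02d}:00"


def split_into_trips(zones: list[str], service_start: int, service_end: int,
                     leg: int) -> list[dict[str, object]]:
    """Hypothetical helper: emit stop_times-like rows for several short-window trips.

    Each trip visits `zones` in order and each zone gets its own `leg`-minute
    window, so windows within one trip only touch at their boundaries.
    """
    cycle = len(zones) * leg  # minutes one pass through all zones takes
    rows: list[dict[str, object]] = []
    trip_no = 0
    t = service_start
    while t + cycle <= service_end:
        for seq, zone in enumerate(zones):
            start = t + seq * leg
            rows.append({
                "trip_id": f"flex_trip_{trip_no}",  # hypothetical ID scheme
                "stop_sequence": seq,
                "location_id": zone,
                "start_pickup_drop_off_window": hhmmss(start),
                "end_pickup_drop_off_window": hhmmss(start + leg),
            })
        trip_no += 1
        t += cycle
    return rows


rows = split_into_trips(["zone_1", "zone_2", "zone_3"], 8 * 60, 12 * 60, leg=40)
for row in rows:
    print(row["trip_id"], row["location_id"],
          row["start_pickup_drop_off_window"], row["end_pickup_drop_off_window"])
```

With a 40-minute leg this yields two trips (starting at 08:00 and 10:00), so a rider going from zone_1 to zone_3 on the first trip is picked up between 08:00 and 08:40 and dropped off between 09:20 and 10:00. The trade-off is that the producer has to commit to a visiting order and timing that many services don't actually plan.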

But most services that I've worked with are not planned to this level of detail. Most just work it out as they go along.

tzujenchanmbd commented 1 year ago

@uwtcat Do stop_times.mean_duration_factor, mean_duration_offset, safe_duration_factor, safe_duration_offset in MobilityData/gtfs-flex PR#74 help with "buffer estimate" you mentioned?
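As I read PR#74, these fields describe travel time as a linear function of a driving time that the consumer computes itself. A minimal sketch, assuming the form `factor × driving_duration + offset` with the offset in minutes (check the PR for the exact definition and units); `DurationParams` and `flex_travel_minutes` are hypothetical names and the numbers are illustrative only. It also shows why the fields narrow, but don't remove, the Google vs. OTP discrepancy: the same producer values applied to two different driving estimates still give different answers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DurationParams:
    """Hypothetical holder for one factor/offset pair from stop_times."""
    factor: float  # e.g. mean_duration_factor or safe_duration_factor
    offset: float  # e.g. mean_duration_offset or safe_duration_offset (minutes, assumed)


def flex_travel_minutes(driving_minutes: float, params: DurationParams) -> float:
    """Scale the consumer's own driving-time estimate, then add the offset."""
    if driving_minutes < 0:
        raise ValueError("driving time must be non-negative")
    return params.factor * driving_minutes + params.offset


mean = DurationParams(factor=1.5, offset=5.0)   # illustrative values only
safe = DurationParams(factor=2.0, offset=10.0)  # illustrative values only

# Same producer-supplied parameters, two consumers with different driving estimates.
for router, driving in [("router A", 20.0), ("router B", 40.0)]:
    print(router, flex_travel_minutes(driving, mean), flex_travel_minutes(driving, safe))
# router A 35.0 50.0
# router B 65.0 90.0
```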

There is also consensus on the clarification of default values in issue#73.

We didn't include these fields in google/transit PR#388 simply because there are currently no consumers implementing these fields (as per the adoption tracker). We may think about including these fields if needed.

westontrillium commented 1 year ago

What if the OTP router estimates 20 minutes, but Google, with better transit data in hand, estimates 40 minutes at that time of day? Should the response I get back to a request for a trip from Zone 1 to Zone 3 at 9:30 depend on whether I'm using Google Directions or some OTP variant?

I don't see a way to avoid this without specifying requirements for every factor in how consumers estimate travel time, which seems a bit of an overreach to me. Can we really enforce which base map an app uses? Which routing algorithm? Whether it should factor in traffic? Realtime traffic? Closed roads? Weather? In any case, I already get different results for fixed-route transit between the various trip planning apps I use.

In the absence of consensus on this, at the very least there ought to be a clarification that "a reasonable attempt should be made to estimate and buffer the travel times between pickups and drop-offs so that they fall within the indicated stop_times."

start/end_pickup_drop_off_window should account for the earliest time a rider can be picked up or dropped off and the exact cutoff time as publicized (although we've encountered services that don't even have exact timing; in those cases, agencies just give us something roughly representative of when they will provide transportation). I'll second what @tzujenchanmbd said regarding the offset/factor fields in the Flex repo, but even these won't guarantee against ambiguity: most agencies I have talked to are either reluctant to provide such estimates or don't have a clear idea of what numbers to give, since one trip can differ greatly from the next, especially if someone is picked up in between. Additionally, most agencies running demand-responsive services likely lack the resources or time to derive estimates from actual historical data.
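To make the windows concrete, here is a minimal sketch of how a consumer might test whether a requested trip fits. It assumes windows are minutes after midnight, the travel estimate comes from the consumer's own router (or from factor/offset fields, if present), the rider boards as early as the pickup window allows, and nobody can be dropped off before the destination window opens. `Window` and `earliest_drop_off` are hypothetical names, not part of any spec or OTP API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Window:
    """A start/end_pickup_drop_off_window pair, in minutes after midnight (assumed)."""
    start: int
    end: int


def earliest_drop_off(pickup: Window, drop_off: Window,
                      travel_minutes: float, ready_at: int) -> Optional[float]:
    """Earliest drop-off for a rider ready at `ready_at`, or None if infeasible.

    Assumes the rider boards as early as the pickup window allows and that a
    drop-off cannot happen before the drop-off window opens.
    """
    board = max(ready_at, pickup.start)
    if board > pickup.end:
        return None  # rider is ready only after the pickup window has closed
    arrive = max(board + travel_minutes, drop_off.start)
    return arrive if arrive <= drop_off.end else None


zone_1 = Window(start=8 * 60, end=12 * 60)  # 08:00 to 12:00
zone_3 = Window(start=9 * 60, end=11 * 60)  # 09:00 to 11:00
request = 9 * 60 + 30                       # 09:30

print(earliest_drop_off(zone_1, zone_3, 20, request))       # 590 (09:50)
print(earliest_drop_off(zone_1, zone_3, 40, request))       # 610 (10:10)
print(earliest_drop_off(zone_1, zone_3, 20, 10 * 60 + 45))  # None
```

A 20-minute and a 40-minute estimate give different drop-off times for the same 9:30 request, but both stay inside the published window; a request at 10:45 is rejected because no estimate can meet the 11:00 cutoff.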

tsherlockcraig commented 1 year ago

Sorry @tzujenchanmbd, I skipped ahead to @westontrillium's comments and missed the last sentence of yours! You can ignore the comment below.

I am working on an OTP project that will seek to implement these soon.


Do stop_times.mean_duration_factor, mean_duration_offset, safe_duration_factor, safe_duration_offset in MobilityData/gtfs-flex PR#74 help with "buffer estimate" you mentioned?

I agree these are the values that clarify the boundary between what we expect the consumer to know and what we expect the producer to know, and that we're drawing the line in the right place.

However @tzujenchanmbd I don't actually see those values as part of the PR on the GTFS repo--do we need to add those there?

leonardehrenfried commented 1 year ago

Tzu-Chen mentioned in his comment that the reason it's not in the PR is that there are no consumers for it.

eliasmbd commented 7 months ago

@leonardehrenfried Considering the adoption of GTFS-Flex, this repo is now out of date. Would you like to move this issue to google/transit before we close the repo?

leonardehrenfried commented 7 months ago

The issue itself is resolved but I would love it if there was an archive of the discussion somewhere that I can refer to when questions come up.

eliasmbd commented 7 months ago

We won't be deleting the repo per se; it will be in a read-only state.

leonardehrenfried commented 7 months ago

Then I would say that it can stay exactly where it is.