Closed gabriel-korbato closed 1 year ago
My thoughts (conveyed during today's discussion):
First, it looks like you're proposing two new tables of schedule data. The question I don't see answered is: what do you need to be able to do with TIDES that you can't do now? For example, at Metro Transit we have a one-to-one mapping of patterns to shape_id, so I can use shape_id to analyze patterns from TIDES data. As you noted, others may have a many-to-one mapping and would require a pattern-specific identifier. OK, great, so we need an ID for patterns. Is that sufficient? Are these proposed tables required to meet the analysis needs? Can they be provided as GTFS extensions to those who want the additional detail?
Current work-around
The current way to work around this is to join trips performed to the schedule and to look up the pattern from the scheduled trip. But there are issues with this:
- Not all operated trips are scheduled. How do we obtain the pattern of an unscheduled trip?
Under GTFS-ServiceChanges, there are options to either use a scheduled trip as a template or define something entirely new. The first option is part of why I want a trip_id_scheduled field anywhere there's a trip_id_performed field. For entirely new trips, the pattern_id could be provided along with the trip_id by the CAD/AVL system.
- GTFS is being promoted as the preferred schedule representation, but it doesn't model patterns. (It was designed to display network and schedule information to passengers on a map, not to analyze and manage operations, so patterns are not required.) GTFS has the shape_id field in the trips table, but that represents a path on a map rather than a sequence of stops. Some agencies generate shape_id from their scheduling system's pattern, but this is not a requirement. The same shape can be used for different patterns (for example, local and limited-stop patterns along the same path). In this case patterns could be distilled by grouping trips serving the same stops in the same order.
- Shapes in GTFS don't have display names that would be useful for generating aggregated reports.
Though they could, there's nothing preventing shape_id from being a meaningful identifier like "Route:6;Pat:FRMN6UNV00".
- The current approach makes schedules a hard requirement of TIDES, even for analysis that doesn't involve schedules. For example, the schedule will always be required to analyze on-time performance, but it shouldn't be required to analyze running times. Agencies that don't have their schedule in GTFS should still be able to use TIDES.
I suppose that if you're not using GTFS you can use trips_performed.shape_id to mean whatever you want, including pattern_id.
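The idea of distilling patterns by grouping trips that serve the same stops in the same order can be sketched roughly as follows. This is only an illustration, assuming GTFS stop_times rows have been loaded as dicts with the standard trip_id, stop_sequence, and stop_id fields; the function name derive_patterns, the generated IDs ("P1", "P2"), and the sample rows are hypothetical and not part of GTFS or TIDES.

```python
from collections import defaultdict


def derive_patterns(stop_times):
    """Group trips by their ordered list of stops.

    Returns (patterns, trip_to_pattern): patterns maps a generated
    pattern id to a tuple of stop_ids, and trip_to_pattern maps each
    trip_id to the pattern it serves.
    """
    stops_by_trip = defaultdict(list)
    for row in stop_times:
        stops_by_trip[row["trip_id"]].append(
            (int(row["stop_sequence"]), row["stop_id"])
        )

    patterns = {}          # generated pattern id -> tuple of stop_ids
    pattern_of_stops = {}  # tuple of stop_ids -> generated pattern id
    trip_to_pattern = {}
    for trip_id in sorted(stops_by_trip):
        # Order each trip's stops by stop_sequence, then keep only the ids.
        stops = tuple(stop_id for _, stop_id in sorted(stops_by_trip[trip_id]))
        if stops not in pattern_of_stops:
            pattern_id = f"P{len(pattern_of_stops) + 1}"  # hypothetical id scheme
            pattern_of_stops[stops] = pattern_id
            patterns[pattern_id] = stops
        trip_to_pattern[trip_id] = pattern_of_stops[stops]
    return patterns, trip_to_pattern


# Made-up sample: t1 and t3 are local trips, t2 is a limited-stop trip
# that could share the same shape but skips stop B.
stop_times = [
    {"trip_id": "t1", "stop_sequence": "1", "stop_id": "A"},
    {"trip_id": "t1", "stop_sequence": "2", "stop_id": "B"},
    {"trip_id": "t1", "stop_sequence": "3", "stop_id": "C"},
    {"trip_id": "t2", "stop_sequence": "1", "stop_id": "A"},
    {"trip_id": "t2", "stop_sequence": "2", "stop_id": "C"},
    {"trip_id": "t3", "stop_sequence": "1", "stop_id": "A"},
    {"trip_id": "t3", "stop_sequence": "2", "stop_id": "B"},
    {"trip_id": "t3", "stop_sequence": "3", "stop_id": "C"},
]
patterns, trip_to_pattern = derive_patterns(stop_times)
```

Note that IDs generated this way are not stable across feeds, which is part of why an agency-assigned pattern identifier is attractive.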
Possible Solutions
- Add pattern and pattern_stop tables to the TIDES specification. The tables would take the following structure:
pattern (
    pattern_id TEXT NOT NULL,
    pattern_name TEXT,
    route_id TEXT, -- foreign key
    direction_id TEXT, -- foreign key
    shape_id TEXT, -- foreign key
    PRIMARY KEY (pattern_id)
)

pattern_stop (
    pattern_id TEXT NOT NULL, -- foreign key
    stop_sequence INTEGER NOT NULL,
    stop_id TEXT NOT NULL, -- foreign key
    cumul_meters NUMERIC(10,2), -- cumulative scheduled distance from first stop
    PRIMARY KEY (pattern_id, stop_sequence)
)
These tables appear to be schedule data, and they don't address the need to be able to aggregate on pattern_id, which could be met by simply adding pattern_id to any table with a trip_id, route_id, or shape_id.
The fare_transactions, passenger_events, vehicle_locations, and stop_visits tables would be updated by removing stop_sequence and adding:
Presumably you'd also want to add pattern_id to these tables?
- seq_in_pattern, referencing the pattern_stop table. If GTFS is being used, this field would be a foreign key to GTFS stop_times.stop_sequence.
- seq_in_trip, for the sequence of a visit within its trip. This may differ from seq_in_pattern if scheduled stops are skipped or if unscheduled visits are made
I'm not sure I understand the motivation for replacing stop_sequence with seq_in_pattern and seq_in_trip. This is confusing because seq_in_pattern would mean the same thing as stop_sequence, with or without patterns (a single trip can serve one and only one pattern). Additionally, seq_in_trip is easily inferred from the order of observations for a trip! For unscheduled stops, the stop_sequence would be null, and the sequence within the trip would be just as easily inferred from the order of observations.
- Suggest adding pattern and pattern_stop tables to the GTFS specification, and make a GTFS feed a hard requirement of TIDES. In a sense this would be better because patterns pertain to the schedule, but it may be difficult to modify GTFS, and until they are added to GTFS, TIDES data would be difficult to work with.
Since you're asking to model schedule data, I think this is the best option. I disagree that it makes GTFS a hard requirement. Simply adding an optional pattern_id field would meet the vast majority of the needs for pattern-based analysis. Additional needs could be met by joining to the GTFS feed, the same way we can add "trip_headsign" by joining a scheduled trip_id to the GTFS feed, but it's not required to work with TIDES data.
Furthermore, there's nothing preventing you from extending GTFS trips.txt with pattern_id and pattern_name, or for that matter, from adding them as extensions to TIDES. Both GTFS and TIDES allow extensions to the spec, and in the case of GTFS, it's basically a requirement to implement a change as an extension before it can be incorporated into the spec (changes require at least one producer and one consumer).
Also, since these tables are schedule data, and would fit most naturally in GTFS, I'd really prefer that the GTFS community reviews them as a proposal. TIDES may not be the appropriate audience.
- Combine both options by suggesting adding the tables to GTFS, but also adding them to TIDES until they are accepted into GTFS. If the new tables are never accepted into GTFS, this option is equivalent to the first, with the difference being that we at least tried to add it to GTFS. In my opinion this is the best option.
I don't think modeling schedule data in TIDES is appropriate.
- Do nothing. This is an issue because it makes it difficult to produce aggregate reports by pattern. See the work-around above and accept its limitations. In my opinion this is the worst option.
pattern_id. This would allow aggregation by patterns without modeling the schedule in TIDES. Additional pattern-related information would be available through extensions to GTFS.
Can you clarify your motivation for modeling patterns as it relates to processing TIDES data? A lot of this proposal seems to address deficiencies in GTFS, but wouldn't really be required for reporting operations data. For example, if you want to aggregate by pattern, you need only pattern_id. Why wouldn't the needs of aggregation and reporting be met by adding an optional pattern_id field to any table that currently includes trip_id_performed?
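As a rough illustration of the point that aggregation needs only an identifier, the sketch below averages running times per pattern using nothing but an optional pattern_id on each performed trip. The field running_time_s and the sample values are hypothetical and used only for this example; the pattern label reuses the "FRMN6UNV00" example from earlier in the thread, and "OTHER" is made up.

```python
from collections import defaultdict
from statistics import mean


def mean_running_time_by_pattern(trips_performed):
    """Average a per-trip running time, grouped by optional pattern_id."""
    groups = defaultdict(list)
    for trip in trips_performed:
        # pattern_id is optional, so trips without one fall into their own bucket.
        key = trip.get("pattern_id") or "(no pattern)"
        groups[key].append(trip["running_time_s"])
    return {key: mean(values) for key, values in groups.items()}


# Made-up rows; running_time_s is a hypothetical precomputed field.
trips_performed = [
    {"trip_id_performed": "p1", "pattern_id": "FRMN6UNV00", "running_time_s": 1800},
    {"trip_id_performed": "p2", "pattern_id": "FRMN6UNV00", "running_time_s": 2000},
    {"trip_id_performed": "p3", "pattern_id": "OTHER", "running_time_s": 1500},
    {"trip_id_performed": "p4", "pattern_id": None, "running_time_s": 1700},
]
summary = mean_running_time_by_pattern(trips_performed)
```

No schedule join is involved here; a report-friendly label or the ordered stop list would still have to come from wherever the pattern is defined.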
@botanize thanks for your thoughtful input. You are right that to aggregate you only need to add pattern_id to tables that have trips. I also agree that conceptually patterns should be defined with schedules. Without a pattern definition, however, we miss out on having a report-friendly label, or a clear and authoritative definition of the pattern's stops in order, unless you link to the schedule, which could be in GTFS or in some other format. That may be OK for many applications, but I still think having a standard pattern definition as part of TIDES would make it easier for TIDES consumers who develop tools to build ones that work across agencies.
Per @e-lo's suggestion, I'd be OK with having patterns defined in the Transit Operational Data Standard (TODS), and having tools with these requirements require TIDES and at least the pattern tables from TODS.
Looks like we can meet your needs in TIDES by adding pattern_id to some tables. You mentioned these tables for changes; are there others that should have an optional pattern_id field, maybe trips_performed?
I agree with the solution to add pattern_id to the tables as appropriate.
In the near term, in the absence of having pattern information in GTFS or TODS, it would be a reasonable extension of TIDES to define the patterns and pattern stops as originally proposed. But in the long run, it would be better to have this information come from the linked schedule information (whatever source that ends up being).
My agency's CAD/AVL and scheduling system works mostly with patterns, with trips being a byproduct, so this topic is important to get right imho.
Additionally, seq_in_trip is easily inferred from the order of observations for a trip! For unscheduled stops, the stop_sequence would be null, and the sequence within the trip would be just as easily inferred from the order of observations.
Inference shouldn't be a requirement when an explicit data format is being defined. When skipped or unplanned stops occur on a trip, the pattern is still valid, but the trip observations differ, so the scheduled and observed sequence ordinals can diverge and re-converge later. Having both seq_in_trip and seq_in_pattern is useful and explicit.
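To make the two sequence numbers concrete, here is a small sketch of how they could diverge and re-converge on a trip with an unscheduled stop and a skipped stop. It assumes each stop appears at most once in the pattern (loop patterns would need more care) and that visits carry a sortable time; the function name sequence_visits, the time field, and the sample data are hypothetical.

```python
def sequence_visits(pattern_stops, visits):
    """Attach seq_in_trip and seq_in_pattern to observed stop visits.

    pattern_stops: ordered list of stop_ids defining the pattern.
    visits: list of dicts with "stop_id" and a sortable "time".
    """
    # Position of each stop in the pattern, starting at 1.
    seq_in_pattern_of = {
        stop_id: seq for seq, stop_id in enumerate(pattern_stops, start=1)
    }
    ordered = sorted(visits, key=lambda visit: visit["time"])
    return [
        {
            "stop_id": visit["stop_id"],
            "seq_in_trip": seq,  # order in which the visit was observed
            # None for stops that are not part of the pattern
            "seq_in_pattern": seq_in_pattern_of.get(visit["stop_id"]),
        }
        for seq, visit in enumerate(ordered, start=1)
    ]


pattern_stops = ["A", "B", "C", "D"]
# Stop B is skipped and an unscheduled stop X is visited instead.
visits = [
    {"stop_id": "A", "time": "08:00:00"},
    {"stop_id": "X", "time": "08:05:00"},
    {"stop_id": "C", "time": "08:10:00"},
    {"stop_id": "D", "time": "08:15:00"},
]
rows = sequence_visits(pattern_stops, visits)
```

In this made-up trip the unscheduled stop X has a seq_in_trip but no seq_in_pattern, and the skipped stop B leaves no row at all. The sketch also shows that seq_in_trip can be derived from observation order, which is the counterpoint raised earlier in the thread; whether to store it explicitly is the design question being debated.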
Having pattern.stop_sequence is important, although the suggestion to use a foreign key to GTFS stop_times.stop_sequence, which is trip_id-based, is problematic if trips_performed doesn't include pattern_id.
I would love to see GTFS trips.txt include pattern_id, but they haven't added it, for their own reasons, although it has been suggested many times.
I suggest not mixing shapes with patterns, as they are logically different things. The waypoints in a shape file do not need to correspond to any stops; they can be arbitrary points on a map, so parallel sequencing is not always an option.
trip_headsign is not a good replacement for pattern_id, as using trip_headsign as a natural key creates issues that a strict ID avoids. Do trips of the same pattern have changing headsigns? For my agency, we include specialized pattern destination signs per pattern (or group of patterns). Pattern destinations are more specific than direction_id but more general than GTFS trip headsigns, such as zone, area or station. Could a destination_id be added as optional to the pattern structure?
Does cumul_meters include height or is it a 2D measurement? It is a little ambiguous.
A lot of this proposal seems to address deficiencies in GTFS, but wouldn't really be required for reporting operations data.
The service development here would take our operations data and use it to tweak the patterns for the upcoming schedule. When I say tweak, it is really a 9-month process, but patterns and statistics are much more important during that process.
We're going to close this issue by adding pattern_id as an optional field to the appropriate tables.
I see two options: add pattern_id to any table containing trip_id_performed, or to any table containing route_id, currently just trips_performed. Whereas route_id can be found with trip_id in GTFS, there is no pattern_id in GTFS, so we probably want to add pattern_id to each table containing trip_id_performed.
Describe the problem
The specification does not model patterns (defined below), but patterns are a key component of transit operations: they are necessary to refer to a stop by sequence within a trip, and they are very useful for aggregation and filtering when generating reports or performing analyses. If a user were to use TIDES with GTFS but without a pattern table, they would need to derive patterns on the fly by distilling them from gtfs.trips.
What is a pattern?
A pattern, also known as a variant or variation, is a sequence of stops, typically scheduled to be served by one or more vehicle trips. A route is simply a grouping of one or more patterns defined by the transit agency.
Many routes simply have two patterns, one in each direction, while other routes may have additional patterns such as: