TIDES-transit / TIDES

Transit ITS Data Exchange Specification for historical transit operations data
https://tides-transit.github.io/TIDES
Apache License 2.0

🐛📄 – Add pattern and pattern_stop tables #90

Closed gabriel-korbato closed 1 year ago

gabriel-korbato commented 2 years ago

Describe the problem

The specification does not model patterns (defined below), but patterns are a key component of transit operations: they are necessary to refer to a stop by sequence within a trip, and they are very useful for aggregation and filtering when generating reports or performing analyses. If a user were to use TIDES with GTFS but without a pattern table, they would need to derive patterns on the fly by distilling them from gtfs.trips.

What is a pattern?

A pattern, also known as a variant or variation, is a sequence of stops, typically scheduled to be served by one or more vehicle trips. A route is simply a grouping of one or more patterns defined by the transit agency.

Many routes simply have two patterns, one in each direction, while other routes may have additional patterns such as:

Current work-around

The current way to work around this is to join trips performed to the schedule and to look up the pattern from the scheduled trip. But there are issues with this:

  1. Not all operated trips are scheduled. How do we obtain the pattern of an unscheduled trip?
  2. GTFS is being promoted as the preferred schedule representation, but it doesn’t model patterns. (It was designed to display network and schedule information to passengers on a map, not to analyze and manage operations, so patterns are not required.) GTFS has the shape_id field in the trips table, but that represents a path on a map rather than a sequence of stops. Some agencies generate shape_id from their scheduling system’s pattern, but this is not a requirement. The same shape can be used for different patterns (for example, local and limited-stop patterns along the same path). In this case patterns could be distilled by grouping trips serving the same stops in the same order.
  3. Shapes in GTFS don’t have display names that would be useful for generating aggregated reports.
  4. The current approach makes schedules a hard requirement of TIDES, even for analysis that doesn’t involve schedules. For example, the schedule will always be required to analyze on-time performance, but it shouldn’t be required to analyze running times. Agencies that don’t have their schedule in GTFS should still be able to use TIDES.
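As an illustration of the "distilling" mentioned in item 2, here is a minimal Python sketch that groups trips serving the same stops in the same order. It assumes rows shaped like GTFS stop_times.txt (trip_id, stop_sequence, stop_id); the function name derive_patterns and the sample rows are hypothetical and not part of TIDES or GTFS.

```python
from collections import defaultdict


def derive_patterns(stop_times):
    """Group trips by their ordered list of stops.

    stop_times: iterable of (trip_id, stop_sequence, stop_id) tuples,
    mirroring three columns of GTFS stop_times.txt.
    Returns a dict mapping each distinct stop tuple to the sorted
    trip_ids that serve exactly those stops in that order.
    """
    stops_by_trip = defaultdict(list)
    for trip_id, stop_sequence, stop_id in stop_times:
        stops_by_trip[trip_id].append((stop_sequence, stop_id))

    trips_by_pattern = defaultdict(list)
    for trip_id, visits in stops_by_trip.items():
        # Sort by stop_sequence so input row order does not matter.
        ordered_stops = tuple(stop_id for _, stop_id in sorted(visits))
        trips_by_pattern[ordered_stops].append(trip_id)

    return {stops: sorted(trips) for stops, trips in trips_by_pattern.items()}


# Hypothetical sample: t3 is a limited-stop trip that skips stop B.
rows = [
    ("t1", 1, "A"), ("t1", 2, "B"), ("t1", 3, "C"),
    ("t2", 1, "A"), ("t2", 2, "B"), ("t2", 3, "C"),
    ("t3", 1, "A"), ("t3", 2, "C"),
]
patterns = derive_patterns(rows)
```

With this sample, t1 and t2 fall into one pattern and t3 into another. The derived patterns have no stable identifier or display name, which is the limitation raised in items 2 and 3.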

Possible Solutions

  1. Add pattern and pattern_stop tables to the TIDES specification. The tables would take the following structure:
pattern (
    pattern_id      TEXT     NOT NULL
    pattern_name    TEXT
    route_id        TEXT     --foreign key
    direction_id    TEXT     --foreign key
    shape_id        TEXT     --foreign key
    PRIMARY KEY (pattern_id)
)

pattern_stop (
    pattern_id       TEXT        NOT NULL    --foreign key
    stop_sequence    INTEGER     NOT NULL
    stop_id          TEXT        NOT NULL    --foreign key
    cumul_meters     NUMERIC(10,2) --cumulative scheduled distance from first stop
    PRIMARY KEY (pattern_id, stop_sequence)
)

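The fare_transactions, passenger_events, vehicle_locations, and stop_visits tables would be updated by removing stop_sequence and adding:

  • seq_in_pattern, referencing the pattern_stop table. If GTFS is being used, this field would be a foreign key to GTFS stop_times.stop_sequence.
  • seq_in_trip, for the sequence of a visit within its trip. This may differ from seq_in_pattern if scheduled stops are skipped or if unscheduled visits are made.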

  2. Suggest adding pattern and pattern_stop tables to the GTFS specification, and make a GTFS feed a hard requirement of TIDES. In a sense this would be better because patterns pertain to the schedule, but it may be difficult to modify GTFS, and until they are added to GTFS, TIDES data would be difficult to work with.

  3. Combine both options by suggesting adding the tables to GTFS, but also adding them to TIDES until they are accepted into GTFS. If the new tables are never accepted into GTFS, this option is equivalent to the first, with the difference being that we at least tried to add it to GTFS. In my opinion this is the best option.

  4. Do nothing. This is an issue because it makes it difficult to produce aggregate reports by pattern. See the work-around above and accept its limitations. In my opinion this is the worst option.
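To make the proposed structure concrete, the following is a rough sketch that loads the two tables into an in-memory SQLite database from Python and reads back a pattern's stops in order. The foreign keys to route, direction, shape, and stop tables are omitted because those tables live outside this sketch, and the pattern ID, name, stops, and distances are made-up sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Table layout follows the proposal above; external foreign keys are left out.
conn.executescript(
    """
    CREATE TABLE pattern (
        pattern_id   TEXT NOT NULL PRIMARY KEY,
        pattern_name TEXT,
        route_id     TEXT,
        direction_id TEXT,
        shape_id     TEXT
    );
    CREATE TABLE pattern_stop (
        pattern_id    TEXT    NOT NULL REFERENCES pattern (pattern_id),
        stop_sequence INTEGER NOT NULL,
        stop_id       TEXT    NOT NULL,
        cumul_meters  NUMERIC(10,2),  -- cumulative scheduled distance from first stop
        PRIMARY KEY (pattern_id, stop_sequence)
    );
    """
)

# Hypothetical sample data.
conn.execute(
    "INSERT INTO pattern VALUES (?, ?, ?, ?, ?)",
    ("6-local-N", "Route 6 Local Northbound", "6", "0", None),
)
conn.executemany(
    "INSERT INTO pattern_stop VALUES (?, ?, ?, ?)",
    [
        ("6-local-N", 1, "A", 0.0),
        ("6-local-N", 2, "B", 412.5),
        ("6-local-N", 3, "C", 980.0),
    ],
)

# The authoritative, ordered stop list for one pattern.
stops = conn.execute(
    "SELECT stop_id, cumul_meters FROM pattern_stop "
    "WHERE pattern_id = ? ORDER BY stop_sequence",
    ("6-local-N",),
).fetchall()
```

The query at the end is the kind of lookup a TIDES consumer would need in order to label or order stops without access to the schedule.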

e-lo commented 2 years ago

My thoughts (conveyed during today's discussion):

botanize commented 2 years ago

First, it looks like you're proposing two new tables of schedule data. The question I don't see answered is, what do you need to be able to do with TIDES that you can't do now? For example, at Metro Transit we have a one-to-one mapping of patterns to shape_id, so I can use shape_id to analyze patterns from TIDES data. As you noted, others may have a many-to-one mapping, and would require a pattern-specific identifier. Ok, great, so we need an ID for patterns. Is that sufficient? Are these proposed tables required to meet the analysis needs? Can they be provided as GTFS extensions to those who want the additional detail?

Current work-around

The current way to work around this is to join trips performed to the schedule and to look up the pattern from the scheduled trip. But there are issues with this:

  1. Not all operated trips are scheduled. How do we obtain the pattern of an unscheduled trip?

Under GTFS-ServiceChanges, there are options to either use a scheduled trip as a template, or define something entirely new. The first option is part of why I want a trip_id_scheduled field anywhere there's a trip_id_performed field. For entirely new trips, the pattern_id could be provided along with the trip_id by the CAD/AVL system.

  2. GTFS is being promoted as the preferred schedule representation, but it doesn’t model patterns. (It was designed to display network and schedule information to passengers on a map, not to analyze and manage operations, so patterns are not required.) GTFS has the shape_id field in the trips table, but that represents a path on a map rather than a sequence of stops. Some agencies generate shape_id from their scheduling system’s pattern, but this is not a requirement. The same shape can be used for different patterns (for example, local and limited-stop patterns along the same path). In this case patterns could be distilled by grouping trips serving the same stops in the same order.
  3. Shapes in GTFS don’t have display names that would be useful for generating aggregated reports.

Though they could, there's nothing preventing shape_id from being a meaningful identifier like "Route:6;Pat:FRMN6UNV00".

  4. The current approach makes schedules a hard requirement of TIDES, even for analysis that doesn’t involve schedules. For example, the schedule will always be required to analyze on-time performance, but it shouldn’t be required to analyze running times. Agencies that don’t have their schedule in GTFS should still be able to use TIDES.

I suppose that if you're not using GTFS you can use trips_performed.shape_id to mean whatever you want, including pattern_id.

Possible Solutions

  1. Add pattern and pattern_stop tables to the TIDES specification. The tables would take the following structure:
pattern (
    pattern_id      TEXT     NOT NULL
    pattern_name    TEXT
    route_id        TEXT     --foreign key
    direction_id    TEXT     --foreign key
    shape_id        TEXT     --foreign key
    PRIMARY KEY (pattern_id)
)

pattern_stop (
    pattern_id       TEXT        NOT NULL    --foreign key
    stop_sequence    INTEGER     NOT NULL
    stop_id          TEXT        NOT NULL    --foreign key
    cumul_meters     NUMERIC(10,2) --cumulative scheduled distance from first stop
    PRIMARY KEY (pattern_id, stop_sequence)
)

These tables appear to be schedule data and they don't address the need to be able to aggregate on pattern_id, which could be met by simply adding pattern_id to any table with a trip_id, route_id or shape_id.

The fare_transactions, passenger_events, vehicle_locations, and stop_visits tables would be updated by removing stop_sequence and adding:

Presumably you'd also want to add pattern_id to these tables?

  • seq_in_pattern, referencing the pattern_stop table. If GTFS is being used, this field would be a foreign key to GTFS stop_times.stop_sequence.
  • seq_in_trip, for the sequence of a visit within its trip. This may differ from seq_in_pattern if scheduled stops are skipped or if unscheduled visits are made.

I'm not sure I understand the motivation for replacing stop_sequence with seq_in_pattern and seq_in_trip. This is confusing because seq_in_pattern would mean the same thing as stop_sequence, with or without patterns (a single trip can serve one and only one pattern). Additionally, seq_in_trip is easily inferred from the order of observations for a trip! For unscheduled stops, the stop_sequence would be null, and the sequence within the trip would be just as easily inferred from the order of observations.
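A minimal sketch of that inference in Python, assuming each stop visit carries trip_id_performed plus an arrival timestamp; the field name arrival, the function add_seq_in_trip, and the sample records are hypothetical and not taken from the TIDES schema:

```python
from itertools import groupby
from operator import itemgetter


def add_seq_in_trip(stop_visits):
    """Number each visit within its trip by order of observation.

    stop_visits: list of dicts with "trip_id_performed" and "arrival"
    (ISO 8601 strings, which sort chronologically as text).
    Returns new dicts with an added "seq_in_trip" starting at 1.
    """
    ordered = sorted(stop_visits, key=itemgetter("trip_id_performed", "arrival"))
    numbered = []
    for _, visits in groupby(ordered, key=itemgetter("trip_id_performed")):
        for seq, visit in enumerate(visits, start=1):
            numbered.append({**visit, "seq_in_trip": seq})
    return numbered


# Hypothetical sample: stop X is an unscheduled visit between A and C.
visits = [
    {"trip_id_performed": "t1", "stop_id": "C", "arrival": "2023-01-05T08:10:00"},
    {"trip_id_performed": "t1", "stop_id": "A", "arrival": "2023-01-05T08:00:00"},
    {"trip_id_performed": "t1", "stop_id": "X", "arrival": "2023-01-05T08:05:00"},
]
numbered = add_seq_in_trip(visits)
```

This relies on every visit having a usable timestamp; records with missing or tied arrival times would need an additional tie-breaking rule.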

  2. Suggest adding pattern and pattern_stop tables to the GTFS specification, and make a GTFS feed a hard requirement of TIDES. In a sense this would be better because patterns pertain to the schedule, but it may be difficult to modify GTFS, and until they are added to GTFS, TIDES data would be difficult to work with.

Since you're asking to model schedule data I think this is the best option. I disagree that it makes GTFS a hard requirement. Simply adding an optional pattern_id field would meet the vast majority of the needs for pattern-based analysis. Additional needs could be met by joining to the GTFS feed, the same way we can add "trip_headsign" by joining a scheduled trip_id to the GTFS feed, but it's not required to work with TIDES data.

Furthermore, there's nothing preventing you from extending GTFS trips.txt with pattern_id and pattern_name, or for that matter, from adding them as extensions to TIDES. Both GTFS and TIDES allow extensions to the spec, and in the case of GTFS, it's basically a requirement to implement a change as an extension before it can be incorporated into the spec (changes require at least one producer and one consumer).

Also, since these tables are schedule data, and would fit most naturally in GTFS, I'd really prefer that the GTFS community review them as a proposal. TIDES may not be the appropriate audience.

  3. Combine both options by suggesting adding the tables to GTFS, but also adding them to TIDES until they are accepted into GTFS. If the new tables are never accepted into GTFS, this option is equivalent to the first, with the difference being that we at least tried to add it to GTFS. In my opinion this is the best option.

I don't think modeling schedule data in TIDES is appropriate.

  4. Do nothing. This is an issue because it makes it difficult to produce aggregate reports by pattern. See the work-around above and accept its limitations. In my opinion this is the worst option.
  5. Add pattern_id. This would allow aggregation by patterns without modeling the schedule in TIDES. Additional pattern-related information would be available through extensions to GTFS.

Can you clarify your motivation for modeling patterns as it relates to processing TIDES data? A lot of this proposal seems to address deficiencies in GTFS, but wouldn't really be required for reporting operations data. For example, if you want to aggregate by pattern, you need only pattern_id. Why wouldn't the needs of aggregation and reporting be met by adding an optional pattern_id field to any table that currently includes trip_id_performed?
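For example, aggregation by pattern needs nothing beyond the identifier itself. A rough Python sketch, assuming trips_performed records carry an optional pattern_id and a running time in seconds (running_time_s and the sample values are hypothetical, not existing TIDES fields):

```python
from collections import defaultdict
from statistics import mean


def mean_running_time_by_pattern(trips):
    """Average running time per pattern_id, skipping trips without one."""
    times_by_pattern = defaultdict(list)
    for trip in trips:
        pattern_id = trip.get("pattern_id")
        if pattern_id is None:
            continue  # pattern_id is optional, so it may be absent
        times_by_pattern[pattern_id].append(trip["running_time_s"])
    return {pid: mean(times) for pid, times in times_by_pattern.items()}


# Hypothetical sample records.
trips_performed = [
    {"trip_id_performed": "t1", "pattern_id": "6-local", "running_time_s": 1800},
    {"trip_id_performed": "t2", "pattern_id": "6-local", "running_time_s": 2000},
    {"trip_id_performed": "t3", "pattern_id": "6-limited", "running_time_s": 1500},
    {"trip_id_performed": "t4", "pattern_id": None, "running_time_s": 1700},
]
averages = mean_running_time_by_pattern(trips_performed)
```

No pattern or pattern_stop table is consulted here; a label or stop list would come from the schedule only if the report needs it.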

gabriel-korbato commented 1 year ago

@botanize thanks for your thoughtful input. You are right that to aggregate you only need to add pattern_id to tables that have trips. I also agree that conceptually patterns should be defined with schedules. Without a pattern definition, however, we miss out on having a report-friendly label, or a clear and authoritative definition of the pattern's stops in order, unless you link to the schedule, which could be in GTFS or in some other format. That may be OK for many applications, but I still think having a standard pattern definition as part of TIDES would make it easier for developers of TIDES consumers to build tools that work across agencies.

Per @e-lo's suggestion, I'd be OK with having patterns defined in the Transit Operational Data Standard (TODS), and having tools with these requirements require TIDES and at least the pattern tables from TODS.

botanize commented 1 year ago

Looks like we can meet your needs in TIDES by adding pattern_id to some tables. You mentioned these tables for changes. Are there others that should have an optional pattern_id field, maybe trips_performed?

jlstpaul commented 1 year ago

I agree with the solution to add pattern_id to the tables as appropriate.

In the near term, in the absence of having pattern information in GTFS or TODS, it would be a reasonable extension of TIDES to define the patterns and pattern stops as originally proposed. But in the long run, it would be better to have this information come from the linked schedule information (whatever source that ends up being).

mpaine-act commented 1 year ago

My agency's CAD/AVL and scheduling system works mostly with patterns, with trips being a byproduct, so this topic is important to get right imho.

Additionally, seq_in_trip is easily inferred from the order of observations for a trip! For unscheduled stops, the stop_sequence would be null, and the sequence within the trip would be just as easily inferred from the order of observations.

Inference shouldn't be a requirement when an explicit data format is being defined. When skipped or unplanned stops occur on a trip, the pattern is still valid, but the trip observations differ, so the scheduled and observed sequence ordinals can diverge and re-converge later. Having both seq_in_trip and seq_in_pattern is useful and explicit.
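A small Python sketch of how the two ordinals behave, under the simplifying assumption that each stop appears at most once in a pattern (loop patterns would need positional matching instead). The function number_visits and the stop IDs are hypothetical:

```python
def number_visits(pattern_stops, visited_stops):
    """Pair each observed visit with both ordinals.

    pattern_stops: stop_ids in pattern order.
    visited_stops: stop_ids in the order they were actually observed.
    Returns (stop_id, seq_in_trip, seq_in_pattern) tuples, where
    seq_in_pattern is None for a stop that is not in the pattern.
    """
    seq_by_stop = {stop_id: seq for seq, stop_id in enumerate(pattern_stops, start=1)}
    return [
        (stop_id, seq_in_trip, seq_by_stop.get(stop_id))
        for seq_in_trip, stop_id in enumerate(visited_stops, start=1)
    ]


# Hypothetical trip: B is skipped, then X is an unplanned stop before D.
pattern = ["A", "B", "C", "D"]
observed = ["A", "C", "X", "D"]
numbered = number_visits(pattern, observed)
# numbered == [("A", 1, 1), ("C", 2, 3), ("X", 3, None), ("D", 4, 4)]
```

The ordinals agree at A, diverge at C after the skipped stop, and agree again at D once the unplanned stop has been counted.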

Having pattern.stop_sequence is important, although the suggestion to use a foreign key to GTFS stop_times.stop_sequence, which is trip_id based, is problematic if trips_performed doesn't include pattern_id.

I would love to see GTFS trips.txt include pattern_id, but they haven't added it for their own reasons, although it has been suggested many times.

I suggest not mixing shapes with patterns, as they are logically different things. The waypoints in a shape file do not need to correspond to any stops but can be arbitrary points on a map, so parallel sequencing is not always an option.

trip_headsign is not a good replacement for pattern_id, as using trip_headsign as a natural key creates issues that a strict ID avoids. Do trips of the same pattern have changing headsigns? For my agency, we include specialized pattern destination signs per pattern (or group of patterns). Pattern destinations are more specific than direction_id but more general than GTFS trip headsigns, such as zone, area or station. Could a destination_id be added as optional to the pattern structure?

Does cumul_meters include height or is it a 2D measurement? It is a little ambiguous.

A lot of this proposal seems to address deficiencies in GTFS, but wouldn't really be required for reporting operations data.

The service development here would take our operations data and use it to tweak the patterns for the upcoming schedule. When I say tweak, it is really a 9-month process, but patterns and statistics are much more important during that process.

botanize commented 1 year ago

We're going to close this issue by adding pattern_id as an optional field to the appropriate tables.

I see two options: add pattern_id to any table containing:

  1. trip_id_performed, or
  2. route_id, currently just trips_performed.

Whereas route_id can be found with trip_id in GTFS, there is no pattern_id in GTFS, so we probably want to add pattern_id to each table containing trip_id_performed.