Context

The ODS data model is based on the concept of supplementing public GTFS data with internal operational data, so that the two together can model the entire network.

This, in theory, works well: schedule information for public trips is released publicly to customers, and ODS contains all of the non-public information.

Problem

This begins to break down when there is "supplemental" information about public trips that some operators release publicly but others do not. For example:

Blocks: Relying on block_id in trips.txt for ODS's block information would require operators to release block information publicly, which is untenable for most rail operations. A big factor is the frequency of cycle swaps in terminals and the decoupling of crews and equipment, meaning consumers of this information would be likely to draw erroneous conclusions (such as delay propagation through blocks, which is not accurate).
Internal Stops: Many rail operators have trains make stops at certain stations for employees only, be it at a train yard or regular passenger station. Some of these are advertised publicly and announced as an employee-only stop (e.g. "Hillside Support Facility for Employees" on the LIRR), and others aren't advertised whatsoever (e.g. our trains at several different train yards like the Boston Engine Terminal, NJ Transit trains at their Meadowlands Maintenance Complex, SEPTA trains at several train yards, etc.). While they are included in our internal schedule(s), we and most other rail operators do not wish to include them publicly… yet they can't be entered in deadheads.txt as a deadhead trip since this is an internal employee stop on an existing public trip.
Waypoints: Beyond the above, many rail carriers operate trips with times included at intermediate waypoints (e.g. interlockings/switches), but that are not released publicly for security/sensitivity reasons. The same is also seen on light rail, rapid transit, and even some bus systems, albeit less frequently.

I recognize that some operations already simply release this information publicly in the blocks field and using pickup_type,drop_off_type values of 1,1 — albeit with no distinction as to whether that's a stop or merely a passing time — but this public release is simply not viable for us, nor many other operations… particularly on the rail side.

Likewise, there may be scenarios where public information differs from internal information, such as:

Internal Routes: While rail trips on certain corridors/segments may be cobranded to the public as a single line (e.g. our "Franklin/Foxboro Line"), the line may be split into multiple subsets or even to other existing routes for internal purposes (e.g. "Franklin via NEC," "Franklin via DB," "Foxboro via NEC," "Foxboro via DB," "Football Extra")
Internal Times: Many railroads employ the practice of using internal times that are distinct from public times, allowing slight buffers in passenger time operations relative to the precision of often complex fixed railroad infrastructure. The most prominent and widespread example of this is in New York, where all carriers (Amtrak, Metro-North, NJT, and the LIRR) have internal departure times at least one minute after the publicly-advertised time.
Track Assignments: Many railroads with large terminals have a general framework of track assignments in their terminals (NY Penn Station, Washington Union Station, Boston North/South Station, etc.), yet the variability and complexity of operations means that half of trains, at best, won't use a planned terminal track. This means public track assignments are not released in GTFS at these stations, although there is a use case for them to be available in other internal systems.

Clearly, if we're looking for ODS to be adopted more widely and within the rail operating space, it needs to accommodate these requirements.

Proposed Solution

In thinking of a proposed solution, I wanted something that would be extendable and could future-proof us for subsequent use cases that we might not yet have conceptualized. After bouncing around a few concepts in my head, I believe I've settled on a new standardized _supplement.txt suffix, capable of:

Adding rows to the corresponding public GTFS file.
Adding/replacing values in the corresponding public GTFS file where a row with the file's applicable primary key / unique identifier already exists.

Each file would use the same base name and fields in accordance with the GTFS standard.[^1]

This would cover all of the above use cases, and then some. For example:

Blocks, where not defined, could be added to trips_supplement.txt as follows:
```
trip_id,block_id
trip1,100
trip2,100
trip3,100
trip4,200
trip5,200
trip6,200
```
This would add or replace the block_id value on each trip's entry in trips.txt to match the above.
Track assignments could be assigned to stop_times_supplement.txt as follows:
```
trip_id,stop_sequence,stop_id
trip80,10,TerminalStop5
trip90,10,TerminalStop8
```
This would replace the stop_id on the applicable trip's stop_times entries with the stop_id listed above.
Internal Stops and Waypoints could be added to stop_times_supplement.txt, as follows:
```
trip_id,stop_id,stop_sequence,arrival_time,departure_time,pickup_type,drop_off_type
trip3,DoubleTrackStart,18,13:40:00,13:40:00,1,1
trip3,DoubleTrackEnd,26,14:00:00,14:00:00,1,1
trip3,BigTrainYard,31,14:05:00,14:05:00,3,3
```
This would add the applicable stops and data to each trip's stop_times entries where applicable and allow them to be properly interspersed on the existing public trips.

_N.B. This would require operators to "leave space" for internal locations in their stop_sequence values (which must increase along the trip but need not be consecutive), although most operations using a scheduling system already do this in practice._

Internal Routes could have their identifiers added to routes_supplement.txt, with route data on the applicable trips updated as follows:

route_id,route_long_name,route_type
1W,West Branch of Route 1,2
1E,East Branch of Route 1,2
EmployeeShuttle,Employee Shuttle Bus,3

trip_id,route_id
123,1W
456,1E
789,EmployeeShuttle
aaa,1W
bbb,1E
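
The add-or-replace behavior described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PRIMARY_KEYS` mapping, the helper names, and the public `StationA`/`StationB` rows are hypothetical, and the sketch assumes every value present in a supplement row (including an empty one) overwrites the public value.

```python
import csv
import io

# Assumed primary keys per file; illustrative only, not defined by ODS.
PRIMARY_KEYS = {
    "trips": ("trip_id",),
    "stop_times": ("trip_id", "stop_sequence"),
}

def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))

def apply_supplement(base_rows, supplement_rows, key_fields):
    """Merge supplement rows into base rows, matching on the primary key."""
    merged = {tuple(r[k] for k in key_fields): dict(r) for r in base_rows}
    for row in supplement_rows:
        key = tuple(row[k] for k in key_fields)
        # Existing key: overwrite only the columns the supplement provides.
        # New key: the supplement row becomes a new row.
        merged.setdefault(key, {}).update(row)
    return list(merged.values())

# Blocks: add block_id to existing public trips.
trips = read_rows("trip_id,route_id,service_id\ntrip1,1,weekday\ntrip2,1,weekday\n")
trips_supplement = read_rows("trip_id,block_id\ntrip1,100\ntrip2,100\n")
merged_trips = apply_supplement(trips, trips_supplement, PRIMARY_KEYS["trips"])

# Waypoints: add an internal location between two public stops.
stop_times = read_rows(
    "trip_id,stop_id,stop_sequence,arrival_time,departure_time\n"
    "trip3,StationA,10,13:30:00,13:30:00\n"
    "trip3,StationB,20,13:45:00,13:45:00\n"
)
stop_times_supplement = read_rows(
    "trip_id,stop_id,stop_sequence,arrival_time,departure_time,pickup_type,drop_off_type\n"
    "trip3,DoubleTrackStart,18,13:40:00,13:40:00,1,1\n"
)
merged_stop_times = sorted(
    apply_supplement(stop_times, stop_times_supplement, PRIMARY_KEYS["stop_times"]),
    key=lambda r: (r["trip_id"], int(r["stop_sequence"])),
)
```

Sorting by stop_sequence after the merge is what lets the internal waypoint land between the two public stops, which is why the gaps in stop_sequence matter.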

Implementation Approaches for `stop_times` at Public AND Internal Locations

The above leads to a need to add entries to stop_times.txt at both existing public and internal locations, which itself leads to three/four interesting hypothetical options:

1. Define that all supplement entries need to precisely mirror their GTFS counterparts, and thus the places must be defined in stops_supplement.txt.

Add: stops_supplement.txt:

stop_id,stop_name,stop_desc,stop_lat,stop_lon
DoubleTrackStart,Start of Double Track,Milepost 5,40.000,-75.000
DoubleTrackEnd,End of Double Track,Milepost 20,40.100,-75.100
BigTrainYard,Big Train Yard,,40.105,-75.105

2. Define that stop_times_supplement.txt supports an ops_location_id column, with stop_id left blank on those rows, a break from the stop_times.txt standard.

Modify the earlier stop_times_supplement.txt example to:

trip_id,stop_id,ops_location_id,stop_sequence,arrival_time,departure_time,pickup_type,drop_off_type
trip3,,DoubleTrackStart,18,13:40:00,13:40:00,1,1
trip3,,DoubleTrackEnd,26,14:00:00,14:00:00,1,1
trip3,,BigTrainYard,31,14:05:00,14:05:00,3,3

3. Merge the ops_locations.txt file into stops_supplement.txt and just treat all ops_locations as added stops.
- This mirrors the example from approach 1 above.
- The biggest implication from this is it means ops_locations and stops must have mutually-exclusive IDs, although I don't think that's a terrible thing in the grand scheme of things.
4. Take approach 3 a step further and merge all of the analogous files (deadheads.txt, ops_locations.txt, deadhead_times.txt) into their _supplement counterparts (i.e. trips_supplement.txt, stops_supplement.txt, stop_times_supplement.txt).
- This has the benefit of reducing the additional files and structures being added in ODS for the portions where we're simply adding internal equivalents, and allows us to piggyback on the existing standard.
- One potential risk of this approach is it leaves us susceptible to a breaking change in the future should we ever implement an ODS-standard extension (e.g. adding an ops_location_field to stops_supplement.txt) that conflicts with a future GTFS change, but I'd advise against any such additional fields in general.
- This approach would allow us to eliminate all of the definitions and conditional requirement fields that add to the complexity of some of the supplemental fields by placing all trips, stops, and stop_times in a single merged datasource.
- Validation between internal and external fields becomes easy in that one can easily verify that all IDs listed are unique.
- The learning curve for individuals looking to implement ODS becomes lower, as it's simply modeling the internal trips in a supplemental file vs calling otherwise analogous items by distinct names depending on the context.
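
As a rough illustration of the ID validation mentioned above, a check that public and supplemental IDs never collide could look like the following Python sketch. The function name and sample IDs are hypothetical, and it assumes the IDs have already been read out of stops.txt and stops_supplement.txt.

```python
from collections import Counter

def duplicate_ids(*id_lists):
    """Return the IDs that appear more than once across all given lists."""
    counts = Counter(i for ids in id_lists for i in ids)
    return sorted(i for i, n in counts.items() if n > 1)

# Hypothetical IDs: public stops, then internal locations added via the supplement.
public_stop_ids = ["StationA", "StationB"]
added_stop_ids = ["DoubleTrackStart", "DoubleTrackEnd", "BigTrainYard"]

collisions = duplicate_ids(public_stop_ids, added_stop_ids)
```

An empty result means the internal locations can be merged as new rows; a non-empty one means a supplement row would be read as an override of an existing public stop rather than an addition.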

I realize approach 4 would be a relatively major/breaking change to the standard, which I'd normally reject, but it might be worth considering since (a) the standard is not yet widely adopted, and (b) those who have implemented the standard would only need to change file/column labels vs adding new logic (some of which — e.g. the comparative field ID values — could be eliminated entirely, thereby further reducing complexity and reducing the barrier for implementation).

Curious for everyone's thoughts/input on this, as I not only see this being useful in the context of modeling runs (and required to do so for our operations), but also see it being valuable for helping grow support for the standard as many operations that may not yet be ready to implement full run-modeling in ODS could have a use case for modeling deadheads and trips with internal locations (e.g. Rail AVL systems where we don't care who's working the trip, we just care about the trip, its waypoint times, its cycle, etc. but have been unable to use GTFS given the need to combine the two elements). That could further solidify the standard's role in the industry and help grow support for widespread adoption.

[^1]: We could also consider an optional _NEW field suffix for changing a field's PK value, but I am disinclined to do so as (1) I don't foresee there being a compelling need, and (2) there is a major risk of downstream propagation issues by implementing a PK change.

So to summarize:

Instead of making new files to describe internal-only trips and stops, publish a diff of rows+columns to add/change to the existing GTFS files.

I really like this! It's such an elegant solution. It smooths out a bunch of awkward parts of ODS files, and makes it really obvious how to do any future extensions (official or custom) for new fields.

I lean towards option 4. If we do it for any file, we might as well lean into it.

I'd want to go through all the old discussions to make sure there's no use cases that this makes impossible to represent, but off the top of my head I don't see anything it would break.

Q1: Will there be any specific new column names that ODS needs to standardize, beyond columns already listed in GTFS? I'm thinking:

run_id for trips_supplement.txt (optional, for trips that only have 1 run, might be useful for bus)
Some new column to mark trips, stops, or stop_times as non-revenue. I guess it's implied by them being in ODS and not GTFS, but when working with the merged data, it might be nice to have it as a field in the merged data.
- This could also make producing easier. If an agency only adds rows+columns and doesn't edit any, then an easy way to produce would be first create a merged version, and then filter out the nonrevenue rows+columns to get a public GTFS file.
- ~~Alternatively, maybe this column isn't included in the _supplement.txt file, but is instead automatically added as part of the consumer's merge process for rows that are added (rather than edited).~~
In stop_times_supplement.txt an indicator of whether it's a timepoint without a stop, or whether the vehicle stops and some people can get on/off but not the general public.
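
To make the "merge first, then filter" production idea concrete, here is a minimal Python sketch. The column name `ods_nonrevenue` is purely hypothetical (nothing in ODS defines it yet), and the sketch assumes the agency only adds rows and columns and never edits public values.

```python
# "ods_nonrevenue" is a hypothetical column name used only for illustration.
NONREVENUE_COLUMN = "ods_nonrevenue"

def to_public(merged_rows, internal_columns=(NONREVENUE_COLUMN,)):
    """Drop nonrevenue rows and internal-only columns from merged rows."""
    public = []
    for row in merged_rows:
        if row.get(NONREVENUE_COLUMN) == "1":
            continue  # internal-only row, not published
        public.append({k: v for k, v in row.items() if k not in internal_columns})
    return public

merged_trips = [
    {"trip_id": "123", "route_id": "1", "ods_nonrevenue": "0"},
    {"trip_id": "789", "route_id": "EmployeeShuttle", "ods_nonrevenue": "1"},
]
public_trips = to_public(merged_trips)
```

Any other internal-only columns (block_id, for operators who don't publish it) would be passed in `internal_columns` as well.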

Q2: If this is meant to modify an existing GTFS feed, but it's published separately, then it matters that you're applying it to the correct GTFS file. Maybe ODS needs a metadata file that references GTFS's feed_info.feed_version.

Q3: Would this allow supplementing any file and column, or only a small allowlist of ODS-approved files and columns? Like, if an agency wants to write internal pathways or something unexpected like that, is that allowed by this spec? Are consumers expected to handle it?

Thanks, Sky! To your questions:

[Q1] Yes, this is certainly an option we could pursue, albeit with the caution/risk that an equivalently-named field is added to GTFS that would interfere with this use case.

Just to name one example, I could see adding a stops_supplement.txt boolean value of deadhead, or added enum values to the existing location_type.
I don't think we can safely assume anything in ODS equates to a deadhead, since some may be internal modifications of existing public trips, so specifying any nonrevenue values should be explicit (and likewise eases the producer realm by making it easier to export/filter).
To the use cases in stop_times_supplement.txt, a timepoint without a stop could be modeled by (1,1) as is the existing public norm, and employee stops could likewise be modeled by an added enum extension to pickup_type or drop_off_type.

Overall, to this question, I think it would be worth maintaining a list of standardized supplemental fields and extensions in ODS, which would also help bolster the future case for avoiding any collisions with subsequent GTFS additions.

[Q2] This is a valid point, although I could see cases where the two are decoupled and an ODS value is valid on the same public IDs across versions, or where the reverse is true. My thought is that valid approaches would be:

Including all of the GTFS files alongside the supplements in an ODS zip file.
Having the latest ODS file implicitly match the latest GTFS file, such that any two downloaded simultaneously match one another.
Having an ods_info.txt file akin to your proposal that contains such a mapping to feed_info.feed_version in static GTFS.
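
For the third option, a consumer-side check might look something like this Python sketch. The `gtfs_feed_version` column in ods_info.txt is an assumed name for illustration only; `feed_version` is the existing field in GTFS feed_info.txt.

```python
import csv
import io

def first_row(text):
    """Return the first data row of CSV text as a dict, or {} if there is none."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return rows[0] if rows else {}

def supplement_matches_feed(feed_info_text, ods_info_text):
    """True when ods_info.txt names the same version as feed_info.txt."""
    public_version = first_row(feed_info_text).get("feed_version")
    # "gtfs_feed_version" is a hypothetical column name, not part of ODS.
    expected_version = first_row(ods_info_text).get("gtfs_feed_version")
    return bool(public_version) and public_version == expected_version

feed_info = (
    "feed_publisher_name,feed_publisher_url,feed_lang,feed_version\n"
    "Example Agency,https://example.com,en,2024-05-01\n"
)
ods_info = "gtfs_feed_version\n2024-05-01\n"
versions_match = supplement_matches_feed(feed_info, ods_info)
```

Because feed_version is optional in GTFS, the sketch treats a missing or empty value as a mismatch rather than guessing.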

[Q3] I don't see any reason why the standard couldn't support additional fields/files akin to how GTFS currently treats such supplemental fields/files in the base files. However, beyond a requirement to at worst ignore the extraneous data and proceed, I think the obligation for a consumer to support these extensions would depend on the context and use cases of the given consumer's application/tool.

So far we've discussed adding and changing rows. Is there a way to remove rows?

A couple examples of situations where we might want to remove data:

- We have a couple through-routed trips, where a bus does one route and then continues through to another route. We show these to riders as two separate trips on their respective routes, but operationally they're more like a single trip. We'd want 2 trips in the public GTFS file, and 1 in the ODS data after merging.
- Sometimes during planned disruptions (construction) we make up trips for GTFS which approximately reflect the service we'll run, but these trips are just a useful fiction for the public. Internally, we have a different representation of service, so would want to remove those trips in the internal merged feed.

One potential way to do that: All _supplement files could have an optional delete column. If that column is set to 1, then instead of merging the data, that row is deleted. If it's 0 or blank, the data is added/changed. The column wouldn't appear in the merged data.

I'm a bit torn on this functionality. Provided there's not a requirement for all trips to be assigned to operators, I see a case where these trips could simply (a) go unassigned and be ignored by a consumer, or (b) be removed by splitting the applicable trips into a separate service_id as needed in trips_supplement.txt, and then removing that service from a given day with an exception_type 2 entry in calendar_dates_supplement.txt. Yes, consumers would need to check another _supplement file, but the mechanics of doing so should be fairly generalizable.

If we were to include a "delete" command, I like the mechanics of adding an optional delete column that removes a matching row when set to 1, but does nothing in any other instance. This would obviously preclude GTFS producers from including a synonymous delete column in their public GTFS (perhaps an argument to use ODS_delete), but I don't foresee a future conflict.

I think out of an abundance of caution it would be smart for us to put "ods_" as a prefix for any column we add to the supplements. This will also make it obvious on visual inspection what is not coming from the original spec.
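
Putting the delete column and the prefix suggestion together, the merge step could be sketched in Python as below. The column name `ods_delete`, the function name, and the trip IDs are assumptions for illustration; the behavior follows the description above, where "1" removes the matching row and any other value leaves the normal add/replace merge in place.

```python
def apply_supplement_with_delete(base_rows, supplement_rows, key_fields,
                                 delete_column="ods_delete"):
    """Merge supplement rows by primary key, honoring a delete flag."""
    merged = {tuple(r[k] for k in key_fields): dict(r) for r in base_rows}
    for row in supplement_rows:
        key = tuple(row[k] for k in key_fields)
        if row.get(delete_column) == "1":
            merged.pop(key, None)  # remove the matching row if it exists
            continue
        # The flag itself never appears in the merged data.
        values = {k: v for k, v in row.items() if k != delete_column}
        merged.setdefault(key, {}).update(values)
    return list(merged.values())

# Through-routed example: two public trips become one internal trip.
public_trips = [
    {"trip_id": "tripA", "route_id": "1"},
    {"trip_id": "tripB", "route_id": "2"},
]
trips_supplement = [
    {"trip_id": "tripA", "route_id": "", "ods_delete": "1"},
    {"trip_id": "tripB", "route_id": "", "ods_delete": "1"},
    {"trip_id": "tripAB", "route_id": "1", "ods_delete": ""},
]
internal_trips = apply_supplement_with_delete(public_trips, trips_supplement, ("trip_id",))
```

A real consumer would also need to cascade the deletion to dependent rows (e.g. the deleted trips' stop_times), which this sketch leaves out.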

To clarify, are you still proposing to add routes_supplement.txt in addition to supplements for trips, stops, and stop_times? If yes, does that cover everything that we would need supplemental files for? @jeffkessler-keolis

I'm going to say something that may be a bit antithetical to the consumers, but I don't think we need to be prescriptive as to the _supplement files supported. Theoretically, there's no reason why any GTFS file could not be modified in this fashion, be it with additions or overriding by the filename's eponymous _id field.

The same even holds true for experimental files, such as the MBTA's multi_route_trips.txt (which indicates trips that should be displayed on timetables beyond its specific route); there's no reason why the same structure couldn't be applied to modify the public version of this file for internal consumption.

Obviously there are risks/concerns to this from a consumer side of knowing what files need to be implemented, but I think one could say that any file on which one needs to rely in the GTFS data for an ODS purpose could be modified via the _supplement standard. i.e. If you need it for GTFS, you should be prepared to have it modified via ODS.

Realistically, I believe trips, stops, stop_times, and routes are the primary files that one would reasonably expect to change via ODS, but to the extent a consumer may wish to rely on another GTFS file for their instance / application / use case, they should expect that the file can be added via new rows or have fields overridden via a row with a matching primary key in an applicable _supplement file.

cal-itp / operational-data-standard

Supplementing Public GTFS data (trips, stops, stop_times, routes, etc.) within ODS #55
