jeffkessler-keolis closed this 2 months ago
So to summarize:
Instead of making new files to describe internal-only trips and stops, publish a diff of rows+columns to add/change to the existing GTFS files.
I really like this! It's such an elegant solution. It smooths out a bunch of awkward parts of ODS files, and makes it really obvious how to do any future extensions (official or custom) for new fields.
I lean towards option 4. If we do it for any file, we might as well lean into it.
I'd want to go through all the old discussions to make sure there are no use cases that this makes impossible to represent, but off the top of my head I don't see anything it would break.
Q1: Will there be any specific new column names that ODS needs to standardize, beyond columns already listed in GTFS? I'm thinking:
- run_id for trips_supplement.txt (optional, for trips that only have 1 run, might be useful for bus)
- [...] _supplement.txt file, but is instead automatically added as part of the consumer's merge process for rows that are added (rather than edited).
- stop_times_supplement.txt: an indicator of whether it's a timepoint without a stop, or whether the vehicle stops and some people can get on/off but not the general public.
Q2: If this is meant to modify an existing GTFS feed, but it's published separately, then it matters that you're applying it to the correct GTFS file. Maybe ODS needs a metadata file that references GTFS's feed_info.feed_version.
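To make Q2 concrete, here is a minimal sketch of the check a consumer might run before merging. It assumes a hypothetical ODS metadata file with a single hypothetical column, gtfs_feed_version, holding the feed_info.feed_version of the GTFS feed the supplement was built against. Neither the file nor the column name is part of any spec, and the sample values are invented.

```python
import csv
import io


def first_row(text):
    """Return the first data row of a CSV text as a dict, or {} if there is none."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return rows[0] if rows else {}


def targets_this_feed(ods_metadata_text, feed_info_text):
    """True only if the supplement declares the same version as the GTFS feed.

    "gtfs_feed_version" is a hypothetical column name; "feed_version" is the
    existing field in GTFS feed_info.txt.
    """
    expected = first_row(ods_metadata_text).get("gtfs_feed_version", "")
    actual = first_row(feed_info_text).get("feed_version", "")
    return expected != "" and expected == actual


# Invented sample data for illustration only.
feed_info = (
    "feed_publisher_name,feed_publisher_url,feed_lang,feed_version\n"
    "Example Transit,https://example.com,en,2024-spring-v2\n"
)
ods_metadata = "gtfs_feed_version\n2024-spring-v2\n"
stale_metadata = "gtfs_feed_version\n2023-fall-v1\n"

print(targets_this_feed(ods_metadata, feed_info))
print(targets_this_feed(stale_metadata, feed_info))
```

A consumer could refuse to merge, or just warn, when the check fails. Which of those is right probably depends on how tightly the two feeds are meant to be coupled.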
Q3: Would this allow supplementing any file and column, or only a small allowlist of ODS-approved files and columns? Like, if an agency wants to write internal pathways or something unexpected like that, is that allowed by this spec? Are consumers expected to handle it?
Thanks, Sky! To your questions:
[Q1] Yes, this is certainly an option we could pursue, albeit with the caution/risk that an equivalently-named field is added to GTFS that would interfere with this use case.
Just to name one example, I could see adding a stops_supplement.txt boolean value of deadhead, or added enum values to the existing location_type.
I don't think we can safely assume anything in ODS equates to a deadhead, since some may be internal modifications of existing public trips, so specifying any nonrevenue values should be explicit (and likewise eases the producer realm by making it easier to export/filter).
To the use cases in stop_times_supplement.txt, a timepoint without a stop could be modeled by (1,1) as is the existing public norm, and employee stops could likewise be modeled by an added enum extension to pickup_type or drop_off_type.
Overall, to this question, I think it would be worth maintaining a list of standardized supplemental fields and extensions in ODS, which would also help bolster the future case for avoiding any collisions with subsequent GTFS additions.
[Q2] This is a valid point, although I could see cases where the two are decoupled and an ODS value is valid on the same public IDs across versions, or where the reverse is true. My thought is that valid approaches would be:
- An ods_info.txt file akin to your proposal that contains such a mapping to feed_info.feed_version in static GTFS.
[Q3] I don't see any reason why the standard couldn't support additional fields/files akin to how GTFS currently treats such supplemental fields/files in the base files. However, beyond a requirement to at worst ignore the extraneous data and proceed, I think the obligation for a consumer to support these extensions would depend on the context and use cases of the given consumer's application/tool.
So far we've discussed adding and changing rows. Is there a way to remove rows?
One potential way to do that: All _supplement files could have an optional delete column. If that column is set to 1, then instead of merging the data, that row is deleted. If it's 0 or blank, the data is added/changed. The column wouldn't appear in the merged data.
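A rough sketch of how a consumer might apply that rule, assuming rows are already parsed into dicts and keyed on a single ID column. The function name, the sample IDs, and the choice to treat a blank supplement cell as "leave the existing value alone" are assumptions for illustration only.

```python
def apply_supplement_with_delete(base_rows, supplement_rows, key_field,
                                 delete_column="delete"):
    """Merge supplement rows into base rows, honoring an optional delete flag.

    delete == "1"       -> the matching base row is removed
    delete == "0" or "" -> the row is added or its fields are changed
    The delete column itself never appears in the merged output.
    """
    merged = {row[key_field]: dict(row) for row in base_rows}
    for row in supplement_rows:
        fields = dict(row)
        flag = fields.pop(delete_column, "")  # strip the control column
        key = fields[key_field]
        if flag == "1":
            merged.pop(key, None)  # deleting a row that isn't there is a no-op
            continue
        target = merged.setdefault(key, {})
        for column, value in fields.items():
            # Assumption: blank cells do not overwrite existing values.
            if value != "" or column not in target:
                target[column] = value
    return list(merged.values())


# Invented sample data for illustration only.
trips = [
    {"trip_id": "701", "route_id": "FRANKLIN", "trip_headsign": "South Station"},
    {"trip_id": "703", "route_id": "FRANKLIN", "trip_headsign": "South Station"},
]
trips_supplement = [
    {"trip_id": "703", "route_id": "", "trip_headsign": "", "delete": "1"},
    {"trip_id": "701", "route_id": "", "trip_headsign": "Yard", "delete": ""},
    {"trip_id": "9705", "route_id": "FRANKLIN", "trip_headsign": "Yard", "delete": "0"},
]

result = apply_supplement_with_delete(trips, trips_supplement, key_field="trip_id")
for row in result:
    print(row)
```

Files with composite keys (stop_times, for example) would need a tuple of columns as the key instead of a single field.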
I'm a bit torn on this functionality. Provided there's not a requirement for all trips to be assigned to operators, I see a case where these trips could simply (a) go unassigned and be ignored by a consumer, or (b) be removed by splitting the applicable trips into a separate service_id as needed in trips_supplement.txt, and removed from a given day by a calendar_dates_supplement.txt type 2 exception entry. Yes, consumers would need to check another _supplement file, but the mechanics of doing so should be fairly generalizable.
If we were to include a "delete" command, I like the mechanics of adding an optional delete column that removes a matching row when set to 1, but does nothing in any other instance. This would obviously preclude GTFS producers from including a synonymous delete column in their public GTFS (perhaps an argument to use ODS_delete), but I don't foresee a future conflict.
I think out of an abundance of caution it would be smart for us to put "ods_" as a prefix for any column we add to the supplements. This will also make it obvious on visual inspection what is not coming from the original spec.
To clarify, are you still proposing to add routes_supplement.txt in addition to supplements for trips, stops, and stop_times? If yes, does that cover everything that we would need supplemental files for? @jeffkessler-keolis
I'm going to say something that may be a bit antithetical to the consumers, but I don't think we need to be prescriptive as to the _supplement files supported. Theoretically, there's no reason why any GTFS file could not be modified in this fashion, be it with additions or overriding by the filename's eponymous _id field.
The same even holds true for experimental files, such as the MBTA's multi_route_trips.txt (which indicates trips that should be displayed on timetables beyond its specific route); there's no reason why the same structure couldn't be applied to modify the public version of this file for internal consumption.
Obviously there are risks/concerns to this from a consumer side of knowing what files need to be implemented, but I think one could say that any file on which one needs to rely in the GTFS data for an ODS purpose could be modified via the _supplement standard. That is, if you need it for GTFS, you should be prepared to have it modified via ODS.
Realistically, I believe trips, stops, stop_times, and routes are the primary files that one would reasonably expect to change via ODS, but to the extent a consumer may wish to rely on another GTFS file for their instance / application / use case, they should expect that the file can be added via new rows or have fields overridden via a row with a matching primary key in an applicable _supplement file.
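A minimal sketch of that merge rule, assuming Python and in-memory CSV text: a supplement row whose primary key matches an existing row overrides its fields, and a row with a new key is added. The helper names (read_rows, apply_supplement), the sample IDs, and the rule that a blank supplement cell leaves the public value unchanged are assumptions for illustration, not something the proposal specifies.

```python
import csv
import io


def read_rows(text):
    """Parse GTFS-style CSV text (header row first) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def apply_supplement(base_rows, supplement_rows, key_fields):
    """Merge supplement rows into base rows, matching on key_fields.

    Assumption: a blank supplement cell means "no change", not "clear the value".
    """
    merged = {tuple(r[k] for k in key_fields): dict(r) for r in base_rows}
    for row in supplement_rows:
        key = tuple(row[k] for k in key_fields)
        target = merged.setdefault(key, {})  # unseen key -> new row
        for column, value in row.items():
            if value != "" or column not in target:
                target[column] = value
    # Give every row the same columns so the result can be written back as CSV.
    columns = []
    for r in merged.values():
        for c in r:
            if c not in columns:
                columns.append(c)
    return [{c: r.get(c, "") for c in columns} for r in merged.values()]


# Invented sample data for illustration only.
trips = read_rows(
    "route_id,service_id,trip_id\n"
    "FRANKLIN,weekday,701\n"
    "FRANKLIN,weekday,703\n"
)
trips_supplement = read_rows(
    "trip_id,route_id,service_id,block_id\n"
    "701,,,B12\n"                    # existing public trip: add a block only
    "9701,FRANKLIN,weekday,B12\n"    # internal-only trip: whole row is new
)

merged = apply_supplement(trips, trips_supplement, key_fields=["trip_id"])
for row in merged:
    print(row)
```

Because key_fields is a parameter, the same function would cover files with composite keys, for example key_fields=["trip_id", "stop_sequence"] for stop_times.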
Context
The ODS data model is based on the concept of supplementing public GTFS data with internal operational data, capable of together modeling the entire network.
This, in theory, works well: schedule information for public trips is released publicly to customers, and ODS contains all of the non-public information.
Problem
This begins to break down when there is "supplemental" information regarding public trips that some operators release publicly, but others do not. For example:
Blocks: By relying on blocks in trips.txt for ODS's block information, this would require operators to release block information publicly, which is untenable for most rail operations. A big factor influencing this is the frequency of cycle swaps in terminals and the decoupling of crews and equipment, meaning consumers using this information would be likely to deduce erroneous implications (such as delay propagation through blocks, which is not accurate).
Internal Stops: Many rail operators have trains make stops at certain stations for employees only, be it at a train yard or regular passenger station. Some of these are advertised publicly and announced as an employee-only stop (e.g. "Hillside Support Facility for Employees" on the LIRR), and others aren't advertised whatsoever (e.g. our trains at several different train yards like the Boston Engine Terminal, NJ Transit trains at their Meadowlands Maintenance Complex, SEPTA trains at several train yards, etc.). While they are included in our internal schedule(s), we and most other rail operators do not wish to include them publicly… yet they can't be entered in deadheads.txt as a deadhead trip since this is an internal employee stop on an existing public trip.
Waypoints: Beyond the above, many rail carriers operate trips with times included at intermediate waypoints (e.g. interlockings/switches), but that are not released publicly for security/sensitivity reasons. The same is also seen on light rail, rapid transit, and even some bus systems, albeit less frequently.
Likewise, there may be scenarios where public information differs from internal information, such as:
Internal Routes: While rail trips on certain corridors/segments may be cobranded to the public as a single line (e.g. our "Franklin/Foxboro Line"), the line may be split into multiple subsets or even to other existing routes for internal purposes (e.g. "Franklin via NEC," "Franklin via DB," "Foxboro via NEC," "Foxboro via DB," "Football Extra")
Internal Times: Many railroads employ the practice of using internal times that are distinct from public times, allowing slight buffers in passenger time operations relative to the precision of often complex fixed railroad infrastructure. The most prominent and widespread example of this is in New York, where all carriers (Amtrak, Metro-North, NJT, and the LIRR) have internal departure times at least one minute after the publicly-advertised time.
Track Assignments: Many railroads with large terminals have a general framework of track assignments in their terminals (NY Penn Station, Washington Union Station, Boston North/South Station, etc.), yet the variability and complexity of operations means that half of trains, at best, won't use a planned terminal track. This means public track assignments are not released in GTFS at these stations, although there is a use case for them to be available in other internal systems.
Clearly, if we're looking for ODS to be adopted more widely and within the rail operating space, it needs to accommodate these requirements.
Proposed Solution
In thinking of a proposed solution, I wanted something that would be extendable and could future-proof us for subsequent use cases that we might not yet have conceptualized. After bouncing around a few concepts in my head, I believe I've settled on a new standardized _supplement.txt suffix, capable of:
Each file would use the same base name and fields in accordance with the GTFS standard.[^1]
This would cover all of the above use cases, and then some. For example:
Blocks, where not defined, could be added to trips_supplement.txt as follows:
Track assignments could be assigned to stop_times_supplement.txt as follows:
Internal Stops and Waypoints could be added to stop_times_supplement.txt, as follows:
Internal Routes could have their identifiers added to routes_supplement.txt, with route data on the applicable trips updated as follows:
Implementation Approaches for stop_times at Public AND Internal Locations
The above leads to a need to add entries to stop_times.txt at both existing public and internal locations, which itself leads to three/four interesting hypothetical options:
1. Define that all supplement entries need to precisely mirror their GTFS counterparts, and thus the places must be defined in stops_supplement.txt.
   Add: stops_supplement.txt:
2. Define that stop_times_supplement.txt supports the definition of an ops_location_id in such an eponymous column with stop_id omitted, a break from the stop_times.txt standard.
   Modify the earlier stops_supplement.txt example to:
3. Merge the ops_locations.txt file into stops_supplement.txt and just treat all ops_locations as added stops.
   - ops_locations and stops must have mutually-exclusive IDs, although I don't think that's a terrible thing in the grand scheme of things.
4. Take approach 3 a step further and merge all of the analogous files (deadheads.txt, ops_locations.txt, deadhead_times.txt) into their _supplement counterparts (i.e. trips_supplement.txt, stops_supplement.txt, stop_times_supplement.txt).
   - This has the benefit of reducing the additional files and structures being added in ODS for the portions where we're simply adding internal equivalents, and allows us to piggyback on the existing standard.
   - One potential risk of this approach is it leaves us susceptible to a potential breaking change in the future should we ever implement an ODS-standard extension (e.g. adding an ops_location_field to stops_supplement.txt) that conflicts with a future GTFS change, but I think I'd generally advise against any such additional fields in general.
   - This approach would allow us to eliminate all of the definitions and conditional requirement fields that add to the complexity of some of the supplemental fields by placing all trips, stops, and stop_times in a single merged datasource.
Validation between internal and external fields becomes easy in that one can easily verify that all IDs listed are unique.
The learning curve for individuals looking to implement ODS becomes lower, as it's simply modeling the internal trips in a supplemental file vs calling otherwise analogous items by distinct names depending on the context.
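One possible reading of the ID-uniqueness check mentioned above, sketched in Python: within a single supplement file, no primary key should appear more than once. The function name and sample IDs are hypothetical, and whether uniqueness should also be enforced across other files is left open here.

```python
from collections import Counter


def duplicate_ids(rows, key_fields):
    """Return the primary-key tuples that occur more than once in rows."""
    counts = Counter(tuple(row[k] for k in key_fields) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# Invented sample data for illustration only.
stops_supplement = [
    {"stop_id": "YARD1", "stop_name": "Engine Terminal"},
    {"stop_id": "INT-A", "stop_name": "Interlocking A"},
    {"stop_id": "YARD1", "stop_name": "Engine Terminal (duplicate)"},
]

print(duplicate_ids(stops_supplement, ["stop_id"]))
```

For stop_times_supplement.txt the same call would take key_fields=["trip_id", "stop_sequence"].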
I realize approach 4 would be a relatively major/breaking change to the standard, which I'd normally reject, but it might be worth considering since (a) the standard is not yet widely adopted, and (b) those who have implemented the standard would only need to change file/column labels vs adding new logic (some of which — e.g. the comparative field ID values — could be eliminated entirely, thereby further reducing complexity and reducing the barrier for implementation).
Curious for everyone's thoughts/input on this, as I not only see this being useful in the context of modeling runs (and required to do so for our operations), but also see it being valuable for helping grow support for the standard as many operations that may not yet be ready to implement full run-modeling in ODS could have a use case for modeling deadheads and trips with internal locations (e.g. Rail AVL systems where we don't care who's working the trip, we just care about the trip, its waypoint times, its cycle, etc. but have been unable to use GTFS given the need to combine the two elements). That could further solidify the standard's role in the industry and help grow support for widespread adoption.
[^1]: We could also consider an optional _NEW field suffix for changing a field's PK value, but I am disinclined to do so as (1) I don't foresee there being a compelling need, and (2) there is a major risk of downstream propagation issues by implementing a PK change.