cal-itp / operational-data-standard

The Transit Operational Data Standard is an open standard for representing the transit schedules used by drivers, dispatchers, and planners to carry out transit operations.
https://ods.calitp.org
Apache License 2.0

new file runs.txt, and associated changes #51

Closed skyqrose closed 6 months ago

skyqrose commented 9 months ago

Background / Existing problems

There are several oversights in the existing ODS spec that make it impossible for MBTA to represent our schedule:

The existing runs_pieces.txt file does not provide a link between a run/piece and all its associated trips, deadheads, and run events.

The current runs_pieces.txt also does not allow for the representation of a piece that consists only of events, such as extraboard (aka, cover) or other run-as-directed work, because runs_pieces.start_trip_id and runs_pieces.end_trip_id are required fields and do not allow for run events.

Finally, the current specification does not allow for non-unique run or piece identifiers, even though many agencies reuse run "numbers" between divisions or day-types.

A new file, runs.txt, would address all of these problems.

new file: runs.txt

Primary key: (service_id, run_id, run_row_type, run_row_id)

This proposal uses (service_id, run_id) as a pair to solve the run-uniqueness problem.
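As a rough illustration of how the proposed primary key behaves, the sketch below checks parsed runs.txt rows for duplicate (service_id, run_id, run_row_type, run_row_id) tuples. The function name `find_duplicate_keys` and the sample rows are hypothetical; only the field names come from the proposal.

```python
import csv
import io
from collections import Counter

# Field names taken from the proposed primary key.
PRIMARY_KEY = ("service_id", "run_id", "run_row_type", "run_row_id")


def find_duplicate_keys(rows):
    """Return the primary-key tuples that appear on more than one row."""
    counts = Counter(tuple(row[field] for field in PRIMARY_KEY) for row in rows)
    return sorted(key for key, count in counts.items() if count > 1)


# Hypothetical sample: run 100 is reused on two service_ids (allowed),
# and the weekday row is repeated verbatim (a key violation).
SAMPLE = """service_id,run_id,run_row_type,run_row_id
weekday,100,1,trip-1
saturday,100,1,trip-1
weekday,100,1,trip-1
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
duplicates = find_duplicate_keys(rows)
```

Here the same run_id on `weekday` and `saturday` is not flagged, because service_id is part of the key; only the repeated weekday row is.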

Description

Lists all of the trips, deadheads, and events associated with each run and piece in a many-to-one relationship.

The start/end time/location of each run row are denormalized from trips.txt/stop_times.txt/deadheads.txt/deadhead_times.txt/run_events.txt. They're needed here because knowing when and where someone is working is important, and looking it up across several other files is too hard. They're also needed to show where within a trip a mid-trip relief happens.

The mid_trip flag is 1 for trips with mid-trip relief. It means the start/end time/location are for this operator's work, not the start/end of the trip. The flag could be set to 0 to say "there's no mid-trip relief, and the operator's work in this file corresponds to the trip ends in stop_times.txt or deadhead_times.txt". Or it could be left blank. It should be blank for all events (type 2).

The times don't have to fit perfectly together, e.g. for a layover. The employee is considered to be on their run/piece between the start_time of the earliest row on that run/piece and the end_time of the last row. It's allowed for times to overlap, in the case that there's a run_event for a task that an employee does concurrently with driving.
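A minimal sketch of the rule above, assuming rows have already been parsed into dicts for a single run/piece. The helper names `parse_gtfs_time` and `run_span` are hypothetical, and taking the latest end_time across all rows (rather than the end_time of the last-starting row) is an assumption made here so that overlapping events are handled.

```python
def parse_gtfs_time(value):
    """Convert HH:MM:SS to seconds; hours may exceed 23, as in GTFS."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def run_span(rows):
    """Return (start, end) in seconds for the rows of one run/piece.

    Gaps (e.g. layovers) and overlaps (e.g. a concurrent event) are both
    fine: only the earliest start and the latest end matter.
    Returns None if no row carries a start or an end time.
    """
    starts = [parse_gtfs_time(r["run_row_start_time"])
              for r in rows if r.get("run_row_start_time")]
    ends = [parse_gtfs_time(r["run_row_end_time"])
            for r in rows if r.get("run_row_end_time")]
    if not starts or not ends:
        return None
    return min(starts), max(ends)


# Hypothetical rows: two trips with a 10-minute layover between them,
# plus an event that overlaps the first trip.
rows = [
    {"run_row_id": "trip-1", "run_row_start_time": "08:00:00", "run_row_end_time": "09:00:00"},
    {"run_row_id": "fare-check", "run_row_start_time": "08:30:00", "run_row_end_time": "08:45:00"},
    {"run_row_id": "trip-2", "run_row_start_time": "09:10:00", "run_row_end_time": "10:00:00"},
]
span = run_span(rows)
```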

| Field name | Type | Required | Description |
| --- | --- | --- | --- |
| service_id | ID referencing calendar.service_id | Required | Identifies a set of dates when the run is scheduled to take place. |
| run_id | ID | Required | |
| piece_id | ID | Optional | Identifies the piece during which the run row takes place. May be left null for rows that take place outside of a piece, such as a break. _\[Note: Only matters if allowed in [Proposal 2](https://github.com/cal-itp/operational-data-standard/issues/52)\]_ |
| block_id | ID referencing deadheads.block_id or trips.block_id | Optional | Identifies the block to which the run row belongs. If omitted, this may be derived from trips.txt or deadheads.txt. If populated, this value must match that in trips.txt or deadheads.txt, for the given trip_id or deadhead_id. |
| run_row_type | Enum | Required | Indicates whether the run row consists of a deadhead, a revenue trip, or an event.<br>0 - Deadhead<br>1 - Trip<br>2 - Event |
| run_row_id | ID referencing deadheads.deadhead_id or trips.trip_id or run_events.run_event_id | Required | Identifies the specific deadhead, trip, or event associated with the run row. |
| run_row_start_time | Time | Conditionally required | Identifies the time at which the run piece begins to be associated with the row's deadhead, trip, or event. Required if `run_row_start_mid_trip` is 1. Recommended otherwise. |
| run_row_start_location | ID referencing deadheads.deadhead_id or trips.trip_id or run_events.event_from_location_id | Conditionally required | Identifies the first operational location or stop to be serviced by the run row. Required if `run_row_start_mid_trip` is 1. Recommended otherwise. |
| run_row_start_mid_trip | Enum | Conditionally required | Indicates whether the run piece begins the deadhead or trip at the start or middle of the respective deadhead or trip.<br>0 (or blank) - Row does not start mid-trip or mid-deadhead<br>1 - Row starts mid-trip or mid-deadhead<br>Required if the run row begins with a mid-trip relief. Optional otherwise. |
| run_row_end_time | Time | Conditionally required | Identifies the time at which the run piece is finished being associated with the row's deadhead, trip, or event. Required if `run_row_end_mid_trip` is 1. Recommended otherwise. |
| run_row_end_location | ID referencing deadheads.deadhead_id or trips.trip_id or run_events.event_to_location_id | Conditionally required | Identifies the last operational location or stop to be serviced by the run row. Required if `run_row_end_mid_trip` is 1. Recommended otherwise. |
| run_row_end_mid_trip | Enum | Conditionally required | Indicates whether the run piece ends the deadhead or trip at the end or middle of the respective deadhead or trip. Used to denote mid-trip reliefs.<br>0 (or blank) - Row does not end mid-trip or mid-deadhead<br>1 - Row ends mid-trip or mid-deadhead<br>Required if the run row ends with a mid-trip relief. Optional otherwise. |
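The conditional requirements on the start/end fields can be sketched as a small check. This assumes each runs.txt row has been read into a dict with blank values as empty strings; `missing_required_fields` is a hypothetical helper, not part of the proposal.

```python
def missing_required_fields(row):
    """List the time/location fields that a mid-trip flag makes required
    but that are blank or absent on this row."""
    problems = []
    for side in ("start", "end"):
        # A flag of 1 means the row starts or ends mid-trip or mid-deadhead.
        if row.get(f"run_row_{side}_mid_trip") == "1":
            for suffix in ("time", "location"):
                field = f"run_row_{side}_{suffix}"
                if not row.get(field):
                    problems.append(field)
    return problems


# Hypothetical row: starts with a mid-trip relief but omits the start time.
row = {
    "run_row_start_mid_trip": "1",
    "run_row_start_time": "",
    "run_row_start_location": "stop-5",
    "run_row_end_mid_trip": "0",
    "run_row_end_time": "",
    "run_row_end_location": "",
}
problems = missing_required_fields(row)
```

The blank end fields are not reported because the end flag is 0, where they are only recommended.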

Question: Should the start/end time/location fields be required instead of conditionally required? It would make consuming easier to be able to rely on their presence, but could make producing more complex for agencies that don't use mid-trip reliefs.

remove file or add column: runs_pieces.txt

Option A: _Remove runs_pieces.txt_

All the information in runs_pieces.txt is now redundant with the information in runs.txt. We propose removing the file.

Option B: _Add column to runs_pieces.txt_

The file could be kept if:

If the file is kept, we propose adding a new field, service_id, to solve the run_id uniqueness problem:

| Field name | Type | Required | Description |
| --- | --- | --- | --- |
| service_id | ID referencing calendar.service_id | Required | Identifies a set of dates when the run is scheduled to take place. |

Adding this required field is still a breaking change, just a smaller one. (Though the breaking change could potentially be avoided with the run_code alternative below.)

We may also want to consider changing the start/end fields to better line up with runs.txt's start/end fields and make it clearer how to handle pieces that start with events, but I don't have a specific proposal for how to do that.

remove columns: deadheads.txt

Consider removing fields to_trip_id, from_trip_id, to_deadhead_id, from_deadhead_id.

This change isn't needed, but if we're making backwards incompatible changes anyway, this would clean things up and make the spec a little more cohesive.

These fields were originally added as a way to link deadheads to other trips on the run/piece/block. But runs.txt now provides a better way to find the order of trips and deadheads within a run. Also, the spec currently has some ambiguities around these fields. As a producer, it would be easier to remove these fields than to populate them.
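As a sketch of how a consumer might recover that order from runs.txt alone, the snippet below groups rows by (service_id, run_id) and sorts each group by start time. It assumes every row has run_row_start_time populated, which the proposal only recommends; the function names and sample rows are hypothetical.

```python
from collections import defaultdict


def to_seconds(hms):
    """Convert HH:MM:SS to seconds so that times past 24:00:00 sort correctly."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def ordered_rows_by_run(rows):
    """Group rows by (service_id, run_id) and order each group by start time."""
    runs = defaultdict(list)
    for row in rows:
        runs[(row["service_id"], row["run_id"])].append(row)
    for run_rows in runs.values():
        run_rows.sort(key=lambda r: to_seconds(r["run_row_start_time"]))
    return dict(runs)


# Hypothetical rows, deliberately out of order.
rows = [
    {"service_id": "weekday", "run_id": "100", "run_row_id": "trip-2", "run_row_start_time": "09:10:00"},
    {"service_id": "weekday", "run_id": "100", "run_row_id": "dh-1", "run_row_start_time": "07:45:00"},
    {"service_id": "weekday", "run_id": "100", "run_row_id": "trip-1", "run_row_start_time": "08:00:00"},
]
ordered = ordered_rows_by_run(rows)
sequence = [r["run_row_id"] for r in ordered[("weekday", "100")]]
```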

These fields could be kept anyway if consumers find them useful, or if we want to minimize the number of breaking changes. If they are kept, they should all be made optional, as not every deadhead will have a previous/next trip.

[Note: We also propose other unrelated changes to deadheads.txt in Proposal 2.]

Non-recommended option: run_code/piece_code

An alternative solution that we considered for the run uniqueness problem would be to add new String fields run_code and piece_code to runs.txt and/or runs_pieces.txt. Our human-readable non-unique run ids would go as strings in these fields, and run_id would have to be a long unique ID. (The MBTA would probably use something like ${service_id}-${division_id}-${run_id}).

This solves the uniqueness problem, so the new service_id field in runs_pieces.txt would not be required. If done just right, this could potentially make the whole proposal backwards-compatible.
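To make the run_code alternative concrete, here is a minimal sketch of deriving a unique run_id from the pattern mentioned above. The function `build_unique_run_id` and the sample service and division values are hypothetical, and the sketch assumes the hyphen-joined parts stay unambiguous for the agency's own identifiers.

```python
def build_unique_run_id(service_id, division_id, run_code):
    """Join the three parts into one long ID, following the
    ${service_id}-${division_id}-${run_id} pattern."""
    return f"{service_id}-{division_id}-{run_code}"


# The same human-readable run number reused in two divisions.
run_a = {
    "run_id": build_unique_run_id("weekday", "north", "100"),
    "run_code": "100",
}
run_b = {
    "run_id": build_unique_run_id("weekday", "south", "100"),
    "run_code": "100",
}
```

Both records keep the familiar run number in run_code while run_id stays unique across divisions.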

However, I think it's better to keep using non-unique run_ids with a service_id field because:

Questions for review:

jeffkessler-keolis commented 8 months ago

Hi Sky,

Thank you for this incredibly detailed and well-documented proposal!

It took me a couple of tries to wrap my head around it, but if I'm reading correctly, the tl;dr is replacing runs_pieces.txt — which is inherently implicit about trips and their sequencing — with an explicit runs.txt file that enumerates the individual activities of a run.

I like this concept and practice and support the idea in general.

runs.txt Feedback

runs.txt Example

To give things a concrete example from https://ods.calitp.org/spec/examples/multiple-runs-single-block-midtrip-relief/, we're effectively replacing

run_id,piece_id,start_type,start_trip_id,start_trip_position,end_type,end_trip_id,end_trip_position
10000,10000-1,0,daily-deadhead-1,,1,102,mid_relief_stop
20000,20000-1,1,103,mid_relief_stop,0,daily-deadhead-2,

with

service_id,run_id,piece_id,block_id,run_event_type,run_event_id,run_event_start_time,run_event_start_location,run_event_start_mid_trip,run_event_end_time,run_event_end_location,run_event_end_mid_trip,event_desc
daily,10000,10000-1,BLOCK-A,1,daily-deadhead-1,08:00:00,Yard,0,08:30:00,FirstStop,0
daily,10000,10000-1,BLOCK-A,1,101,08:45:00,,,08:30:00,,
daily,10000,10000-1,BLOCK-A,1,102,,,,,mid_relief_stop,1,
daily,10000,,,5,,12:00:00,,,13:00:00,,,Lunch
daily,20000,20000-1,BLOCK-B,1,daily-deadhead-2,,mid_relief_stop,1,,,

(added a sample Lunch event as an example).

That does seem to make the lives of consumers and data analysts easier, rather than relying on the implicit calculation that many scheduling systems currently use.

deadheads.txt Feedback

As for deadheads.txt:

Other

There are some other more generic things that I plan to raise in a forthcoming GH issue regarding applicability of the standard to passenger rail operations (which has complexities of trips with both revenue and deadhead components, many-to-many mappings of employees, trips, and parts thereof, etc.). The modifications above would further support those elements with field extensions (e.g. adding an event_type enum field to specify whether the employee working on a given trip is working as a Locomotive Engineer, Conductor, Assistant Conductor, etc.).

skyqrose commented 8 months ago

That's a good summary, thanks.

runs.txt/run_events.txt:

I hadn't considered merging run_events into runs.txt. Removing another file and a ton of duplicate rows between runs and run_events would be pretty good.

Thanks for the example. Some small corrections to the example: run_event_type would be 0 for the deadhead, and it's missing some required locations+times.

Looking through run_events made me realize a potential error in the proposal: If ODS location ID and GTFS stop id can't be mixed into the same column, then runs.txt might need additional columns start_location_type and end_location_type like run_events.txt currently has (or separate start_stop_id and start_ops_location_id fields like deadhead_times.txt currently has).

I'm not sure about separate or merged row_type and event_type fields. In the original proposal, runs.txt:run_row_type is about what kind of data you're looking at and would be used for control flow of which other table to look into. Any new value there would be part of a major spec change. run_events.txt:event_type is just data, a machine-readable description field, and doesn't control anything, and could have new values added frequently. So they could be merged, but they'd be used so differently when handling the data that maybe they shouldn't be.
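To illustrate the control-flow role of run_row_type, here is a rough sketch of a consumer choosing which table to look in. It assumes deadheads.txt, trips.txt, and run_events.txt have each been loaded into a dict keyed by their ID; `resolve_run_row` and the sample records are hypothetical.

```python
def resolve_run_row(row, deadheads, trips, run_events):
    """Look up the record a run row points at, using run_row_type to
    choose the table (0 - Deadhead, 1 - Trip, 2 - Event)."""
    tables = {"0": deadheads, "1": trips, "2": run_events}
    row_type = row["run_row_type"]
    if row_type not in tables:
        # An unknown value here would mean a spec change, so fail loudly.
        raise ValueError(f"unknown run_row_type: {row_type!r}")
    return tables[row_type][row["run_row_id"]]


# Hypothetical lookup tables keyed by deadhead_id, trip_id, run_event_id.
deadheads = {"dh-1": {"deadhead_id": "dh-1"}}
trips = {"101": {"trip_id": "101"}}
run_events = {"ev-1": {"run_event_id": "ev-1", "event_type": "lunch"}}

trip = resolve_run_row(
    {"run_row_type": "1", "run_row_id": "101"}, deadheads, trips, run_events
)
event = resolve_run_row(
    {"run_row_type": "2", "run_row_id": "ev-1"}, deadheads, trips, run_events
)
```

Note that event_type never enters the lookup; it is only carried along as data on the event record.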

deadheads.txt:

skyqrose commented 6 months ago

I've opened a new issue that takes into account all the discussion from here, and is built on top of #55

https://github.com/cal-itp/operational-data-standard/issues/60