Closed skyqrose closed 6 months ago
Hi Sky,
Thank you for this incredibly detailed and well-documented proposal!
It took me a couple of tries to wrap my head around it, but if I'm reading correctly, the tl;dr is replacing runs_pieces.txt (which is inherently implicit about trips and their sequencing) with an explicit runs.txt file that enumerates the individual activities of a run.
I like this concept and practice and support the idea in general.
runs.txt Feedback
The term "piece" as used during the working group discussions was synonymous with its use in many scheduling systems: the start and end of a portion of work on a given block.
If we're going to enumerate everything, I'd advocate for merging run_events.txt into runs.txt, too.
runs.txt effectively becomes a file that says, "you do x, from this place at this time, to this place at this time." "x" then is either a trip, deadhead, or event. The only thing missing, then, is a name and type for the event, which we could address by merging the event_type enums in both run_pieces.txt and run_events.txt, and adding a row_description column that not only offers a textual label for events but also lets individual operators add to run content as they see fit.
Alongside the standardization of everything into the explicit, I think this would also warrant changing the run_row terminology in this proposal to run_event, since working a trip or deadhead is simply a special type of event itself (further supported by the above).
runs.txt Example
To give a concrete example from https://ods.calitp.org/spec/examples/multiple-runs-single-block-midtrip-relief/, we're effectively replacing
run_id,piece_id,start_type,start_trip_id,start_trip_position,end_type,end_trip_id,end_trip_position
10000,10000-1,0,daily-deadhead-1,,1,102,mid_relief_stop
20000,20000-1,1,103,mid_relief_stop,0,daily-deadhead-2,
with
service_id,run_id,piece_id,block_id,run_event_type,run_event_id,run_event_start_time,run_event_start_location,run_event_start_mid_trip,run_event_end_time,run_event_end_location,run_event_end_mid_trip,event_desc
daily,10000,10000-1,BLOCK-A,1,daily-deadhead-1,08:00:00,Yard,0,08:30:00,FirstStop,0
daily,10000,10000-1,BLOCK-A,1,101,08:45:00,,,08:30:00,,
daily,10000,10000-1,BLOCK-A,1,102,,,,,mid_relief_stop,1,
daily,10000,,,5,,12:00:00,,,13:00:00,,,Lunch
daily,20000,20000-1,BLOCK-B,1,daily-deadhead-2,,mid_relief_stop,1,,,
(added a sample Lunch event as an example).
That does seem to make the lives of consumers and data analysts easier, rather than relying on the implicit calculation that many scheduling systems currently use.
deadheads.txt Feedback
As for deadheads.txt:
Deadheads need to retain sequencing since trips/deadheads are not always synonymous with employees, nor are blocks always defined (e.g. while a person may work DH2 after they work trip 100, DH2 may follow trip 200 on a given vehicle; the from/to fields are the only mechanism for defining the 200-DH2 link).
Deadheads need to have start/end times added both to support where trips are supposed to go and to mesh with the runs.txt format (even if these are flexible times akin to the separate duration discussion we were having elsewhere, having a baseline for times is still important).
There are some other more generic things that I plan to raise in a forthcoming GH issue regarding applicability of the standard to passenger rail operations (which has complexities of trips with both revenue and deadhead components, many-to-many mappings of employees, trips, and parts thereof, etc.), but the modifications above would further support those elements with field extensions (e.g. adding an event_type enum field to specify whether the employee working on a given trip is working as a Locomotive Engineer, Conductor, Assistant Conductor, etc.).
That's a good summary, thanks.
runs.txt/run_events.txt:
I hadn't considered merging run_events into runs.txt. Removing another file and a ton of duplicate rows between runs and run_events would be pretty good.
Thanks for the example. Some small corrections to the example: run_event_type would be 0 for the deadhead, and it's missing some required locations+times.
Looking through run_events made me realize a potential error in the proposal: If ODS location ID and GTFS stop id can't be mixed into the same column, then runs.txt might need additional columns start_location_type and end_location_type like run_events.txt currently has (or separate start_stop_id and start_ops_location_id fields like deadhead_times.txt currently has).
I'm not sure about separate or merged row_type and event_type fields. In the original proposal, runs.txt:run_row_type is about what kind of data you're looking at and would be used for control flow of which other table to look into. Any new value there would be part of a major spec change. run_events.txt:event_type is just data, a machine-readable description field, and doesn't control anything, and could have new values added frequently. So they could be merged, but when handling the data they'd be used so differently that maybe they shouldn't be.
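To make the distinction concrete, here is a minimal sketch of how a consumer might use run_row_type for control flow while treating event_type as pass-through data. The type values (0 = deadhead, 1 = trip, 2 = event) are inferred from this thread, and the in-memory tables, their contents, and the function name are hypothetical, not part of the spec.

```python
# Assumed run_row_type values, inferred from the discussion:
# 0 = deadhead, 1 = trip, 2 = event.
DEADHEAD, TRIP, EVENT = "0", "1", "2"

# Hypothetical in-memory tables keyed by ID (stand-ins for parsed files).
trips = {"101": {"route_id": "1"}}
deadheads = {"daily-deadhead-1": {"block_id": "BLOCK-A"}}
run_events = {"lunch-1": {"event_type": "5"}}

# run_row_type decides which table to look into.
LOOKUP_BY_ROW_TYPE = {DEADHEAD: deadheads, TRIP: trips, EVENT: run_events}


def resolve_row(row):
    """Return the record a runs.txt row points at, chosen by run_row_type."""
    table = LOOKUP_BY_ROW_TYPE.get(row["run_row_type"])
    if table is None:
        # A new run_row_type is a structural change the consumer must handle.
        raise ValueError(f"unknown run_row_type: {row['run_row_type']!r}")
    return table[row["run_row_id"]]


# event_type is only read as data, so an unfamiliar value does not break anything.
event = resolve_row({"run_row_type": EVENT, "run_row_id": "lunch-1"})
label = event["event_type"]
```

An unknown run_row_type raises here, whereas an unknown event_type simply flows through as a value, which is the difference in handling described above.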
deadheads.txt:
from_trip_id field to link deadheads and trips be easier than using blocks?
deadhead_times.txt instead of deadheads.txt, similar to how it's done in GTFS trips.txt.
I've opened a new issue that takes into account all the discussion from here, and is built on top of #55:
https://github.com/cal-itp/operational-data-standard/issues/60
Background / Existing problems
There are several oversights in the existing ODS spec that make it impossible for MBTA to represent our schedule:
The existing runs_pieces.txt file does not provide a link between a run/piece and all its associated trips, deadheads, and run events.
run_id or piece_id, which trips, deadheads, and events are on that run?
run_pieces.start_trip_id and run_pieces.end_trip_id to see which block is associated with a piece, but this is cumbersome.
trip_id or deadhead_id, what run/piece is it on?
run_pieces.txt.
runs_pieces.start_trip_id and runs_pieces.end_trip_id may be null if a run piece both begins and ends with an event, making it impossible to use them to match a run to a block.
run_events.txt uses a piece_id field to make the association easy, but this field could not be added to trips.txt in GTFS, so we need a new way in ODS to make the connection.
The current runs_pieces.txt also does not allow for the representation of a piece that consists only of events, such as extraboard (aka, cover) or other run-as-directed work, because run_pieces.start_trip_id and run_pieces.end_trip_id are required fields and do not allow for run events.
Finally, the current specification does not allow for non-unique run or piece identifiers, even as many agencies may reuse run "numbers" between divisions or day-types.
A new file, runs.txt, would address all of these problems.
new file: runs.txt
Primary key: (service_id, run_id, run_row_type, run_row_id)
This proposal uses (service_id, run_id) as a pair to solve the run-uniqueness problem.
Description
Lists all of the trips, deadheads, and events associated with each run and piece in a many-to-one relationship.
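As an illustration of the proposed key, the sketch below groups rows by the (service_id, run_id) pair and checks the four-field primary key for duplicates. The sample rows, IDs, and helper names are made up for the example; only the column names come from the proposal.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical sample: run_id 1001 is reused across two service_ids.
SAMPLE = """service_id,run_id,run_row_type,run_row_id
weekday,1001,0,dh-1
weekday,1001,1,trip-101
saturday,1001,1,trip-901
"""

PRIMARY_KEY = ("service_id", "run_id", "run_row_type", "run_row_id")


def load_rows(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def duplicate_keys(rows):
    """Return primary-key tuples that appear more than once."""
    counts = Counter(tuple(row[field] for field in PRIMARY_KEY) for row in rows)
    return [key for key, count in counts.items() if count > 1]


def rows_by_run(rows):
    """Group rows by (service_id, run_id), the pair that identifies a run."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[(row["service_id"], row["run_id"])].append(row)
    return dict(grouped)


rows = load_rows(SAMPLE)
runs = rows_by_run(rows)
```

With this sample, the weekday and saturday runs numbered 1001 end up in separate groups even though they share a run_id.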
The start/end time/location of each run row are denormalized from trips.txt/stop_times.txt/deadheads.txt/deadhead_times.txt/runs_events.txt. They're needed here because knowing when and where someone is working is important, and checking any of three other files for it is too hard. It's also needed to show where within a mid-trip relief the relief is.
The mid_trip flag is 1 for trips with mid-trip relief. It means the start/end time/location are for this operator's work, not the start/end of the trip. The flag could be set to 0 to say "there's no mid-trip relief, and the operator's work in this file corresponds to the trip ends in stop_times.txt or deadhead_times.txt". Or it could be left blank. It should be blank for all events (type 2).
The times don't have to fit perfectly together, e.g. for a layover. The employee is considered to be on their run/piece between the start_time of the earliest row on that run/piece and the end_time of the last row. It's allowed for times to overlap, in the case that there's a run_event for a task that an employee does concurrently with driving.
service_id: calendar.service_id
run_id
piece_id
block_id: deadheads.block_id or trips.block_id; trips.txt or deadheads.txt. If populated, this value must match that in trips.txt or deadheads.txt, for the given trip_id or deadhead_id.
run_row_type
run_row_id: deadheads.deadhead_id or trips.trip_id or run_events.run_event_id
run_row_start_time
run_row_start_location: deadheads.deadhead_id or trips.trip_id or run_events.event_from_location_id
run_row_start_mid_trip
run_row_end_time
run_row_end_location: deadheads.deadhead_id or trips.trip_id or run_events.event_to_location_id
run_row_end_mid_trip
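The rule for when an employee is on a run can be sketched as follows. This assumes times use the GTFS HH:MM:SS format (where hours may exceed 24 for service past midnight) and interprets "the last row" as the row with the latest end time. The sample rows and function names are hypothetical.

```python
def parse_time(value):
    """Convert an HH:MM:SS string to seconds; hours may exceed 24 as in GTFS."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def run_span(rows):
    """Return (start, end) in seconds for one run/piece.

    Rows may leave gaps (layovers) or overlap (a concurrent event), so the
    span is simply the earliest start and the latest end. Rows with blank
    times are skipped (an assumption for this sketch).
    """
    starts = [parse_time(r["run_row_start_time"]) for r in rows if r["run_row_start_time"]]
    ends = [parse_time(r["run_row_end_time"]) for r in rows if r["run_row_end_time"]]
    return min(starts), max(ends)


# Hypothetical rows for one run: a deadhead, a trip after a short layover,
# and an event that overlaps the trip.
sample_rows = [
    {"run_row_id": "dh-1", "run_row_start_time": "08:00:00", "run_row_end_time": "08:30:00"},
    {"run_row_id": "trip-101", "run_row_start_time": "08:45:00", "run_row_end_time": "12:00:00"},
    {"run_row_id": "event-1", "run_row_start_time": "09:00:00", "run_row_end_time": "09:10:00"},
]

span = run_span(sample_rows)
```

The 15-minute gap between the deadhead and the trip, and the event overlapping the trip, do not affect the result: only the outermost times matter.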
Question: Should the start/end time/location fields be required instead of conditionally required? It would make consuming easier to be able to rely on their presence, but could make producing more complex for agencies that don't use mid-trip reliefs.
remove file or add column: runs_pieces.txt
Option A: Remove runs_pieces.txt
All the information in runs_pieces.txt is now redundant with the information in runs.txt. We propose removing the file.
Option B: Add column to runs_pieces.txt
The file could be kept if:
runs.txt can't do since it has one row per trip/deadhead/event, instead of one row per run/piece).
runs.txt, but could be done with just one row if those fields were added to runs_pieces.txt.
If the file is kept, we propose adding a new field, service_id, to solve the run_id uniqueness problem:
service_id: calendar.service_id
Adding this required field is still a breaking change, just a smaller one. (Though the breaking change could potentially be avoided with the run_code alternative below.)
We may also want to consider changing the start/end fields to better line up with runs.txt's start/end fields and make it clearer how to handle pieces that start with events, but I don't have a specific proposal for how to do that.
remove columns: deadheads.txt
Consider removing fields to_trip_id, from_trip_id, to_deadhead_id, from_deadhead_id.
This change isn't needed, but if we're making backwards incompatible changes anyway, this would clean things up and make the spec a little more cohesive.
These fields were originally added as a way to link deadheads to other trips on the run/piece/block. But runs.txt now provides a better way to find the order of trips and deadheads within a run. Also, the spec currently has some ambiguities around these fields. As a producer, it would be easier to remove these fields than to populate them.
These fields could be kept anyway if consumers find them useful, or if we want to minimize the number of breaking changes. If they are kept, they should all be made optional, as not every deadhead will have a previous/next trip.
[Note: We also propose other unrelated changes to deadheads.txt in Proposal 2.]
Non-recommended option: run_code/piece_code
An alternative solution that we considered for the run uniqueness problem would be to add new String fields run_code and piece_code to runs.txt and/or runs_pieces.txt. Our human-readable non-unique run ids would go as strings in these fields, and run_id would have to be a long unique ID. (The MBTA would probably use something like ${service_id}-${division_id}-${run_id}).
This solves the uniqueness problem, so the new service_id field in runs_pieces.txt would not be required. If done just right, this could potentially make the whole proposal backwards-compatible.
However, I think it's better to keep using non-unique run_ids with a service_id field because:
block_id is not unique, and it's basically the equal counterpart of run_id.
service_id is a useful field to have in most ODS files anyway, so you can more easily query for data by date.
Questions for review:
runs.txt file okay? Is there anything in your agency it wouldn't be able to represent, and would it be easy for parties to produce as well as consume?
runs_pieces.txt okay, or should we keep it?