Closed skyqrose closed 4 months ago
This largely looks good to me! Three suggested tweaks, and one comment:
1. I'm not a fan of making `run_event_id` a required value, since most operations would either be using an internal value from their scheduling system that'd never be referenced, or making up values on the fly that could easily get confused with other values. I think we're much better off making it an optional value as a result, and clarifying that this column must be unique within the dataset.
2. `start_mid_route` and `end_mid_route` should be renamed to either `start_en_route` and `end_en_route` or `start_mid_trip` and `end_mid_trip`, to clarify that these need not be in the middle of a route, but rather the middle of a trip.
3. We should clarify that all start/end times and locations are based on the supplemented entries (per https://github.com/cal-itp/operational-data-standard/issues/55), not just the entries in the base GTFS. Thus, if a different `stop_id`, time, or value is used, or if a different row entirely is the first/last stop per the supplemented entry data, the values here that reference GTFS should match the values as modified, not those in the respective base/public GTFS.

From the producer side, I can't think of a scenario in which I'd ever want to have planned overlapping events, and I could see several consumers having issues processing overlapping events, so I could see some issue with permitting them. That said, this would not affect my realm, so this is more of a flag for input from others than something that would prevent my support.
Thanks!
`_supplement` files are used, not in every place that an ODS file has a reference.

The data I've used for MBTA has an "Operator" event that lasts for a whole piece, and overlaps with every trip on that piece. That event could be removed, so it's not a big deal. But an agency could use this to represent other whole-shift labels, like which part of the day an employee is getting paid for.
And if someone hypothetically really does have overlapping responsibilities, it'll be way better to represent them as two separate events as "job A" and "job B", instead of one combined "job A+B" event, which would break querying the data for anybody doing "job B".
It does mean that consumers need to understand the meaning of event_types in order to detangle overlapping events, but consumers have to know the meaning of event_types anyway to do anything useful with them.
I think it's important for flexibility in producers being able to represent their data in an accurate way, and is worth the added small complexity for consumers. But also I have a producer's point of view, so I'd love to hear others' opinions on this.
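To make the consumer-side cost concrete, here is a minimal sketch of how a consumer might find overlapping events on a run. It assumes rows have already been parsed into dicts and that times use a GTFS-style `HH:MM:SS` format; the helper names and the sample rows are hypothetical, not part of the proposal.

```python
from collections import defaultdict
from itertools import combinations


def to_seconds(hhmmss):
    """Convert an HH:MM:SS time (hours may exceed 23) to seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def overlapping_pairs(events):
    """Return (event_a, event_b) pairs on the same run whose times overlap."""
    by_run = defaultdict(list)
    for event in events:
        by_run[(event["service_id"], event["run_id"])].append(event)

    pairs = []
    for run_events in by_run.values():
        for a, b in combinations(run_events, 2):
            a_start, a_end = to_seconds(a["start_time"]), to_seconds(a["end_time"])
            b_start, b_end = to_seconds(b["start_time"]), to_seconds(b["end_time"])
            # Events that merely touch at an endpoint are not counted.
            if a_start < b_end and b_start < a_end:
                pairs.append((a, b))
    return pairs


# Hypothetical rows: a whole-piece "Operator" event plus two trips.
events = [
    {"service_id": "wkdy", "run_id": "100", "event_type": "Operator",
     "start_time": "06:00:00", "end_time": "10:00:00"},
    {"service_id": "wkdy", "run_id": "100", "event_type": "Trip A",
     "start_time": "06:10:00", "end_time": "07:00:00"},
    {"service_id": "wkdy", "run_id": "100", "event_type": "Trip B",
     "start_time": "07:10:00", "end_time": "08:00:00"},
]

for a, b in overlapping_pairs(events):
    print(a["event_type"], "overlaps", b["event_type"])
```

A consumer would still need to interpret each `event_type` to decide which of the overlapping events matters for its use case.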
Regarding (1), the `run_event_id` question, the main uses for the field I see are:
`trip_id`.

I think I'm in agreement that it's fine to not require `run_event_id`, or at least not for rows with a non-null `trip_id`, though I'd love to hear from any consumers on the matter!
It seems like there could end up being 2 primary keys here, `run_id` and `run_event_id`. Can we just use `run_id`? I think it would be handy to also have a sequence field.

Also, it isn't clear to me why we need `start_mid_trip` and `end_mid_trip` if we always give the location at which we start or end.
`mid_route`:

I guess `start/end_mid_route` aren't necessary, and you could determine if it's mid_trip by comparing the location/time to the first/last of the trip's stop_times. But mid_route events are important, and comparing through `trips.txt` and `stop_times.txt` is hard. The field would concretely be useful at MBTA for handling bus operator schedules where we frequently have mid-route swing ons.
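For comparison, this is roughly what a consumer would have to do without the flags. It is only a sketch: it assumes rows are already parsed into dicts, that an event's start is compared against the first stop's `departure_time` and its end against the last stop's `arrival_time`, and that times are GTFS-style `HH:MM:SS`. The function name and sample data are hypothetical.

```python
def to_seconds(hhmmss):
    """Convert an HH:MM:SS time (hours may exceed 23) to seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def infer_mid_trip(event, stop_times):
    """Return (starts_mid_trip, ends_mid_trip), or None if the trip is unknown."""
    trip_rows = sorted(
        (row for row in stop_times if row["trip_id"] == event["trip_id"]),
        key=lambda row: int(row["stop_sequence"]),
    )
    if not trip_rows:
        return None
    first, last = trip_rows[0], trip_rows[-1]
    # Assumption: a start counts as mid-trip if its stop or time differs
    # from the trip's first stop_time, and likewise for the end.
    starts_mid = (
        event["start_location"] != first["stop_id"]
        or to_seconds(event["start_time"]) != to_seconds(first["departure_time"])
    )
    ends_mid = (
        event["end_location"] != last["stop_id"]
        or to_seconds(event["end_time"]) != to_seconds(last["arrival_time"])
    )
    return starts_mid, ends_mid


# Hypothetical trip with three stops and a relief at the middle stop.
stop_times = [
    {"trip_id": "t1", "stop_sequence": "1", "stop_id": "A",
     "arrival_time": "06:00:00", "departure_time": "06:00:00"},
    {"trip_id": "t1", "stop_sequence": "2", "stop_id": "B",
     "arrival_time": "06:20:00", "departure_time": "06:20:00"},
    {"trip_id": "t1", "stop_sequence": "3", "stop_id": "C",
     "arrival_time": "06:40:00", "departure_time": "06:40:00"},
]
first_operator = {"trip_id": "t1", "start_location": "A", "start_time": "06:00:00",
                  "end_location": "B", "end_time": "06:20:00"}
relief_operator = {"trip_id": "t1", "start_location": "B", "start_time": "06:20:00",
                   "end_location": "C", "end_time": "06:40:00"}

print(infer_mid_trip(first_operator, stop_times))
print(infer_mid_trip(relief_operator, stop_times))
```

An explicit field spares every consumer from repeating this join and from guessing which stop_times column to compare against.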
Primary Key:

I do think that some sort of id is important. I expect to want to have references into this file, and if the Primary Key is `*`, then that's impossible; you'd have to have all the columns.
Instead of `run_event_id` or `*`, the Primary Key could be (`service_id`, `run_id`, `start_time`, `event_type`). An employee could have multiple responsibilities at the same time, but probably wouldn't start the same event twice at the same time? I don't quite like this because using `event_type` as part of an id makes it less free as a free text field. If some agency ever does represent their data with two events at the same time, then the `event_type` can't be both unique and consistent.
`run_id` can't be a primary key on its own because one run has multiple rows.
A sequence field couldn't be guaranteed to be sequential because rows can overlap in time, and it isn't needed for sequencing because consumers can work with the times instead. But maybe a sequence-ish field that's unique within a run would be useful for providing an order and an id? What do people think about this:
| Field name | Type | Required | Description |
|---|---|---|---|
| `event_sequence` | Non-negative integer | Required | The order of this event within a run. Unique within the run. Note that events may overlap in time. If Event A and Event B are on the same `service_id` and `run_id`, and Event A has a `start_time` before Event B, then Event A's `event_sequence` should be less than Event B's. If Event A and B have the same `start_time`, but Event A has an `end_time` before Event B, then Event A's `event_sequence` should be less than Event B's. If Event A and B have the same `start_time` and `end_time`, then their `event_sequence` values can be in either order, but they must be different. |

Primary Key: (`service_id`, `run_id`, `event_sequence`) (this is also the recommended sort order of the file).
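One way a producer could generate values that satisfy these ordering rules is to sort each run's events by (`start_time`, `end_time`) and number them. The following is a minimal sketch under that assumption; the function name, the dict-based rows, and the sample data are hypothetical, and times are assumed to be GTFS-style `HH:MM:SS`.

```python
from collections import defaultdict


def to_seconds(hhmmss):
    """Convert an HH:MM:SS time (hours may exceed 23) to seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def assign_event_sequence(events):
    """Set event_sequence on each event dict in place, per (service_id, run_id)."""
    by_run = defaultdict(list)
    for event in events:
        by_run[(event["service_id"], event["run_id"])].append(event)

    for run_events in by_run.values():
        # Earlier start first; for equal starts, earlier end first.
        # The sort is stable, so full ties keep their input order.
        run_events.sort(
            key=lambda e: (to_seconds(e["start_time"]), to_seconds(e["end_time"]))
        )
        for sequence, event in enumerate(run_events, start=1):
            event["event_sequence"] = sequence
    return events


# Hypothetical run: a sign-in, a whole-piece "Operator" event, and one trip.
rows = [
    {"service_id": "wkdy", "run_id": "100", "event_type": "Operator",
     "start_time": "06:00:00", "end_time": "10:00:00"},
    {"service_id": "wkdy", "run_id": "100", "event_type": "Sign-in",
     "start_time": "06:00:00", "end_time": "06:00:00"},
    {"service_id": "wkdy", "run_id": "100", "event_type": "Trip A",
     "start_time": "06:10:00", "end_time": "07:00:00"},
]

for row in assign_event_sequence(rows):
    print(row["event_sequence"], row["event_type"])
```

Numbering from 1 in steps of 1 is just one choice; any distinct non-negative integers in the same order would meet the description.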
Closing this issue since it's completely covered by #66 . Further discussion should happen there.
This is a combination of #51, #52, and #54, updated based on discussion and assuming some form of #55 is accepted.
I changed the name from #51: I now propose `run_events.txt`, not `runs.txt`, because the file has one row per event, not one row per run, and it merges in the existing `run_events.txt` table.

This is a concrete proposal, meant to close issues rather than open them. It's intended to be able to be accepted as is, without any open questions or TODOs. Though, of course, I expect there to be discussion and minor changes. I will edit the proposal with any changes.
Summary:

`run_events.txt`, which lists all of a run's trips, deadheads, and events.
`trip_id` field is set.
`runs_pieces.txt` and `run_events.txt` files. Those files would be removed. (`deadheads.txt` would also be removed by #55.)

Full Documentation:
Primary Key: (`service_id`, `run_id`, `event_sequence`)

`service_id`
`calendar.service_id`

`run_id`

`event_sequence`
(`service_id`, `run_id`). It's required and unique so it can be used in the Primary Key to uniquely identify events. Note that events may overlap in time. If they do, it may not be possible to define a single ordering that's correct for all uses. This column provides one consistent ordering. If a consumer cares about how overlapping events are ordered, they should sort based on the time fields and `event_type`. If Event A and Event B are on the same `service_id` and `run_id`, and Event A has a `start_time` before Event B, then Event A's `event_sequence` should be less than Event B's. If Event A and B have the same `start_time`, but Event A has an `end_time` before Event B, then Event A's `event_sequence` should be less than Event B's. If Event A and B have the same `start_time` and `end_time`, then their `event_sequence` values can be in either order, but they must be different. Values do not have to be consecutive.

`piece_id`

`block_id`
`trips.block_id`
If `block_id` exists, `trip_id` exists, and that trip's entry in `trips.txt` has a `block_id`, then the two `block_id`s must match. May exist even if `trip_id` does not (e.g. if an event represents a run-as-directed block with no scheduled trips).

`job_type`
`job_type` throughout the day if the employee has multiple responsibilities, e.g. an "Operator" in the morning and a "Shifter" in the afternoon.

`event_type`
`event_type` that they don't recognize.
`run_events.event_type`, which was a numeric enum with specific supported values. We could consider publishing a list of standard values to use here, for common activities such as "Sign-in", "Operator", and "Break", but producers should be able to use arbitrary values in addition to standard values. The field is `Text` rather than `ID` or `Enum` so that even if consumers don't understand the meaning of a specific `event_type`, they can still display it.

`trip_id`
`trips.trip_id`

`start_location`
`stops.stop_id`
If `trip_id` is set (and `start_mid_trip` is not `1`), this should be the first stop of the trip. If `start_mid_trip` is `1`, this should instead be the location where the employee starts, in the middle of the trip.

`start_time`
If `trip_id` is set (and `start_mid_trip` is not `1`), this should be the time of the first stop of the trip. If `start_mid_trip` is `1`, this should instead be the time when the employee starts, in the middle of the trip.

`start_mid_trip`
`trip_id` is not set.

`end_location`
`stops.stop_id`
If `trip_id` is set (and `end_mid_trip` is not `1`), this should be the last stop of the trip. If `end_mid_trip` is `1`, this should instead be the location where the employee ends, in the middle of the trip.

`end_time`
If `trip_id` is set (and `end_mid_trip` is not `1`), this should be the time of the last stop of the trip. If `end_mid_trip` is `1`, this should instead be the time when the employee ends, in the middle of the trip. Must be greater than or equal to `start_time`.
`minimum_duration` field or something like it, that would be in addition to this field.

`end_mid_trip`
`trip_id` is not set.

`run_event`s can refer to the same `trip_id`, if multiple employees work on that trip.
`start_time` may equal `end_time` for an event that's a single point in time (such as a report time) without any duration.
`service_id`, `run_id`, `event_sequence`.

Examples
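As an illustration of the constraints stated in the field documentation above (unique primary key, `end_time` not before `start_time`, and matching `block_id`s), here is a minimal validation sketch. It assumes both files are already parsed into lists of dicts with empty strings for unset values and GTFS-style `HH:MM:SS` times; the function name and sample rows are hypothetical.

```python
def to_seconds(hhmmss):
    """Convert an HH:MM:SS time (hours may exceed 23) to seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def validate_run_events(run_events, trips):
    """Return a list of human-readable problems found in run_events rows."""
    errors = []
    seen_keys = set()
    block_by_trip = {trip["trip_id"]: trip.get("block_id") for trip in trips}

    for row in run_events:
        key = (row["service_id"], row["run_id"], row["event_sequence"])
        if key in seen_keys:
            errors.append(f"{key}: duplicate primary key")
        seen_keys.add(key)

        # start_time may equal end_time, but end_time must not be earlier.
        if to_seconds(row["end_time"]) < to_seconds(row["start_time"]):
            errors.append(f"{key}: end_time is before start_time")

        # Only compare block_ids when both the event and its trip have one.
        trip_block = block_by_trip.get(row.get("trip_id"))
        if row.get("block_id") and trip_block and row["block_id"] != trip_block:
            errors.append(f"{key}: block_id does not match trips.txt")
    return errors


# Hypothetical data: one valid row and three rows that each break one rule.
trips = [{"trip_id": "t1", "block_id": "b1"}]
run_events = [
    {"service_id": "wkdy", "run_id": "100", "event_sequence": 10, "trip_id": "t1",
     "block_id": "b1", "start_time": "06:00:00", "end_time": "07:00:00"},
    {"service_id": "wkdy", "run_id": "100", "event_sequence": 20, "trip_id": "",
     "block_id": "", "start_time": "08:00:00", "end_time": "07:30:00"},
    {"service_id": "wkdy", "run_id": "100", "event_sequence": 30, "trip_id": "t1",
     "block_id": "b2", "start_time": "09:00:00", "end_time": "10:00:00"},
    {"service_id": "wkdy", "run_id": "100", "event_sequence": 30, "trip_id": "",
     "block_id": "", "start_time": "10:00:00", "end_time": "10:00:00"},
]

for problem in validate_run_events(run_events, trips):
    print(problem)
```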
Single Run with Multiple Pieces and Pre-trip inspection

Multiple Runs with Mid-Trip Relief

Two-car MBTA Green Line train with an operator for each car. The `event_type` field distinguishes whether an operator is in the front car or the rear car. The operators swap for the return trip.

Edit history
`start/end_mid_route` to `start/end_mid_trip`.
`end_time` must be >= `start_time`.
`run_event_id` with `event_sequence`. Updated Primary Key and examples.