hodb commented 1 year ago

In reviewing the ODS spec, it appears there is no guidance on allocating blocks to operational vehicles, or on assigning duties to drivers along with the dates they operate.

As this crucial information is expected to be sourced from the dispatching software, can you provide clarification or additional details on how the ODS spec addresses the extraction of data related to block allocation, driver duties, and associated dates from the dispatching software for communication to the AVL provider?

Thanks

skyqrose commented 1 year ago

Matching operators to their duties on each day is something that I would use, and if it's not in ODS, I'd have to represent the data in a non-standard adjacent data feed. I'm currently using a CSV with 3 columns for this purpose: date, run_id, operator.

But it's also kind of separate from the schedules ODS does represent. For example, assigning operators to runs is done well after the rest of the schedule is finalized.

I think I remember this coming up in discussions a while ago and I think it was just cut for scope. It'd be easy to add as a backwards-compatible extension once the rest of the format is finalized.

hodb commented 1 year ago

Thanks for your answer @skyqrose. The concept I aim for is pretty much the same - having a new file driver_assignment.txt that includes the assignment per driver - run_id,driver_id,operational_date

And a vehicle_assignment.txt file that includes assignment per vehicle block_id,vehicle_id,operational_date
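
For illustration only, here is a minimal sketch of how a consumer (for example an AVL system) might read the two proposed files. The column names come from the comment above; the sample IDs, dates, and the `read_rows` helper are invented for the example and are not part of any spec.

```python
import csv
import io

# Hypothetical sample contents; the IDs and dates are made up.
DRIVER_ASSIGNMENT_TXT = """run_id,driver_id,operational_date
R100,D17,2024-03-04
R101,D22,2024-03-04
"""

VEHICLE_ASSIGNMENT_TXT = """block_id,vehicle_id,operational_date
B7,V3301,2024-03-04
"""


def read_rows(text):
    """Parse GTFS-style CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


driver_rows = read_rows(DRIVER_ASSIGNMENT_TXT)
vehicle_rows = read_rows(VEHICLE_ASSIGNMENT_TXT)

# Index by (date, id) so the consumer can look up who or what is scheduled.
driver_by_run = {
    (r["operational_date"], r["run_id"]): r["driver_id"] for r in driver_rows
}
vehicle_by_block = {
    (r["operational_date"], r["block_id"]): r["vehicle_id"] for r in vehicle_rows
}
```

Keying on `(operational_date, run_id)` assumes run IDs are unique on a given date, which is one of the open questions in this thread.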

safrazier17 commented 11 months ago

> I think I remember this coming up in discussions a while ago and I think it was just cut for scope. It'd be easy to add as a backwards-compatible extension once the rest of the format is finalized.

Yes, this was discussed and cut for scope and also because there seemed to be less consensus on the topic of representing driver/staff-identifying information, at least at that time. I remain in favor of bringing this information into ODS, however.

> For example, assigning operators to runs is done well after the rest of the schedule is finalized.

Can we be clear about what the intended use case is here? I want to make sure I understand who is producing these assignments and who the consumer will be. If the run assignments are done via the same producer after the schedule is finalized but before it would be sent to the schedule consumer, that wouldn't seem like it would impact our decision to include those assignments in the ODS-represented schedule. But I want to be sure that's an accurate read of what you both have said.

skyqrose commented 11 months ago

At MBTA, the use case is to export data from HASTUS and distribute it to multiple internal apps using a standard format, instead of the status quo of each app getting a custom export in a custom format. (This data would probably also make it outside of MBTA to Swiftly, but that's not my side of the house so I'm not sure.)

So far, exporting the trip schedules and exporting the operators' schedules have been completely separate processes, and exporting operators' schedules for each season typically happens a couple weeks after the trips. If we didn't have this proposal, we would keep doing the separate operator export alongside ODS. If we did, then I think we would publish an internal ODS feed without operator schedules as soon as the trips are scheduled, and then re-publish an updated ODS feed with operator schedules once those become available.

Does that answer the question?

skyqrose commented 11 months ago

Depending on how the issues with non-unique run_ids go (see #12), it's possible that the drivers file would need to have a 4th column for the service_id. If runs are unique among all the services effective on a date (but not unique among all services), it wouldn't be necessary, but could still make looking up the run easier.

MBTA hasn't run into this specific uniqueness problem yet with our current use cases, but I expect we would if this file was standardized and we used it in more places.

I'm not sure if our block_ids are unique, but I guess the same problem could happen in principle to the vehicles file.

hodb commented 11 months ago

> Can we be clear about what the intended use case is here? I want to make sure I understand who is producing these assignments and who the consumer will be. If the run assignments are done via the same producer after the schedule is finalized but before it would be sent to the schedule consumer, that wouldn't seem like it would impact our decision to include those assignments in the ODS-represented schedule. But I want to be sure that's an accurate read of what you both have said.

Adding on top of @skyqrose's answer: the dispatching software and the AVL vendor might be different. In that case the two systems can't communicate through a shared format, and we have to examine each vendor's own specification, each with different data elements. A customer running two different systems has to enter the allocations manually in both the dispatching and the AVL software, which is inefficient. If we don't include the actual assignments in the ODS format, we can convey information about the runs/blocks but not the dispatching portion. Extending ODS to include assignments creates a clear path for communication between two different software providers.

safrazier17 commented 11 months ago

Question re: proposed driver_assignment.txt file

Would it be reasonable to say this same use applies to personnel more broadly (that is to say, not just to drivers/vehicle operators)? Does the job performed need to be identified in the new field? Or will it (always) be self-evident from the operator id what job they are performing?

Trying to get at whether we need to make our language more job-agnostic and how we can either support other roles or at least bake in some extensibility for that support in future

safrazier17 commented 11 months ago

As proposed, would we be imposing a limitation that a driver's run assignment would always be for the full run? Is that desirable? Would it make sense to include the piece information in the file as well?

safrazier17 commented 11 months ago

Some questions on proposed vehicle_assignment.txt:

are there different types of assignment we need to account for?
how do we account for the same vehicle being assigned to multiple blocks on a given date? or the same block assigned to multiple vehicles?
is there a case where a block might be assigned no vehicle within the schedule, and instead only assigned on the day of operations?
do we need a master list of vehicles that vehicle_id can pull from? [we also have #30, which currently doesn't have a vehicle_id field but should.]

one more q for driver_assignment.txt

would it make sense for this assignment to be made between an operator and a bid instead of individual runs, with the bid representing a package of runs/pieces during a specific window of time?

skyqrose commented 11 months ago

I think that this file could apply to other jobs, if they have run ids, which seems very possible.

You can't determine job based on operator id, since some employees might do different jobs on different days. But you can determine the job based on run id. And if you want a run_id to job_id mapping, then that'd be better to do in runs.txt than in driver_assignment.txt. So I don't think it should go here, but we should consider it there.

Edit: On second thought maybe run_id isn't enough, because an employee could do multiple jobs within the same run, e.g. operator in the morning, then shifter in evening, so it'd have to go in runs.txt.

Yes, this would limit assignments to a full run. I think by the definition of a run, that's okay. If you plan to assign multiple operators to different pieces of the same run, then those are really two separate runs.

If you don't plan to assign multiple operators, but end up having to because you're filling absences or if an operator has a half-day conflict or something, then that's not necessarily going to align to piece boundaries, and it'd require a way to specify assignments per trip, which is way more detail than ODS should have and is a step away from schedule data towards realtime data so is out of scope anyway.

(no comment on vehicles)

I don't think we should associate operators to bids instead of runs+dates. If we did, we'd need a separate way to map bids to runs to runs+dates, and every lookup would have to go through that level of indirection. It'd also make it impossible to reassign people on individual days, for example if vacation dates are chosen after bidding. The important operational detail we want to capture is which employees are working, and the bidding process is just an administrative detail to get there that doesn't matter if we have the end result.

hodb commented 11 months ago

> Would it be reasonable to say this same use applies to personnel more broadly (that is to say, not just to drivers/vehicle operators)? Does the job performed need to be identified in the new field? Or will it (always) be self-evident from the operator id what job they are performing?
>
> Trying to get at whether we need to make our language more job-agnostic and how we can either support other roles or at least bake in some extensibility for that support in future

It could be extended, but what is the use case for associating other personnel who are not performing a run?

> As proposed, would we be imposing a limitation that a driver's run assignment would always be for the full run. Is that desirable? Would it make sense to include the piece information in the file as well?

It is fair to say that a duty/run can be "cut" and split between different drivers. In that case, we would need a unique identifier for each piece the duty is split into (for instance 1/3, 2/3, 3/3), in addition to the date, which we already have, since a duty might repeat every day on a different date.

> Some questions on proposed vehicle_assignment.txt:
>
> are there different types of assignments we need to account for?
>
> how do we account for the same vehicle being assigned to multiple blocks on a given date? or the same block assigned to multiple vehicles?
>
> is there a case where a block might be assigned no vehicle within the schedule, and instead only assigned on the day of operations?
>
> do we need a master list of vehicles that vehicle_id can pull from? [we also have add Vehicles.txt #30, which currently doesn't have a vehicle_id field but should.]
>
> one more q for driver_assignment.txt
>
> would it make sense for this assignment to be made between an operator and a bid instead to individual runs? with the bid representing a package of runs/pieces during a specific window of time?

I don't think we should have additional assignments for vehicles, but it depends also on the receiving end (the AVL) to elaborate on whether there is another use case from their side.
We can add a unique identifier to the number of pieces a block is worked on by different vehicles (same as proposed for the duties).
I assume no vehicle assigned is possible, due to maintenance and such. In that case, we would need a real-time aspect to inform the AVL of the change in allocation (this goes a bit further).
I proposed the vehicles.txt as an option for the AVL provider to be in sync with the information the dispatching software has. The advantage of using this file is a complete sync between the list of vehicles the dispatching software holds and the AVL provider's. How often is a vehicle added to or removed from the fleet? On the same note, perhaps a drivers.txt file would also be valuable.

safrazier17 commented 11 months ago

Separating out the different discussion threads here:

- [x] Driver vs job-agnostic personnel assignments
  - Rows in this file could be applicable to any personnel that has a run assignment. This will often, but not always, be an operator.
  - @skyqrose has provided examples of multi-personnel trips/runs in #54
  - Proposal in that issue is to designate job performed within runs.txt rather than in driver_assignment.txt
  - Further discussion on this can go to #54
- [ ] During scheduling, will a driver always be assigned to a full run?
  - Difference of opinion here:
    - @skyqrose says yes, scheduled runs should be assigned to 0 to 1 personnel
    - @hodb says it is possible that a run would be cut in such a way that we need to allow for 0, 1 or many personnel assignments
    - ultimately I believe this is a question of how we approach the normative power of a data standard. We can (1) normatively state that a single run is assigned to at most one person and that agencies using ODS will have to adhere to that, even if it means changing a run nomenclature they have used in the past, or (2) seek to be able to represent a wider array of use cases that exist in practice, even if they do not adhere to the typical definition of a run
  - [Sky brings up the case of having to make post-scheduling adjustments that result in more than one operator on a run. I agree that this case is out of scope for ODS and doesn't need further consideration]
  - This question should perhaps go out to the full working group
- [ ] During scheduling, will a vehicle always be assigned to a full block?
  - Much the same applies here; we seem to have the same options as to whether we allow or disallow the blocks to have multiple vehicle assignments or vice versa
  - Hod proposes similar approach as for above
- [x] Assignment types for vehicles
  - Sort of fishing but also a genuine question: might a schedule assign a revenue vehicle to something other than revenue service (be it, idk, standby, relief, or what have you)? Could that or should that be able to be represented as a block?
  - Hod says he doesn't think this is necessary; I'm inclined to agree at this stage unless we receive other feedback
- [x] Block with zero vehicles assigned
  - The important point here is whether we end up in a situation where we have a complete list of blocks in the scheduling system, but only a subset of those blocks is captured in ODS, leaving the consumer app with an incomplete list or description of blocks
  - Is there a case where this would happen? How would agencies handle this if they were scheduling during an operator shortage, for example? Is it possible that a defined block would not be assigned a vehicle until, perhaps, day-of (and thus not in ODS)?
- [x] Personnel link to bids instead of to jobs?
  - Sky argues against this as it would obfuscate the information that the consumer actually cares about: the specific work being performed by the operator
  - this seems like a convincing argument to me, unless we receive other feedback

skyqrose commented 11 months ago

Going back to comment on vehicles:

On MBTA Green Line, vehicles are assigned day of. I just wouldn't use vehicle_assignment.txt, or would leave Green Line blocks out of the file. Consumers should be able to handle blocks without scheduled vehicles (and also runs without assigned employees, which also concretely happens in our data).
Having an inventory of all vehicles would be useful. Caveat: For trains that are made of multiple cars, what I really want is an inventory of cars, which doesn't directly correspond to a vehicle in the realtime data which is a full train. (This doesn't apply to bus.)

safrazier17 commented 11 months ago

Ok, that makes sense. As it stands in GTFS and ODS, blocks are defined in trips.block_id and/or deadheads.block_id, so even if a block doesn't have an assignment in vehicle_assignments.txt, we can still have a full definition of that block's contents. I'll mark "Block with zero vehicles assigned" above as resolved.

timon-k commented 11 months ago

For the train use case, wouldn't we then also need multiple vehicles assigned per block (analogously to the multiple employees per run)?

It could also include a new (optional) field order_in_driving_direction so that the order of cars becomes clear if the providing system knows it.

We can also ignore this for now, but I think the specification needs to make some statement on assignments of multiple vehicles to a single block, because technically, users will be able to do that in the proposed format. We should clarify expectations here (explicitly allow or disallow).

skyqrose commented 11 months ago

As mentioned in another issue, driver_assignments.txt is likely to end up containing more than drivers. staff_assignments.txt would be more generic.

skyqrose commented 11 months ago

If we want to allow multiple operators on the same run / multiple vehicles on the same block, maybe it's as simple as not enforcing any uniqueness guarantees on these files?

The order_in_driving_direction couldn't apply in this file, because an operator/vehicle is likely to be in a different order on different trips (like if a train turns around and the first car becomes the last). But maybe it could work in runs.txt?

timon-k commented 6 months ago

@safrazier17 @skyqrose This issue is currently marked as "Included" in ODS 2.0 here, but I don't see a final proposal of how to add the new files.

Also, the discussion about driver allocations was split off from this issue above, but there it states that the driver assignments should be in the new run_events.txt file - I cannot spot any reference to drivers or staff there. So it seems to me, that we still need to discuss both driver and vehicle assignments as part of this issue?

skyqrose commented 3 months ago

It's been a while, but this issue has come under discussion again for the rostering working group for TODS 2.1 this quarter. Lots of things have happened in this thread and in the rest of TODS, so here's a recap of where we are, with some new ideas and personal opinions mixed in:

Here's what I think this proposal is:

These two files record which vehicle/person is scheduled to do each block/run. They don't have to cover adjustments made after the schedule is made. They don't describe what's included in those blocks/runs (that's already in trips.txt and run_events.txt).

vehicle_assignments.txt:

| Column Name | Required? | Description |
| --- | --- | --- |
| date | Required | |
| service_id | Optional | Corresponds to the `service_id` that the block is on in `trips.txt`. Recommended if `block_id`s are repeated between different `service_id`s. (Edit: This column added after discussion.) |
| block_id | Required | Corresponds to `trips.txt:block_id` |
| vehicle | Required | Might refer to a vehicle, a train, or a type of vehicle, depending on what happens with the vehicle-related proposals this quarter? Maybe we'd have separate `vehicle_type` and `vehicle_id` fields, where you must fill at least one of the columns? We'll need to sort out the details but how to refer to vehicles isn't the hard part of this proposal. |

Uniqueness:

A vehicle can do more than one block on a day.
A vehicle type can definitely do more than one block on a day.
As mentioned by timon-k, we need to make some statement about whether to allow multiple vehicles on the same block+date. I lean towards not recommended but allowed (potential use cases: trains with multiple cars; vehicles are scheduled to tag-team a block; or a block could be run with either type of vehicle), and consumers can ignore blocks with multiple assignments if they don't understand them.
Not every block+date combo mentioned in GTFS needs to appear in this file. If it's missing, then we just don't know what vehicle is doing that block, same as if this file doesn't exist at all.
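
A rough sketch of how a consumer might apply these rules, assuming the columns in the table above (with a single `vehicle` column) and invented sample values. The policy of treating a block with several vehicles as "unknown" is just one option a consumer could choose, not something the proposal requires.

```python
import csv
import io
from collections import defaultdict

# Hypothetical vehicle_assignments.txt contents.
SAMPLE = """date,service_id,block_id,vehicle
2024-03-04,weekday,B1,V100
2024-03-04,weekday,B2,V200
2024-03-04,weekday,B2,V201
2024-03-04,weekday,B3,V100
"""


def load_vehicle_assignments(text):
    """Group vehicles by (date, service_id, block_id); no uniqueness enforced."""
    by_block = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        key = (row["date"], row["service_id"], row["block_id"])
        by_block[key].append(row["vehicle"])
    return dict(by_block)


def scheduled_vehicle(assignments, date, service_id, block_id):
    """Return the single scheduled vehicle, or None if missing or ambiguous."""
    vehicles = assignments.get((date, service_id, block_id), [])
    return vehicles[0] if len(vehicles) == 1 else None


assignments = load_vehicle_assignments(SAMPLE)
```

Here `V100` covers two blocks on the same day, block `B2` has two vehicles (which this consumer chooses to ignore), and a block that is absent from the file simply resolves to `None`.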

employee_assignments.txt

| Column Name | Required? | Description |
| --- | --- | --- |
| date | Required | |
| service_id | Optional | In `run_events.txt`, the unique ID for a run is `(service_id, run_id)`, so this is recommended to make it clear which run the row refers to. It may not be needed, because you might be able to look up a service_id via the calendar, or your run_ids might be unique even between days. But including it could help prevent errors. Probably needs more discussion. |
| run_id | Required | Refers to `run_events.txt`. (Will require some discussion to resolve how this interacts with #76) |
| employee_id | Optional | Who's doing this run on this day? (If blank, then nobody's scheduled to do it.) |

Not included: The type of job being performed is described in run_events.txt, not this file. There was also discussion of order_in_driving_direction, but that could change throughout a run, and if we need that, it'd be better to add it to run_events.

Uniqueness:

A person could do multiple runs on the same date (a double shift?) It's probably rare and not recommended, but it makes perfect sense.
A run probably can't be scheduled to be done by multiple people on the same date. (here's my explanation for that, in the middle section) See more discussion on this point below.
If a run is in run_events.txt but not this file, then we just don't know which employee is doing it.
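
To make these rules concrete, here is a small sketch of a check a producer or consumer could run over the file. It assumes the four columns from the table above; the sample rows and the `check_employee_assignments` name are made up for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical employee_assignments.txt contents.
SAMPLE = """date,service_id,run_id,employee_id
2024-03-04,weekday,100,E1
2024-03-04,weekday,101,E1
2024-03-04,weekday,102,
2024-03-04,weekday,103,E2
2024-03-04,weekday,103,E3
"""


def check_employee_assignments(text):
    """Return (runs listed more than once on a date, runs with nobody scheduled)."""
    rows = list(csv.DictReader(io.StringIO(text)))
    counts = Counter((r["date"], r["service_id"], r["run_id"]) for r in rows)
    duplicated = sorted(key for key, n in counts.items() if n > 1)
    unassigned = sorted(
        (r["date"], r["service_id"], r["run_id"])
        for r in rows
        if not r["employee_id"]  # blank employee_id: nobody is scheduled
    )
    return duplicated, unassigned


duplicated, unassigned = check_employee_assignments(SAMPLE)
```

Employee `E1` doing runs 100 and 101 on the same date is accepted, run 102 is reported as unassigned, and run 103 is flagged because it appears with two different employees on one date.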

There was some discussion (comment by hodb) about "cutting" a run between different employees. If this was about scheduling an employee to a run on some dates but not every date, then that's already handled because this file would assign employees on one date at a time. If this was about scheduling multiple people to the same run on the same date, then what I've written above wouldn't work, and it'd need to change. @hodb, which case were you describing?

~~If we do need it, here are some other ideas for how to handle scheduling an employee to only part of a run on a day:~~

~~Include an optional piece_id column. If it's filled, then the employee is only assigned to the portion of the run on that piece (as described by the piece_id column in run_events.txt).~~
For more granularity, include an optional event_sequence field. If it's filled, the employee is only assigned to the one trip/event on that run. (An employee assigned to half a run would probably have many rows in this file to assign them to each of the events they do that day.) This would be the most granular but also the most verbose and complex.
~~Allow assigning multiple people to the same run, without specifying which portion of the run they're doing. This would be easiest, but not describe exactly how the run is split between them.~~

Edit: This was all about realtime adjustments to the roster, not the schedule, so is out of scope.

Comparison to #45

#45 is another proposal about assigning people to runs. There are two main differences:

First, instead of employee ids, it gives roster ids. The roster groups together runs that one person would do across multiple days, but doesn't say which person it's assigned to.

In the working group, we'll need to decide whether we want to include specific employees (as in this issue), the bid/roster (as in #45), or both. I lean towards employees being the right level of abstraction to include (I justify that here, in the 3rd section), but it depends on what uses people have for TODS.

Second, instead of working date-by-date, #45 works week-by-week, giving a Monday schedule, a Tuesday schedule, etc. This is similar to the difference between calendar.txt and calendar_dates.txt. We could work with either approach alone, or both together.

This file would require a lot of rows (one per run per day), and doing it week by week would be more compact. But this file also makes it easier to deal with vacations and irregular schedules, which I think is valuable given that employees' schedules are less likely to be regular than the service schedules in public GTFS.
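
As a sketch of the trade-off, a compact weekly pattern can be expanded into date-by-date rows, with individual days removed for vacations. The pattern layout, the `expand_weekly` helper, and all IDs are assumptions made for this example; neither this issue nor #45 defines them.

```python
from datetime import date, timedelta

# Hypothetical weekly pattern: weekday index (0 = Monday) -> run_id.
weekly_pattern = {0: "100", 1: "100", 2: "205", 3: "205", 4: "100"}


def expand_weekly(employee_id, pattern, start, end, days_off=frozenset()):
    """Expand a weekly pattern into (date, run_id, employee_id) rows."""
    rows = []
    day = start
    while day <= end:
        run_id = pattern.get(day.weekday())
        if run_id is not None and day not in days_off:
            rows.append((day.isoformat(), run_id, employee_id))
        day += timedelta(days=1)
    return rows


# One week starting Monday 2024-03-04, with a vacation day on the Wednesday.
rows = expand_weekly(
    "E1",
    weekly_pattern,
    date(2024, 3, 4),
    date(2024, 3, 10),
    days_off={date(2024, 3, 6)},
)
```

The weekly form needs five entries per employee, while the dated form needs one row per working day, but only the dated form can express the missing Wednesday without a separate exceptions mechanism.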

We don't have to do only #28 or only #45. We could do both, or recombine the two in new ways, according to what information people want represented in TODS.

Summary

In summary, this issue is moving again, but there are still some major decisions to be made about which information and use cases are important. We'll discuss some of it in the working group meetings, but a lot of things are a lot easier to bring up in writing, so leave comments!

timon-k commented 3 months ago

@skyqrose Thanks for wrapping up the current state so thoroughly. The draft in this ticket looks good to me already, I just wanted to leave feedback on selected points:

I'm strongly in favor of distinguishing between vehicle_type and actual vehicle in vehicle_assignments.txt. It sounds most useful to allow both columns and require at least one of them to be filled.
I'm fine with ignoring order_in_driving_direction for now, but I don't think we can use the runs to model it. Imagine a train consisting of three coupled cars that never change their order during the day (the train stays coupled). If this is published as three vehicles being assigned to a single block, operated by a single run, then the only place to document the order of the cars would be in vehicle_assignments.txt.
Strongly agree with you that we should keep the tuple of (service_id, run_id) as the run key in employee_assignments.txt
I also agree with you that there should be no need to assign a single run to multiple people on the same date. The run should be split in this case and the runs resulting from this split can be assigned to different people.
https://github.com/cal-itp/operational-data-standard/issues/45 seems orthogonal to our discussion here: It describes rostering information, which as you write is on the boundary between scheduling information and operational information. This should probably happen in different files and be treated separately from the assignments of concrete employees and vehicles.
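
The "at least one of `vehicle_type` and `vehicle_id`" rule from the first point could be checked per row along these lines. This is only a sketch: the two column names are the ones suggested in this thread, and `validate_vehicle_row` is a hypothetical helper, not part of any existing validator.

```python
def validate_vehicle_row(row):
    """Return a list of problems found in one vehicle_assignments.txt row (a dict)."""
    problems = []
    # Treat missing and blank values the same way.
    if not row.get("vehicle_id") and not row.get("vehicle_type"):
        problems.append("at least one of vehicle_id or vehicle_type must be filled")
    return problems


ok_row = {"date": "2024-03-04", "block_id": "B1", "vehicle_id": "", "vehicle_type": "40ft-bus"}
bad_row = {"date": "2024-03-04", "block_id": "B2", "vehicle_id": "", "vehicle_type": ""}
```

With these inputs, `validate_vehicle_row(ok_row)` returns an empty list and `validate_vehicle_row(bad_row)` returns one problem.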

hodb commented 2 months ago

Thanks for the detailed analysis @skyqrose! The use case I was referring to is a driver who doesn't complete their scheduled run, so another driver needs to take over. Dispatchers may deal with unexpected situations during the day that disrupt a trip or the driver's day: a malfunctioning vehicle, a driver getting sick during the work day, a bus breaking down in the middle of a trip, a driver's workload being removed, and so on. We're not talking about multi-assignment, but about the ability to modify the runs and blocks and keep some reference to them. It is fair to say that a duty/run can be "cut" and split between different drivers. In that case, we are creating subsets of the original duty, performed by different drivers. For instance, the original duty 123 was planned for driver A, but ended up being split between A and B. Having the cut means we may need a reference to the run_events.txt file to indicate that this duty is done by different drivers.

skyqrose commented 2 months ago

Ah, that makes sense, and it'd be nice to have a spec for that, but TODS so far has just been scheduled data, and I think the scope for now should just be to represent the runs as they're assigned in the schedule. In that case, then it seems like runs would never be scheduled to be split between multiple people.

Realtime changes to the schedule would then have to be in a future TODS-Realtime specification, similar to how GTFS-RT is a separate spec for realtime changes to the schedule in GTFS.

timon-k commented 2 months ago

Revising my comment above: perhaps we should omit service_id from the assignments.

If we did include it, it would make more sense to have it in both files (vehicle and employee assignments) to keep them consistent. If run IDs are not unique on a given date, then block IDs would perhaps not be unique either.

But it also does not make much sense to have multiple runs or blocks on the same date with the same ID. I think we can assume that real-world usage will have unique block and run IDs on every single date.

skyqrose commented 2 months ago

Two reasons to include service_id, given that runs are not unique on different services, even if they're unique within all services on the same date:

It makes it easier to look up the run in run_events.txt. If the service isn't listed here, you have to look up in the calendar which services are active, and then find an entry in run_events.txt that matches the run_id and also happens on any one of those services.
It could help prevent errors from mixing up runs on different days with the same id. I've had bugs in the past due to calendar exceptions not being applied correctly (holidays, track work, storm schedules), which led to looking at the wrong schedule (Weekday run 100 instead of Holiday run 100), and nonsense data. Always referring to a run by its full (service_id, run_id) pair prevents those bugs.

And I think all this applies to blocks and vehicle_assignments as well.
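
The two lookup paths can be sketched as follows. The in-memory `active_services` and `runs` structures are stand-ins for data that would be derived from the calendar files and `run_events.txt`; their shape and the sample IDs are assumptions made for the example.

```python
# Hypothetical data: services active per date (as derived from calendar.txt and
# calendar_dates.txt), and runs keyed by their full (service_id, run_id) pair.
active_services = {
    "2024-07-03": {"weekday"},
    "2024-07-04": {"holiday"},
}
runs = {
    ("weekday", "100"): "Weekday run 100",
    ("holiday", "100"): "Holiday run 100",
}


def find_run_without_service(date, run_id):
    """Resolve a run from (date, run_id) only; depends on the calendar being right."""
    matches = [
        runs[(service_id, run_id)]
        for service_id in sorted(active_services[date])
        if (service_id, run_id) in runs
    ]
    if len(matches) != 1:
        raise LookupError(f"{len(matches)} runs match {run_id!r} on {date}")
    return matches[0]


def find_run_with_service(service_id, run_id):
    """Resolve a run directly from its full (service_id, run_id) key."""
    return runs[(service_id, run_id)]
```

If a calendar exception were applied incorrectly, `active_services` would list the wrong service for the holiday and the first function would silently return the weekday run, whereas the second function does not depend on the calendar at all.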

timon-k commented 2 months ago

Makes sense, but then let's include the service_id in both assignment files?

cal-itp / operational-data-standard

Allocating drivers and vehicles in ODS #28
