cal-itp / operational-data-standard

The Transit Operational Data Standard is an open standard for representing the transit schedules used by drivers, dispatchers, and planners to carry out transit operations.
https://ods.calitp.org
Apache License 2.0
26 stars 6 forks source link

add rosters.txt #45

Open BTollison opened 9 months ago

BTollison commented 9 months ago

In an effort to help distribute information between platforms, we wish to add rosters.txt. The intended use is that many operations cluster several duties (runs) within a roster to assign to a driver. The duties are usually clustered in a way that represents the amount of work a single driver will work in a week period. Some companies also cluster this work in several weeks for 1 drivers.

ods_roster_id Persistent ID as proposed in issue #44 meant to help keep track of changes to a roster over time. Rosters can have long lifespans while the duties (runs) can change within them over time.

roster_id

service_id

monday_run

tuesday_run

wednesday_run

thursday_run

friday_run

saturday_run

sunday_run

week_sequence

skyqrose commented 9 months ago

I can see how a grouping of many days' work is useful to represent even when there's no specific person assigned to it yet, as part of the scheduling process. So far, ODS has been mostly focused on data useful during operation, as opposed to during planning+scheduling. Would this be useful during operation (like if you need to check if the usual driver is working today), or is this intended to expand the scope of ODS into scheduling?

Would it be a specific week? I don't see a date field. There's a service_id field, but service_ids can be a many days or only 1, consecutive or not. What does the sunday_run field mean if applied to a service_id that only occurs on weekdays? How would you represent how holiday work is grouped with other work?

If it is meant to represent specific dates (as opposed to a generic week), it kind of turns into #28, with 1/7th the rows but 7x the columns. What if you leaned into that, and had a table with 3 columns, operational_date, run_id, roster_id?

BTollison commented 9 months ago

This is meant to take the scheduling data (Hastus, Optibus, etc) and send it to the operational side. Although I can see how on the scheduling side you could also assign drivers to every roster_id. In my previous work in the USA it was common to assign a driver to a roster during a bidding process. So, I am open to including this information somehow in a rosters.txt.

Since my intention here was to only model what comes from scheduling and not what occurs on the operational side (yet) I had not intended this to be tied to a specific week so much as tied to a specific service period (service_id). You raise a good point though, if a roster_id and service_id are overlapping you'll get duplicate roster_id's:

Calemdar.txt |   |   |   |   |   |   |   |   |   -- | -- | -- | -- | -- | -- | -- | -- | -- | -- service_id | monday | tuesday | wednesday | thursday | friday | saturday | sunday | start_date | end_date weekday | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 20240107 | 20240127 saturday | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 20240107 | 20240127 sunday | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 20240107 | 20240127 roster.txt |   |   |   |   |   |   |   |   |   |   -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- ods_roster_id | roster_id | service_id | monday_run | tuesday_run | wednesday_run | thursday_run | friday_run | saturday_run | sunday_run | week_sequence 1 | 1 | weekday | 4 | 5 | 6 |   |   | 3 | 2 |   2 | 1 | saturday | 4 | 5 | 6 |   |   | 3 | 2 |   3 | 1 | sunday | 4 | 5 | 6 |   |   | 3 | 2 |   So this is not what I would consider ideal, but in some ways it makes sense because the duties are always grouped by the work per service day type, so unless we found a way to group service_id, at the moment I can't think of a less bad way to do this. On your question about holiday work, maybe the workaround here can be similar to what happens with calendar_dates.txt where we simply have the roster_id + the duty that applies that day? This sounds like maybe what you are suggesting with issue #28? There are places where a person may be doing a longer than 1 week roster, so that's why I have the field of week_sequence.
skyqrose commented 9 months ago

Thanks for the tables, that helps show the idea.

This is what I meant by the 3 columns in #28. There wouldn't be a separate exceptions file like calendar and calendar_dates, just a single file with all the dates. It's more duplicated rows cuz you need one for every date on the roster, but more straightforward, easier to have exceptions like holidays, and dodges some of the confusion about whether a service is active on a day of the week. I think it can represent everything that the day-of-the-week system can (as long as you want to say someone works on these 3 specific Mondays, and don't need to say someone typically works on Mondays in general.)

roster.txt    
date roster_id run_id
20240107 1 2
20240108 1 4
20240109 1 5
20240110 1 6
20240113 1 3
20240114 1 2
20240115 1 4
20240116 1 5
20240117 1 6
20240120 1 3
20240121 1 2
20240122 1 2 (MLK day)
20240123 1 5
20240124 1 6
20240127 1 3
20240107 2 ...
BTollison commented 9 months ago

Okay, I see what you're getting at now. My only concern with this is how many repeated rows we'd have.

I guess then for this to work, you need a way to distinguish which run_id is valid since there could be multiple run_id's with the same name, right?

skyqrose commented 9 months ago

Oh, yeah, my agency has issues with non-unique run ids, too, that's described in #12. Potential solutions for here:

I'm gonna go make sure #28 doesn't run into this problem, too, thanks for noticing that.

westontrillium commented 8 months ago

I would strongly advise against allowing duplicate primary keys. Even if their existence is commonplace, it goes against some fundamental relational database best practices.

I do agree the 3-column option makes it easier to be more explicit about dates of a given roster and requires less calculation/interpretation, though you'd probably want a second table to normalize rosters.

BTollison commented 1 week ago

So I'm currently attempting to revise rosters.txtproposal, but it occurs to me that in the calendar.txt style, we run into the problem of:

How do we handle a roster with service_id's? I end up in a situation where I need 3 rows to describe 1 week of rosters in a "simple" scenario where we have a weekday, saturday, and sunday service_id. We need a way to model this nicely to feed rosters in systems, the calendar_dates.txt style only works in an operational context I think.

The best idea so far that I have is a file just to group service_id's that are relevant...

grouped_calendar.txt

service_id group_service_id
weekday 9999
saturday 9999
sunday 9999

rosters.txt

group_service_id roster_id week_sequence monday_run tuesday_run wednesday_run thursday_run friday_run saturday_run sunday_run
9999 1 1 10 20 30 40 50    
9999 1 2 80 90 100 110     70
9999 1 3 120 130 140     60 150

roster_dates.txt

grouped_service_id date roster_id run_id exception_type
9999 20240830 1 555 1

For roster_dates.txt exception_type would have the proposed values:

exception_type Definition
NULL no rosters.txt, and is therefore the default
1 replacement
2 removed