add rosters.txt - Githubissues

BTollison commented 9 months ago

In an effort to help distribute information between platforms, we wish to add rosters.txt. The intended use is that many operations cluster several duties (runs) within a roster to assign to a driver. The duties are usually clustered in a way that represents the amount of work a single driver will work in a week period. Some companies also cluster this work in several weeks for 1 drivers.

ods_roster_id Persistent ID as proposed in issue #44 meant to help keep track of changes to a roster over time. Rosters can have long lifespans while the duties (runs) can change within them over time.

Required

roster_id

Required, primary key

service_id

Required, defined service period that the duties and roster_id is valid for.

monday_run

Required, value should be a run_id that operates on a monday during this service period.
If blank, no run_id is assigned.

tuesday_run

Required, value should be a run_id that operates on a tuesday during this service period.
If blank, no run_id is assigned.

wednesday_run

Required, value should be a run_id that operates on a wednesday during this service period.
If blank, no run_id is assigned.

thursday_run

Required, value should be a run_id that operates on a thursdsay during this service period.
If blank, no run_id is assigned.

friday_run

Required, value should be a run_id that operates on a friday during this service period.
If blank, no run_id is assigned.

saturday_run

Required, value should be a run_id that operates on a saturday during this service period.
If blank, no run_id is assigned.

sunday_run

Required, value should be a run_id that operates on a saturday during this service period.
If blank, no run_id is assigned.

week_sequence

if blank, then the roster represents only 1 week.
This value represents what what week this row of the roster should represent. If a roster_id has 2 weeks worth of work, then there should be a value 1 in the first week and a value 2 in the second week.

skyqrose commented 9 months ago

I can see how a grouping of many days' work is useful to represent even when there's no specific person assigned to it yet, as part of the scheduling process. So far, ODS has been mostly focused on data useful during operation, as opposed to during planning+scheduling. Would this be useful during operation (like if you need to check if the usual driver is working today), or is this intended to expand the scope of ODS into scheduling?

Would it be a specific week? I don't see a date field. There's a service_id field, but service_ids can be a many days or only 1, consecutive or not. What does the sunday_run field mean if applied to a service_id that only occurs on weekdays? How would you represent how holiday work is grouped with other work?

If it is meant to represent specific dates (as opposed to a generic week), it kind of turns into #28, with 1/7th the rows but 7x the columns. What if you leaned into that, and had a table with 3 columns, operational_date, run_id, roster_id?

BTollison commented 9 months ago

This is meant to take the scheduling data (Hastus, Optibus, etc) and send it to the operational side. Although I can see how on the scheduling side you could also assign drivers to every roster_id. In my previous work in the USA it was common to assign a driver to a roster during a bidding process. So, I am open to including this information somehow in a rosters.txt.

Since my intention here was to only model what comes from scheduling and not what occurs on the operational side (yet) I had not intended this to be tied to a specific week so much as tied to a specific service period (service_id). You raise a good point though, if a roster_id and service_id are overlapping you'll get duplicate roster_id's:

Calemdar.txt | | | | | | | | | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- service_id | monday | tuesday | wednesday | thursday | friday | saturday | sunday | start_date | end_date weekday | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 20240107 | 20240127 saturday | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 20240107 | 20240127 sunday | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 20240107 | 20240127 roster.txt | | | | | | | | | | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- ods_roster_id | roster_id | service_id | monday_run | tuesday_run | wednesday_run | thursday_run | friday_run | saturday_run | sunday_run | week_sequence 1 | 1 | weekday | 4 | 5 | 6 | | | 3 | 2 | 2 | 1 | saturday | 4 | 5 | 6 | | | 3 | 2 | 3 | 1 | sunday | 4 | 5 | 6 | | | 3 | 2 | So this is not what I would consider ideal, but in some ways it makes sense because the duties are always grouped by the work per service day type, so unless we found a way to group service_id, at the moment I can't think of a less bad way to do this. On your question about holiday work, maybe the workaround here can be similar to what happens with calendar_dates.txt where we simply have the roster_id + the duty that applies that day? This sounds like maybe what you are suggesting with issue #28? There are places where a person may be doing a longer than 1 week roster, so that's why I have the field of week_sequence.

skyqrose commented 9 months ago

Thanks for the tables, that helps show the idea.

This is what I meant by the 3 columns in #28. There wouldn't be a separate exceptions file like calendar and calendar_dates, just a single file with all the dates. It's more duplicated rows cuz you need one for every date on the roster, but more straightforward, easier to have exceptions like holidays, and dodges some of the confusion about whether a service is active on a day of the week. I think it can represent everything that the day-of-the-week system can (as long as you want to say someone works on these 3 specific Mondays, and don't need to say someone typically works on Mondays in general.)

roster.txt
date	roster_id	run_id
20240107	1	2
20240108	1	4
20240109	1	5
20240110	1	6
20240113	1	3
20240114	1	2
20240115	1	4
20240116	1	5
20240117	1	6
20240120	1	3
20240121	1	2
20240122	1	2 (MLK day)
20240123	1	5
20240124	1	6
20240127	1	3
20240107	2	...

BTollison commented 9 months ago

Okay, I see what you're getting at now. My only concern with this is how many repeated rows we'd have.

I guess then for this to work, you need a way to distinguish which run_id is valid since there could be multiple run_id's with the same name, right?

skyqrose commented 9 months ago

Oh, yeah, my agency has issues with non-unique run ids, too, that's described in #12. Potential solutions for here:

Use unique run_ids
Add a 4th column for the service_id
Assume run_ids are unique among the service_ids effective on the given date. To get the service_id, look up what service_ids are operating on that date, and find one with the given run. (Edit: I'm pretty sure this isn't a safe assumption, cuz there could be run_ids that are the same on bus and subway running on the same day.)

I'm gonna go make sure #28 doesn't run into this problem, too, thanks for noticing that.

westontrillium commented 8 months ago

I would strongly advise against allowing duplicate primary keys. Even if their existence is commonplace, it goes against some fundamental relational database best practices.

I do agree the 3-column option makes it easier to be more explicit about dates of a given roster and requires less calculation/interpretation, though you'd probably want a second table to normalize rosters.

BTollison commented 1 week ago

So I'm currently attempting to revise rosters.txtproposal, but it occurs to me that in the calendar.txt style, we run into the problem of:

How do we handle a roster with service_id's? I end up in a situation where I need 3 rows to describe 1 week of rosters in a "simple" scenario where we have a weekday, saturday, and sunday service_id. We need a way to model this nicely to feed rosters in systems, the calendar_dates.txt style only works in an operational context I think.

The best idea so far that I have is a file just to group service_id's that are relevant...

grouped_calendar.txt

service_id	group_service_id
weekday	9999
saturday	9999
sunday	9999

rosters.txt

group_service_id	roster_id	week_sequence	monday_run	tuesday_run	wednesday_run	thursday_run	friday_run	saturday_run	sunday_run
9999	1	1	10	20	30	40	50
9999	1	2	80	90	100	110			70
9999	1	3	120	130	140			60	150

roster_dates.txt

grouped_service_id	date	roster_id	run_id	exception_type
9999	20240830	1	555	1

For roster_dates.txt exception_type would have the proposed values:

exception_type	Definition
NULL	no rosters.txt, and is therefore the default
1	replacement
2	removed

cal-itp / operational-data-standard

add rosters.txt #45