cal-itp / operational-data-standard

The Transit Operational Data Standard is an open standard for representing the transit schedules used by drivers, dispatchers, and planners to carry out transit operations.
https://ods.calitp.org
Apache License 2.0

Separating crew runs from public-facing trips using distinct service_ids #76

Open jeffkessler-keolis opened 2 months ago

jeffkessler-keolis commented 2 months ago

tl;dr This non-breaking change would let producers publish separate sets of runs that map to the same trips, as is standard in most scheduling systems, by adding an optional trip_service_id field to run_events.txt and by allowing run-only service_ids to be assigned to dates using the calendar and calendar_dates supplement files.

Context

TODS v2 represents a major step forward for the standard in modeling crew runs. The overhauled run_events.txt file also improves how runs are modeled alongside trips that may already exist in GTFS.

Problem

Because run_events.txt entries are linked to existing GTFS trips through their service_id, runs and trips are paired in a 1:1 relationship: a service_id can carry only one set of runs for its trips. This presents two challenges for producers seeking to model crew runs in run_events.txt.

  1. Many scheduling systems (e.g. Hastus) decouple "vehicle schedules" and "crew schedules." Representing both in a single shared service_id limits the ability to distinguish between the two and to create a 1:many mapping of trips to runs.

    • For example, while a set of trips may remain unchanged, the underlying runs themselves could be modified (generally around trackwork or holidays).

      • Consider a set of trips with service_id spring24-schedule1. While there may be a standard set of crew runs mapped to the trips on that day, some crew runs may not operate on the day before or after a holiday, and other runs may be modified to accommodate the lower staffing, even though all trips will operate normally.

      • e.g. For a US railroad, ridership this Friday (7/5/24) will likely be very light, with many people taking the day off after the Independence Day federal holiday (roughly the US equivalent of a bank holiday), even though a standard weekday schedule will operate. As a result, the additional Assistant Conductors who support higher-ridership trains may not be needed on Friday, so some CREW runs will not operate and others will be modified, while the same public schedule of TRIPS still operates (continuing to use the standard weekday schedule's service_id in the base GTFS).

  2. Crew schedules could have different runs depending on the day of the week, yet be stored in the same schedule (e.g. special Friday service).

    • Many scheduling systems allow runs to deviate within a single schedule, for example by including certain runs and trips that operate only on a particular day of the week within a larger set of service.

    • e.g. The seasonal CapeFLYER service operates only on Fridays, with some crew runs modified to accommodate the additional Friday service within a single Hastus crew schedule.

Potential Mitigations

Create a unique service_id for every applicable combination of trips and runs (vehicle and crew schedule)

While one could create a concatenated service_id representing each combination of vehicle and crew schedules (e.g. service_ids spring24-schedule1-crew1 and spring24-schedule1-crew2), doing so is inadvisable for two reasons:

  1. This approach requires producers to change the way they produce their public GTFS, which the working group has said it generally wants to avoid; and

  2. This approach requires duplicating otherwise identical trip data for each service_id and changing the underlying primary keys (e.g. trip_ids trip1 and trip1-crew2, so that each is unique to its service_id), thereby (a) considerably increasing the size of the underlying files, (b) placing a heavy burden on producers, and (c) potentially affecting downstream customer-facing applications that group service by service_id.
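To make the cost of this mitigation concrete, here is a small Python sketch that expands a hypothetical trips list once per crew variant, following the naming pattern above (the first variant keeps the original trip_id; later variants get a suffix). The Trip class and expand_for_crew_variants function are illustrative only and are not part of TODS or any GTFS tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trip:
    trip_id: str
    service_id: str


def expand_for_crew_variants(trips, crew_variants):
    """Copy every trip once per crew variant (hypothetical mitigation).

    Each copy gets a concatenated service_id; the first variant keeps the
    original trip_id and later variants get a suffix so trip_ids stay unique.
    """
    expanded = []
    for i, variant in enumerate(crew_variants):
        for trip in trips:
            trip_id = trip.trip_id if i == 0 else f"{trip.trip_id}-{variant}"
            expanded.append(Trip(trip_id, f"{trip.service_id}-{variant}"))
    return expanded


trips = [Trip("trip1", "spring24-schedule1"), Trip("trip2", "spring24-schedule1")]
expanded = expand_for_crew_variants(trips, ["crew1", "crew2"])
for t in expanded:
    print(t.trip_id, t.service_id)
```

Under this approach trips.txt grows linearly with the number of crew variants, and every file keyed by trip_id (stop_times.txt in particular) would have to be duplicated in step.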

Proposed Solution

Adding a new, optional trip_service_id field to run_events

To address these issues while preserving backwards compatibility and making export from existing scheduling systems easier, the addition of an optional trip_service_id field is proposed.
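A minimal Python sketch of how a consumer might interpret the new field, assuming that run_events.service_id names the service the run operates under (possibly a run-only service) while trip_service_id, when present, names the trip's GTFS service_id from trips.txt. The fallback to service_id when trip_service_id is blank, the RunEvent class, the consistency check, and all IDs are hypothetical illustrations, not spec text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunEvent:
    service_id: str                        # service the run operates under (may be run-only)
    run_id: str
    event_sequence: int
    trip_id: Optional[str] = None          # None for non-trip events, e.g. deadheads
    trip_service_id: Optional[str] = None  # proposed optional field


def expected_trip_service(event: RunEvent) -> str:
    # Assumed fallback: when trip_service_id is blank, service_id is read as the
    # trip's GTFS service_id, matching files written before the new field existed.
    return event.trip_service_id or event.service_id


def mismatched_events(events, trip_services):
    """Return trip events whose trip_id is not in the expected GTFS service.

    trip_services maps trips.txt trip_id -> service_id.
    """
    return [
        event
        for event in events
        if event.trip_id is not None
        and trip_services.get(event.trip_id) != expected_trip_service(event)
    ]


# Hypothetical trips.txt content (public GTFS, unchanged).
trip_services = {"t100": "normal-service", "cf1": "additional-cape-flyer-service"}

events = [
    # Friday-only run mixing regular and CapeFLYER trips.
    RunEvent("friday-run-service", "R1", 1, "t100", "normal-service"),
    RunEvent("friday-run-service", "R1", 2, "cf1", "additional-cape-flyer-service"),
    RunEvent("friday-run-service", "R1", 3),  # deadhead, no trip
    # Existing-style row: service_id doubles as the trip service.
    RunEvent("normal-service", "R2", 1, "t100"),
    # Run-only service without trip_service_id: flagged by the check.
    RunEvent("friday-run-service", "R3", 1, "t100"),
]

print([(e.run_id, e.event_sequence) for e in mismatched_events(events, trip_services)])
# [('R3', 1)]
```

Under this reading, a file that never uses trip_service_id validates exactly as before, while any row in a run-only service such as friday-run-service must name its trip service explicitly.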

Supporting additional service_id definitions via calendar_supplement.txt and calendar_dates_supplement.txt

Introducing new service_id values requires a mechanism for assigning them to particular dates. Fortunately, the newly introduced supplement paradigm allows calendar.txt and calendar_dates.txt to be extended by adding entries to the corresponding TODS supplement files.

These files would allow any new run-specific service_ids to be assigned both to specific dates and to date ranges, using the existing structure of the calendar.txt and calendar_dates.txt files.
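A minimal Python sketch of how a consumer might resolve which service_ids (trip and run alike) are active on a date once supplement rows are merged in. It applies standard GTFS calendar semantics: weekday flags within a date range, then calendar_dates exceptions where exception_type 1 adds service and 2 removes it. It assumes supplement rows are purely additive and does not model any modify or delete behavior the supplement mechanism may define. All service_ids and dates are hypothetical, loosely following the CapeFLYER and 7/5/24 examples.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CalendarRow:
    service_id: str
    days: tuple[int, ...]  # GTFS weekday flags, Monday..Sunday
    start_date: date
    end_date: date


@dataclass(frozen=True)
class CalendarDateRow:
    service_id: str
    service_date: date   # GTFS column: date
    exception_type: int  # GTFS: 1 = service added, 2 = service removed


def active_service_ids(day, calendar, calendar_dates):
    """Return the set of service_ids operating on `day`."""
    active = {
        row.service_id
        for row in calendar
        if row.start_date <= day <= row.end_date and row.days[day.weekday()] == 1
    }
    for row in calendar_dates:
        if row.service_date != day:
            continue
        if row.exception_type == 1:
            active.add(row.service_id)
        elif row.exception_type == 2:
            active.discard(row.service_id)
    return active


SUMMER = (date(2024, 6, 1), date(2024, 8, 31))  # hypothetical season

# Base GTFS calendar.txt: public trip services, unchanged.
calendar = [
    CalendarRow("normal-service", (1, 1, 1, 1, 1, 0, 0), *SUMMER),
    CalendarRow("additional-cape-flyer-service", (0, 0, 0, 0, 1, 0, 0), *SUMMER),
]
# Hypothetical calendar_supplement.txt: run-only services.
calendar_supplement = [
    CalendarRow("normal-run-service", (1, 1, 1, 1, 0, 0, 0), *SUMMER),
    CalendarRow("friday-run-service", (0, 0, 0, 0, 1, 0, 0), *SUMMER),
]
# Hypothetical calendar_dates_supplement.txt: lighter crew runs on 7/5/24.
calendar_dates_supplement = [
    CalendarDateRow("friday-run-service", date(2024, 7, 5), 2),
    CalendarDateRow("holiday-friday-run-service", date(2024, 7, 5), 1),
]

merged_calendar = calendar + calendar_supplement
merged_dates = calendar_dates_supplement  # base calendar_dates.txt is empty here

print(sorted(active_service_ids(date(2024, 7, 5), merged_calendar, merged_dates)))
# ['additional-cape-flyer-service', 'holiday-friday-run-service', 'normal-service']
```

The public trip services are untouched on 7/5/24; only the run service swaps, which is the separation the proposal is after.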

Approach Limitations

Next Steps

jfabi commented 1 month ago

@jeffkessler-keolis Thanks for sharing the concern around the different ways schedules can be implemented as well as the possible solution.

Question: Your proposal is only to add the new optional trip_service_id. There may be multiple possible readings, but mine is that https://github.com/cal-itp/operational-data-standard/pull/66 doesn't prohibit a new service_id value from being added in TODS. The proposed spec does not say that run_events.service_id must come from trips.txt, and doing so would necessarily preclude runs containing only non-trip/deadhead events (re https://github.com/cal-itp/operational-data-standard/issues/11). Would it be worth adding some clarification to https://github.com/cal-itp/operational-data-standard/pull/66 to note how run_events.service_id can be utilized?

skyqrose commented 1 month ago

Okay, I think I understand how this is an issue, and how this suggestion fixes it. Responses to specific parts:

  1. Problem:
    1. CapeFLYER:
      1. To clarify, the way this is represented in GTFS is normal-service M-F and additional-cape-flyer-service on Fridays only. Then for crew, you're proposing normal-run-service on M-Thur, and entire-friday-run-service on Friday? And it wouldn't work to assign runs to only the additional-cape-flyer-service because some employees work part of a day on the CapeFLYER and part of their day on other routes?
  2. Potential Mitigations:
    1. Yeah this seems like not a good option.
    2. Are you (Keolis) blocked from producing run_events.txt until this is done?
  3. Proposed Solution:
    1. new trip_service_id:
      1. Is it 100% backwards compatible, or would consumers have to change to avoid misinterpreting anything if they unexpectedly get a TODS file that uses this approach?
      2. How would it compare to add a new run_service_id instead, with the existing service_id field referring to the same trip service ID as in GTFS? It's probably not better, but I just want to make sure the possibility is considered. Run services and trip services are distinct categories, so it'd be nice if service_id always referred to the same thing, but run_events.txt should probably have run services as its primary key instead of trip services.
  4. Limitations:
    1. There's quite a bit of complexity in the requirements here. All of it makes sense, but writing it down in a spec that's easy to understand and easy to translate into validation scripts will be a challenge.
    2. If it's not too much of a burden, I think writing a draft spec-quality description for the service_id and trip_service_id columns would help make the issue easier to discuss and could uncover more problems to fix.

I think it'd be useful to see some example data (run_events.txt, calendar.txt, and calendar_supplement.txt), probably modeled after the CapeFLYER case.

skyqrose commented 1 month ago

And responding to Josh:

#66 without this proposal couldn't use newly defined service_ids, because without calendar_supplement.txt those services wouldn't occur on any dates. Once calendar_supplement.txt exists, yes, you should be able to define new deadhead-only or event-only services. (And any PR implementing this proposal should make that clear in the documentation for calendar_supplement.txt.)

jeffkessler-keolis commented 2 weeks ago

@skyqrose I detailed a number of different examples in #80, but to directly answer your questions:

  1. That's one example, but yes, the CapeFLYER employees work both in the additional-cape-flyer-service realm AND in normal-service. (I realize now I did not provide an example matching this specific scenario, although I can add one.)
  2. Yes.
  3. It is backwards compatible in the sense that it fully supports any previously existing file, but it could be viewed as a breaking change: for a consumer that does not read the additional trip_service_id field, a run_events.txt file could fail primary key validation by having two entries with the same PK. However, the underlying run data would be garbage anyway if that field were provided but not interpreted, so the non-silent failure is desirable.
  4. Drafted in #80
  5. (Per @jfabi's question and @skyqrose's answer) I hadn't even thought of this, but it's an excellent point and one I've added to the documentation.