MobilityData / gtfs-realtime-validator

Java-based tool that validates General Transit Feed Specification (GTFS)-realtime feeds

New Rule Proposal: Individual scheduled trip_id not accounted for in Trip Updates feed #157

Open evansiroky opened 2 years ago

evansiroky commented 2 years ago

Summary:

An error should be raised whenever an individual Trip that should be in service at the time a Trip Update feed was downloaded is not accounted for in any TripUpdate record in the Trip Update feed.

Steps to reproduce:

Given a TripUpdate dataset and its associated GTFS Schedule dataset,
When the validator has compiled a list of all trips that should currently be in service, has scanned every TripUpdate entity in the Trip Updates feed, and finds that a trip expected to be in service is not accounted for in any TripUpdate record,
Then the validator should flag that trip_id as having no corresponding TripUpdate entity describing its realtime status in the Trip Updates feed.
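Once the list of expected trips exists, the comparison step above is essentially a set difference. A minimal sketch, assuming the expected trip_ids have already been computed and that feed entities are available as plain dicts mirroring the GTFS-realtime field names; `find_unaccounted_trips` is a hypothetical name, not part of the validator:

```python
def find_unaccounted_trips(expected_trip_ids, feed_entities):
    """Return sorted trip_ids that are expected in service but have no TripUpdate.

    feed_entities is assumed to be a list of dicts shaped like
    {"trip_update": {"trip": {"trip_id": "..."}}}; this shape is an
    illustration, not the validator's internal model.
    """
    seen = set()
    for entity in feed_entities:
        trip_update = entity.get("trip_update")
        if trip_update is None:
            continue  # e.g. a VehiclePosition or Alert entity
        trip_id = trip_update.get("trip", {}).get("trip_id")
        if trip_id:
            seen.add(trip_id)
    # Every expected trip that never appeared gets flagged individually.
    return sorted(set(expected_trip_ids) - seen)
```

Each returned trip_id would correspond to one occurrence of the proposed rule.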

Expected behavior:

The GTFS-Realtime Best Practices state:

Feeds should cover the vast majority of trips included in the companion static GTFS dataset. In particular, it should include data for high-density and high-traffic city areas and busy routes.

The validator should individually flag every trip_id that was scheduled to be in service at the time the feed was downloaded but is not accounted for in the feed.

Observed behavior:

An error or warning is not raised for this problem at this time.

etc

This issue seeks to add more detailed scope to https://github.com/MobilityData/gtfs-realtime-validator/issues/119.

briandonahue commented 1 year ago

Is there an established method for determining if a trip should be currently in service? If no, would the correct approach be to see if the current time is between the first and last stop times for a trip?

evansiroky commented 1 year ago

> would the correct approach be to see if the current time is between the first and last stop times for a trip?

Yes, this is what should be done, but the calendar files also need to be taken into account to make sure that the trip is supposed to be operating on the given day in question.
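A rough sketch of that two-part check, assuming calendar.txt and calendar_dates.txt have already been parsed into the hypothetical dict shapes noted in the comments (none of these names come from the validator):

```python
from datetime import date

def service_runs_on(service_id, day, calendar, calendar_dates):
    """True if service_id operates on the given service day.

    Assumed shapes (illustrative only):
      calendar:       {service_id: {"start_date": date, "end_date": date,
                                    "weekdays": set of ints, Monday == 0}}
      calendar_dates: {(service_id, date): exception_type}
    """
    exception = calendar_dates.get((service_id, day))
    if exception == 1:   # exception_type 1: service added for this date
        return True
    if exception == 2:   # exception_type 2: service removed for this date
        return False
    entry = calendar.get(service_id)
    if entry is None:
        return False
    in_range = entry["start_date"] <= day <= entry["end_date"]
    return in_range and day.weekday() in entry["weekdays"]

def trip_in_service(trip, day, seconds_into_service_day, calendar, calendar_dates):
    """trip is assumed to carry its first departure and last arrival in seconds."""
    if not service_runs_on(trip["service_id"], day, calendar, calendar_dates):
        return False
    return trip["first_departure"] <= seconds_into_service_day <= trip["last_arrival"]
```

This deliberately ignores trips that run past midnight and frequency-based trips, both of which would need extra handling.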

briandonahue commented 1 year ago

In reviewing this with @bdferris-v2, it seems this could be a fairly complicated implementation. Currently there is no easy way to get a list of trips that should be active from the static feed data. We would have to add code to build a more structured list of trips by service date, start time, and end time, accounting for trips that span midnight, so that it could be indexed easily for real-time validation. He also mentioned complications such as the way Daylight Saving Time is handled in GTFS data (we didn’t get into details here) and trips in blocks that can have cascading delays (a late trip can make the next trip in the block late).
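For the midnight and DST points, one possible approach is to convert GTFS times to absolute instants. GTFS Schedule measures stop times from "noon minus 12h" of the service day in the agency's time zone, and times may exceed 24:00:00 for trips that run past midnight. The sketch below assumes that rule; the function names and the time zone used in the usage note are illustrative, not existing validator code:

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

def gtfs_time_to_seconds(value):
    """Parse a GTFS time such as "25:10:00" (hours may exceed 23)."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def to_instant(service_date, gtfs_time, tz_name):
    """Convert a service date plus GTFS time to a UTC datetime."""
    local_noon = datetime.combine(service_date, time(12), tzinfo=ZoneInfo(tz_name))
    # Subtract in UTC so the 12 hours are elapsed time, not wall-clock time;
    # this keeps daytime stop times aligned with the clock on DST change days.
    anchor = local_noon.astimezone(timezone.utc) - timedelta(hours=12)
    return anchor + timedelta(seconds=gtfs_time_to_seconds(gtfs_time))

def is_active_at(now_utc, service_date, first_time, last_time, tz_name):
    """True if now_utc falls between the trip's first and last stop times."""
    start = to_instant(service_date, first_time, tz_name)
    end = to_instant(service_date, last_time, tz_name)
    return start <= now_utc <= end
```

With this shape, a validator could test each trip against both today's and yesterday's service date, so that a trip scheduled from 23:30:00 to 25:10:00 on the previous service day is still found at 01:00 local time. Cascading block delays are not addressed here.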

This seems like it could be a significant effort that maybe should be broken into smaller chunks? Open to suggestions!