BlinkTagInc / node-gtfs

Import GTFS transit data into SQLite and query routes, stops, times, fares and more.
MIT License

Propagate RT delays to missing stop_sequences #162

Open nukeador opened 1 month ago

nukeador commented 1 month ago

Following from https://github.com/BlinkTagInc/node-gtfs/issues/160#issuecomment-2122385331

The GTFS spec states that:

If one or more stops are missing along the trip the delay from the update (or, if only time is provided in the update, a delay computed by comparing the time against the GTFS schedule time) is propagated to all subsequent stops. This means that updating a stop time for a certain stop will change all subsequent stops in the absence of any other information. Note that updates with a schedule relationship of SKIPPED will not stop delay propagation, but updates with schedule relationships of SCHEDULED (also the default value if schedule relationship is not provided) or NO_DATA will.

This is a feature request for node-gtfs to propagate this delay to later stops that have no RT data when earlier stops on the trip do have RT time data.

Some of us are encountering agencies that only provide RT trip updates for a few stops on a trip rather than all of them. This results in missing RT times for many stops along a trip.

Other apps, such as OTP, provide a config setting for granular control of this behavior.

brendannee commented 1 month ago

Thanks for the details on this.

Right now, the node-gtfs library just stores exactly what was received in the database and doesn't try to modify it in any way. It's up to the application querying the database to handle the data.

I'm thinking it might be useful to build this functionality (and a few other features, like exposing a JSON API) as part of a new library built on top of node-gtfs.

nukeador commented 1 month ago

I see, are there any best practices or examples of other libraries built on top for reference? Thanks!

brendannee commented 1 month ago

Great question.

I haven't used the GTFS-Realtime functionality of the library extensively, which is why there are not a lot of features built around it (I mostly use the static GTFS functionality in lots of other apps). Often, when I need an app to query GTFS-Realtime data, I just do that directly rather than storing the data in SQLite via node-gtfs.

I don't know of any other projects built on top of node-gtfs that use the GTFS-Realtime functionality, but I'd love to have some to point to.

nukeador commented 1 month ago

We are building an app that uses node-gtfs and exposes a RESTful API, with support for multiple agencies and realtime updates. Basically, it's a tailored solution to serve data to a client-side web app (PWA).

In the future we should probably standardize the data structure and field naming of the output JSON so it can be reused by others.

https://github.com/VallaBus/api-auvasa (sorry docs and comments are in Spanish)

nukeador commented 1 month ago

As far as I've found, delay propagation is part of the behavior the GTFS spec expects:

If one or more stops are missing along the trip the delay from the update (or, if only time is provided in the update, a delay computed by comparing the time against the GTFS schedule time) is propagated to all subsequent stops. This means that updating a stop time for a certain stop will change all subsequent stops in the absence of any other information. Note that updates with a schedule relationship of SKIPPED will not stop delay propagation, but updates with schedule relationships of SCHEDULED (also the default value if schedule relationship is not provided) or NO_DATA will.

Example

For the same trip instance, three StopTimeUpdates are provided:

  • delay of 300 seconds for stop_sequence 3
  • delay of 60 seconds for stop_sequence 8
  • ScheduleRelationship of NO_DATA for stop_sequence 10

This will be interpreted as:

  • stop_sequences 1,2 have unknown delay.
  • stop_sequences 3,4,5,6,7 have delay of 300 seconds.
  • stop_sequences 8,9 have delay of 60 seconds.
  • stop_sequences 10,..,20 have unknown delay

So I guess that's what you would expect node-gtfs to do by default when you call getStopTimeUpdates().
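To make the interpretation above concrete, here is a minimal sketch of forward propagation in TypeScript. It is not part of node-gtfs: the `StopTimeUpdate` shape, the `propagateDelays` name, and the use of `null` for "unknown delay" are all assumptions made for illustration. It also assumes each update carries a single `delay` value, whereas a real feed has separate arrival and departure events and may provide only a time.

```typescript
type ScheduleRelationship = "SCHEDULED" | "SKIPPED" | "NO_DATA";

// Hypothetical shape for illustration; not the row format returned by node-gtfs.
interface StopTimeUpdate {
  stopSequence: number;
  delay?: number; // seconds; assumed to be already computed
  scheduleRelationship?: ScheduleRelationship; // defaults to SCHEDULED
}

interface PropagatedDelay {
  stopSequence: number;
  delay: number | null; // null means "unknown delay"
}

// Walks the trip in stop_sequence order and carries the last known delay forward.
function propagateDelays(
  stopSequences: number[],
  updates: StopTimeUpdate[],
): PropagatedDelay[] {
  const bySequence: Record<number, StopTimeUpdate> = {};
  for (const update of updates) {
    bySequence[update.stopSequence] = update;
  }

  let current: number | null = null; // unknown until the first update is seen
  const sorted = stopSequences.slice().sort((a, b) => a - b);

  return sorted.map((stopSequence) => {
    const update = bySequence[stopSequence];
    if (update !== undefined) {
      const relationship = update.scheduleRelationship ?? "SCHEDULED";
      if (relationship === "NO_DATA") {
        current = null; // NO_DATA stops propagation
      } else if (relationship === "SCHEDULED") {
        // Assumption: a SCHEDULED update without a delay is treated as unknown.
        current = update.delay ?? null;
      }
      // SKIPPED: the previous delay keeps propagating, so nothing changes here.
    }
    return { stopSequence, delay: current };
  });
}

// The example from the spec: a trip with stop_sequences 1..20 and three updates.
const exampleStops: number[] = [];
for (let i = 1; i <= 20; i++) {
  exampleStops.push(i);
}
const exampleResult = propagateDelays(exampleStops, [
  { stopSequence: 3, delay: 300 },
  { stopSequence: 8, delay: 60 },
  { stopSequence: 10, scheduleRelationship: "NO_DATA" },
]);
console.log(exampleResult);
```

With the three updates from the example, stops 1 and 2 come out as unknown, 3 to 7 get 300 seconds, 8 and 9 get 60 seconds, and 10 to 20 are unknown again.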

nukeador commented 1 month ago

As a temporary workaround, I've created a function to calculate the propagated delay for a given trip_id+stop_sequence that doesn't have realtime data, so I can query it and apply the delay to the scheduled time.

I've done this for both forward and backward propagation, to cover the numerous trips that arrive ahead of schedule and would otherwise end up showing only the scheduled arrival.

It's not perfect, but it's the most accurate way I've found to show timetables following the GTFS spec recommendation. Ideally, node-gtfs could implement something like this in the future as a built-in, configurable feature.
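For reference, a workaround along these lines could look roughly like the sketch below. This is not the actual function from the app. The names `KnownDelay`, `findDelayForStop`, and `applyDelay` are hypothetical, and it assumes the known delays for the trip have already been read from the database. Note that the backward fallback goes beyond the spec text quoted above, which leaves stops before the first update as unknown, and that this sketch ignores NO_DATA boundaries.

```typescript
// Hypothetical shape: one entry per stop on the trip that has realtime data.
interface KnownDelay {
  stopSequence: number;
  delay: number; // seconds; negative means ahead of schedule
}

// Forward propagation first: use the closest earlier (or same) stop with a delay.
// Backward fallback: if there is none, use the closest later stop with a delay.
function findDelayForStop(
  targetSequence: number,
  known: KnownDelay[],
): number | null {
  let previous: KnownDelay | null = null;
  let next: KnownDelay | null = null;
  for (const entry of known) {
    if (entry.stopSequence <= targetSequence) {
      if (previous === null || entry.stopSequence > previous.stopSequence) {
        previous = entry;
      }
    } else if (next === null || entry.stopSequence < next.stopSequence) {
      next = entry;
    }
  }
  if (previous !== null) return previous.delay;
  if (next !== null) return next.delay;
  return null; // no realtime data for this trip at all
}

// Applies a delay to a GTFS "HH:MM:SS" time. Hours are not wrapped at 24,
// because GTFS allows times past midnight such as "25:10:00".
function applyDelay(scheduled: string, delaySeconds: number): string {
  const [hours, minutes, seconds] = scheduled.split(":").map(Number);
  const total = Math.max(0, hours * 3600 + minutes * 60 + seconds + delaySeconds);
  const pad = (n: number): string => (n < 10 ? "0" : "") + n;
  return (
    pad(Math.floor(total / 3600)) +
    ":" +
    pad(Math.floor((total % 3600) / 60)) +
    ":" +
    pad(total % 60)
  );
}

// Usage: stop_sequence 5 has no realtime data, so it inherits the delay from stop 3.
const knownDelays: KnownDelay[] = [
  { stopSequence: 3, delay: 300 },
  { stopSequence: 8, delay: 60 },
];
const delayForStop5 = findDelayForStop(5, knownDelays);
if (delayForStop5 !== null) {
  console.log(applyDelay("08:15:30", delayForStop5));
}
```

Keeping the lookup separate from the time arithmetic makes it easy to turn the backward fallback off, or to swap in a different rule, without touching the formatting code.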