CUTR-at-USF / gtfs-realtime-validator

Java-based tool that validates General Transit Feed Specification (GTFS)-realtime feeds. See https://github.com/MobilityData/gtfs-realtime-validator for the latest!

Check that the delay field is consistent with difference between the scheduled and predicted times #41

Open barbeau opened 7 years ago

barbeau commented 7 years ago

If not, this would generate an error. See https://groups.google.com/d/msg/gtfs-realtime/qNc7ButQbW8/tDwanvClBQAJ.
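As a rough illustration of the proposed rule (not the validator's actual implementation), the check could be sketched as below. The function name, the `tolerance` parameter, and the assumption that the scheduled time has already been converted from GTFS stop_times.txt to POSIX seconds are all hypothetical.

```python
from typing import Optional


def check_delay_consistency(scheduled: int, predicted: int, delay: int,
                            tolerance: int = 0) -> Optional[str]:
    """Return an error message if delay != predicted - scheduled, else None.

    scheduled and predicted are POSIX timestamps in seconds; delay is in
    seconds (negative means early). Converting the GTFS schedule to POSIX
    time is assumed to have been done by the caller.
    """
    expected_delay = predicted - scheduled
    if abs(expected_delay - delay) > tolerance:
        return (f"delay {delay} s is inconsistent with predicted - scheduled "
                f"= {expected_delay} s")
    return None


# Hypothetical values: a stop scheduled at t=1000 and predicted at t=1120.
print(check_delay_consistency(1000, 1120, 120))  # consistent, prints None
print(check_delay_consistency(1000, 1120, 60))   # mismatch, prints a message
```

A non-zero `tolerance` would allow for rounding differences between producer systems; whether the rule should allow any slack is an open design question.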

Note that this applies to stop_time_update.arrival/departure.delay, as well as trip_update.delay. I noticed that SDMTS is providing stop_time_update.departure.time, as well as trip_update.delay:

"entity": [
{
  "id": "1",
  "trip_update": {
    "trip": {
      "trip_id": "12341185",
      "route_id": "30"
    },
    "stop_time_update": [
      {
        "departure": {
          "time": 1498664460
        },
        "stop_id": "95034"
      }
    ],
    "vehicle": {
      "id": "911"
    },
    "timestamp": 1498664286,
    "delay": -120
  }
},

nselikoff commented 6 years ago

@barbeau FYI Google's QA will currently throw a warning if both trip_update.delay and stop_time_update.arrival/departure.time are provided, and trip_update.delay takes precedence:

[Image: INVALID_TRIP_UPDATE_DELAY_USAGE warning]
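The condition described above (a trip-level delay provided together with a stop-level time) can be sketched as follows. This is only an approximation of the behavior described in this comment, not Google's implementation; the function name is hypothetical, and the sample dict mirrors the SDMTS trip_update quoted earlier in this issue.

```python
from typing import Any, Dict


def has_ambiguous_trip_delay(trip_update: Dict[str, Any]) -> bool:
    """True if trip_update.delay is set and any stop_time_update carries
    an arrival.time or departure.time."""
    if "delay" not in trip_update:
        return False
    for stop_time_update in trip_update.get("stop_time_update", []):
        for event in ("arrival", "departure"):
            if "time" in stop_time_update.get(event, {}):
                return True
    return False


# The SDMTS trip_update from the first comment in this issue.
sdmts_trip_update = {
    "trip": {"trip_id": "12341185", "route_id": "30"},
    "stop_time_update": [
        {"departure": {"time": 1498664460}, "stop_id": "95034"}
    ],
    "vehicle": {"id": "911"},
    "timestamp": 1498664286,
    "delay": -120,
}

print(has_ambiguous_trip_delay(sdmts_trip_update))  # True
```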

barbeau commented 6 years ago

@nselikoff Interesting, thanks for sharing. Looks like Google isn't following the spec on this one - GTFS-realtime StopTimeEvent docs say (emphasis mine):

Timing information for a single predicted event (either arrival or departure). Timing consists of delay and/or estimated time, and uncertainty.

  • delay should be used when the prediction is given relative to some existing schedule in GTFS.
  • time should be given whether there is a predicted schedule or not. If both time and delay are specified, time will take precedence (although normally, time, if given for a scheduled trip, should be equal to scheduled time in GTFS + delay).
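One way a consumer might apply the precedence rule quoted above is sketched here. The function name and signature are hypothetical, and the scheduled time is assumed to already be a POSIX timestamp, or None when stop_times.txt has no time for that stop.

```python
from typing import Optional


def resolve_event_time(scheduled: Optional[int], time: Optional[int],
                       delay: Optional[int]) -> Optional[int]:
    """Pick the predicted POSIX time for a single StopTimeEvent.

    time takes precedence when present; otherwise fall back to
    scheduled + delay, which only works if a scheduled time exists.
    """
    if time is not None:
        return time
    if delay is not None and scheduled is not None:
        return scheduled + delay
    return None  # delay alone is unusable without a scheduled time


print(resolve_event_time(1000, 1130, 120))   # 1130 (time wins over delay)
print(resolve_event_time(1000, None, 120))   # 1120 (scheduled + delay)
print(resolve_event_time(None, None, 120))   # None (nothing to add delay to)
```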

IMHO the GTFS-realtime documented approach is better, because producers don't have to populate arrival and departure times for all GTFS stop_times.txt stops. In those cases without scheduled arrivals and departures, if you want to show an estimated arrival clock time to the end user, delay is meaningless because you don't have a scheduled time to calculate against. So if you're considering all cases, it's more important that time is correct than that delay is.

nselikoff commented 6 years ago

@barbeau I agree, StopTimeEvent time is preferable to StopTimeEvent delay.

In this case Google's warning is about TripUpdate delay, which is still listed as experimental in the spec (GTFS-realtime TripUpdate).

Given a producer that can provide any or all of TripUpdate delay, StopTimeEvent time, and StopTimeEvent delay, it seems like it's best to only provide StopTimeEvent time to give the most relevant data and avoid ambiguity. Does that sound right?

barbeau commented 6 years ago

> In this case Google's warning is about TripUpdate delay, which is still listed as experimental in the spec (GTFS-realtime TripUpdate).

Oh, ok.

> Given a producer that can provide any or all of TripUpdate delay, StopTimeEvent time, and StopTimeEvent delay, it seems like it's best to only provide StopTimeEvent time to give the most relevant data and avoid ambiguity. Does that sound right?

I think you'll get different opinions on this depending on who you talk to, and I'd suggest reaching out to the GTFS-realtime Google Group to ask. It's a really good topic for the future GTFS-realtime Best Practices (and I just opened another issue for it here - https://github.com/CUTR-at-USF/gtfs-realtime-validator/issues/329).

IMHO - TripUpdate.delay should only be populated if you don't have stop-level predictions, meaning:

  1. You only have a single delay value for the entire trip (i.e., one delay per vehicle) AND
  2. You don't have the current position of the vehicle along the route (i.e., its current stop_id or stop_sequence)

If you have stop-level predictions, then provide one or more StopTimeEvents and not TripUpdate.delay.
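On the producer side, that recommendation might look roughly like the sketch below. It builds a plain dict shaped like the JSON earlier in this issue rather than using any GTFS-realtime library, and the function and parameter names are hypothetical.

```python
from typing import Any, Dict, List, Optional


def build_trip_update(trip_id: str,
                      stop_predictions: List[Dict[str, Any]],
                      trip_delay: Optional[int]) -> Dict[str, Any]:
    """Emit stop-level predictions when available; otherwise fall back to
    a single trip-level delay. The two are never emitted together."""
    update: Dict[str, Any] = {"trip": {"trip_id": trip_id}}
    if stop_predictions:
        update["stop_time_update"] = stop_predictions
    elif trip_delay is not None:
        update["delay"] = trip_delay
    return update


# Stop-level predictions exist, so the trip-level delay is dropped.
with_stops = build_trip_update(
    "12341185",
    [{"stop_id": "95034", "departure": {"time": 1498664460}}],
    trip_delay=-120,
)
print("delay" in with_stops)  # False

# Only a trip-level delay is known.
print(build_trip_update("12341185", [], trip_delay=-120)["delay"])  # -120
```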

My personal recommendation is that you provide delay AND time in StopTimeEvents, as it serves as another integrity check to make sure your arrival/departure info is coherent. If they're different, then there is an error in the producer's feed (which is what this issue's proposed validation rule is for). This is most helpful, though, if the agency is willing to do something about the bad data. Otherwise, it's confusing for consumers, who don't know which value to trust. You could provide only delay OR time to reduce ambiguity, but in the case of bad data I don't think that's any better for the end user: there's no ambiguity for the consumer, but then the bad data just gets passed on to the end user (as opposed to the consumer being able to flag a delay/time mismatch and not pass it along).

I'd be curious what others say though.