CUTR-at-USF / bullrunner-gtfs-realtime-generator

Desktop application that retrieves AVL data from the USF Bull Runner's AVL system and produces Trip Updates and Vehicle Positions files in GTFS-realtime format.

Multiple trip updates appearing for same trip instance (loop instance) #8

Open barbeau opened 9 years ago

barbeau commented 9 years ago

@cagryInside found the following when providing the GTFS-rt Trip Updates feed (http://mobullity.forest.usf.edu:8088/trip-updates?debug) to OneBusAway:

Log:

INFO  [GtfsRealtimeSource.java:232] : refreshing http://mobullity.forest.usf.edu:8088/trip-updates
WARN  [GtfsRealtimeTripLibrary.java:139] : Multiple TripUpdates for vehicle 2233; taking newest.
...
GTFS-rt feed:

entity {
  id: "17"
  trip_update {
    trip {
      trip_id: "3"
      start_time: "11:16:58"
      schedule_relationship: UNSCHEDULED
      route_id: "B"
    }
    ...
    vehicle {
      id: "2233"
    }
  }
}
entity {
  id: "18"
  trip_update {
    trip {
      trip_id: "3"
      start_time: "11:16:58"
      schedule_relationship: UNSCHEDULED
      route_id: "B"
    }
    ...
    vehicle {
      id: "2233"
    }
  }
}

The combination of trip_id + start_time + vehicle_id should be unique (i.e., this is one "loop" of the frequency-based route B), so having two records with the exact same values is wrong. If the second trip_update refers to the next instance of the loop, then the start_time should be different (and should be equal to the time that the first prediction for this trip instance became visible from the Syncromatics API). See the GTFS-rt TripDescriptor semantics document for more detail.
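To illustrate the uniqueness rule above, here is a minimal Python sketch that flags trip updates sharing the same (trip_id, start_time, vehicle_id) combination. The plain-dict representation of the feed and the `find_duplicate_instances` helper are assumptions made for illustration only; they are not part of the generator or of OneBusAway.

```python
from collections import defaultdict


def find_duplicate_instances(trip_updates):
    """Return {key: [entity ids]} for every key that appears more than once.

    The key is the (trip_id, start_time, vehicle_id) triple that should
    identify exactly one loop instance.
    """
    seen = defaultdict(list)
    for update in trip_updates:
        key = (update["trip_id"], update["start_time"], update["vehicle_id"])
        seen[key].append(update["entity_id"])
    return {key: ids for key, ids in seen.items() if len(ids) > 1}


# The two entities quoted in this issue, reduced to the relevant fields.
feed = [
    {"entity_id": "17", "trip_id": "3", "start_time": "11:16:58", "vehicle_id": "2233"},
    {"entity_id": "18", "trip_id": "3", "start_time": "11:16:58", "vehicle_id": "2233"},
]

duplicates = find_duplicate_instances(feed)
print(duplicates)
```

With the two entities above, the single key ("3", "11:16:58", "2233") maps to both entity ids, which is the condition this issue describes.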

@jmfield2 Could you please take a look at this?

barbeau commented 9 years ago

The data below also seems suspicious. Technically it's valid, because the start_times of the two trip updates for the same trip_id and vehicle are different, but they are only 30 seconds apart. We would normally expect the time difference to be approximately the amount of time it takes the vehicle to run one instance of the loop route:

entity {
  id: "9"
  trip_update {
    trip {
      trip_id: "14"
      start_time: "15:28:30"
      schedule_relationship: UNSCHEDULED
      route_id: "F"
    }
    stop_time_update {
      stop_sequence: 23
      arrival {
        time: 1427484000
      }
      stop_id: "517"
    }
    stop_time_update {
      stop_sequence: 24
      arrival {
        time: 1427484000
      }
      stop_id: "521"
    }
    stop_time_update {
      stop_sequence: 25
      arrival {
        time: 1427484060
      }
      stop_id: "527"
    }
    stop_time_update {
      stop_sequence: 26
      arrival {
        time: 1427484180
      }
      stop_id: "912"
    }
    stop_time_update {
      stop_sequence: 27
      arrival {
        time: 1427484240
      }
      stop_id: "906"
    }
    stop_time_update {
      stop_sequence: 28
      arrival {
        time: 1427484240
      }
      stop_id: "904"
    }
    stop_time_update {
      stop_sequence: 29
      arrival {
        time: 1427484300
      }
      stop_id: "446"
    }
    stop_time_update {
      stop_sequence: 30
      arrival {
        time: 1427484360
      }
      stop_id: "426"
    }
    stop_time_update {
      stop_sequence: 31
      arrival {
        time: 1427484420
      }
      stop_id: "418"
    }
    vehicle {
      id: "1329"
    }
  }
}
entity {
  id: "8"
  trip_update {
    trip {
      trip_id: "14"
      start_time: "15:28:00"
      schedule_relationship: UNSCHEDULED
      route_id: "F"
    }
    stop_time_update {
      stop_sequence: 1
      arrival {
        time: 1427484480
      }
      stop_id: "401"
    }
    stop_time_update {
      stop_sequence: 2
      arrival {
        time: 1427485080
      }
      stop_id: "421"
    }
    stop_time_update {
      stop_sequence: 3
      arrival {
        time: 1427485140
      }
      stop_id: "425"
    }
    stop_time_update {
      stop_sequence: 4
      arrival {
        time: 1427485200
      }
      stop_id: "445"
    }
    stop_time_update {
      stop_sequence: 5
      arrival {
        time: 1427485260
      }
      stop_id: "449"
    }
    stop_time_update {
      stop_sequence: 6
      arrival {
        time: 1427485320
      }
      stop_id: "905"
    }
    stop_time_update {
      stop_sequence: 7
      arrival {
        time: 1427485380
      }
      stop_id: "911"
    }
    stop_time_update {
      stop_sequence: 8
      arrival {
        time: 1427485560
      }
      stop_id: "526"
    }
    stop_time_update {
      stop_sequence: 9
      arrival {
        time: 1427485620
      }
      stop_id: "520"
    }
    stop_time_update {
      stop_sequence: 10
      arrival {
        time: 1427485740
      }
      stop_id: "518"
    }
    stop_time_update {
      stop_sequence: 11
      arrival {
        time: 1427485800
      }
      stop_id: "514"
    }
    stop_time_update {
      stop_sequence: 12
      arrival {
        time: 1427485860
      }
      stop_id: "510"
    }
    stop_time_update {
      stop_sequence: 13
      arrival {
        time: 1427485920
      }
      stop_id: "508"
    }
    stop_time_update {
      stop_sequence: 14
      arrival {
        time: 1427486040
      }
      stop_id: "504"
    }
    stop_time_update {
      stop_sequence: 15
      arrival {
        time: 1427486040
      }
      stop_id: "502"
    }
    vehicle {
      id: "1329"
    }
  }
}

jmfield2 commented 9 years ago

So, I looked into this issue over the weekend, and it seems to originate from the way the generator updates the start_times for each (route, vehicle) pair when a new sequence is received. That is, it overwrites the time for the previous instance with the current time for every new prediction received, so in some cases the previous time could eventually equal the current time if the prediction didn't change (I think).

My proposed solution, which I'm still testing locally and will try to test on mobullity shortly if you think it could work, is as follows (https://github.com/jmfield2/bullrunner-gtfs-realtime-generator/commit/9496679f75151c7923c6b3124fec10d79a4f518c):

When a new prediction is received for stop #1, update the current instance time to reflect the new data. If and only if the current instance time is 'old' enough (at least 60*10 seconds, i.e., 10 minutes, older than the new prediction time), copy the current time to the previous time.
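A minimal Python sketch of one reading of this rule follows. It is not the code from the linked commit: the `InstanceTimes` class, the `on_first_stop_prediction` function, and the choice to compare against the old current time before overwriting it are all assumptions made for illustration. Times are plain POSIX seconds.

```python
from dataclasses import dataclass
from typing import Optional

# 10-minute threshold taken from the proposal (60 * 10 seconds).
MIN_INSTANCE_GAP_SECONDS = 60 * 10


@dataclass
class InstanceTimes:
    """Start times tracked for one (route, vehicle) pair (hypothetical)."""
    current: Optional[int] = None
    previous: Optional[int] = None


def on_first_stop_prediction(times, prediction_time):
    """Apply the proposed rule when a prediction for stop #1 arrives."""
    old_current = times.current
    # Only shift current -> previous when the old current time is at
    # least 10 minutes older than the new prediction.
    if (old_current is not None
            and prediction_time - old_current >= MIN_INSTANCE_GAP_SECONDS):
        times.previous = old_current
    # The current instance time always follows the newest prediction.
    times.current = prediction_time
    return times


times = InstanceTimes()
on_first_stop_prediction(times, 1000)  # first prediction seen
on_first_stop_prediction(times, 1030)  # refined 30 s later: previous stays unset
on_first_stop_prediction(times, 1700)  # 670 s after 1030: previous becomes 1030
print(times)
```

Under this reading, a prediction that is merely refined a few seconds later no longer overwrites the previous instance time.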

So far, it seems to be working as expected.

Any thoughts?

barbeau commented 9 years ago

When a new prediction is received for stop #1, update the current instance time to reflect the new data. If and only if the current instance time is 'old' enough (at least 60*10 seconds, i.e., 10 minutes, older than the new prediction time), copy the current time to the previous time.

@jmfield2 when you say "new prediction", do you mean that the predicted arrival time for stop_sequence=1 changes?

A changing predicted arrival time for stop_sequence=1 in a stop_time_update alone doesn't necessarily indicate the beginning of a new trip instance, since that prediction could change multiple times as the predictions are refined while the vehicle approaches the stop. (As mentioned before, the start_time of the trip instance should never change after it is set.) My understanding of the current implementation is that it should be looking for non-increasing arrival time values in the stop_sequences, and will split the trip instances on that non-increasing time value (although admittedly I haven't dug into the code myself). See the discussion at https://github.com/opentripplanner/OpenTripPlanner/issues/1347#issuecomment-52500250 for a presentation of arrival times from the feed and how this tends to look for two different trip loop instances. Note that data errors in predictions could also potentially introduce more than one non-increasing value.
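The splitting idea can be sketched roughly as follows in Python. This is not the generator's actual code: `split_trip_instances` is a hypothetical helper, and it assumes the predictions arrive ordered by stop_sequence. It also assumes that only a strict decrease starts a new instance, because the sample data in this issue contains equal arrival times for consecutive stops within one loop.

```python
def split_trip_instances(predictions):
    """Group (stop_sequence, arrival_time) pairs into loop instances.

    A new group starts whenever the arrival time drops below the
    previous one; equal times stay in the same group (assumption).
    """
    groups = []
    for stop_sequence, arrival in predictions:
        if not groups or arrival < groups[-1][-1][1]:
            groups.append([])
        groups[-1].append((stop_sequence, arrival))
    return groups


# A few values from the vehicle 1329 data above, in stop_sequence order.
predictions = [
    (14, 1427486040),
    (15, 1427486040),
    (23, 1427484000),
    (24, 1427484000),
    (25, 1427484060),
]

instances = split_trip_instances(predictions)
print(len(instances))
```

Here the drop between stop_sequence 15 and 23 produces two groups, one per loop instance. A bad prediction in the middle of a loop would produce an extra group, which is the data-error case noted above.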

I think the "best" solution is probably to introduce some "reality-check" on the number of non-increasing values allowed to generate new trip instances for the same vehicle (in reality it should be 1 max, I believe, without digging deeper myself), in addition to the existing logic of splitting trip instances using the non-increasing values. This could also take the form of the hard time threshold you mention, to make sure we're not generating start_times that are way too close together. I'm not sure how long the Bull Runner normally takes to run a route, but my feeling is that 10 min is probably a reasonable threshold (and maybe even a little more) without getting too close to real circulation times.
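As a rough illustration of the hard time threshold, the Python sketch below flags start_times for one vehicle that are closer together than 10 minutes. The `seconds_between` and `too_close` helpers are hypothetical, and the 10-minute value is only the threshold floated in this thread, not a measured circulation time.

```python
from datetime import datetime

MIN_GAP_SECONDS = 10 * 60  # threshold discussed in this thread


def seconds_between(start_a, start_b):
    """Absolute difference between two HH:MM:SS strings, in seconds.

    Limitation: strptime rejects hours >= 24, which GTFS permits for
    service running past midnight.
    """
    fmt = "%H:%M:%S"
    delta = datetime.strptime(start_a, fmt) - datetime.strptime(start_b, fmt)
    return abs(int(delta.total_seconds()))


def too_close(start_times, min_gap=MIN_GAP_SECONDS):
    """Return adjacent start_time pairs separated by less than min_gap."""
    ordered = sorted(start_times)  # zero-padded strings sort chronologically
    return [(a, b) for a, b in zip(ordered, ordered[1:])
            if seconds_between(a, b) < min_gap]


suspicious = too_close(["15:28:30", "15:28:00"])  # vehicle 1329 above
print(suspicious)
```

The two start_times quoted earlier for vehicle 1329 are 30 seconds apart, so the pair is reported.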

Let me know if this isn't clear (maybe I'm not understanding your exact proposed solution either), and we can try to squeeze in a Hangout tomorrow or Wed.

cagryInside commented 9 years ago

@jmfield2 committed and deployed to mobullity my proposed solution for the duplicate start times in the trip-updates feed.

It looks like it fixed the start time problem. For example, for the same trip and vehicle ID, we now get different start times:

entity {
  id: "18"
  trip_update {
    trip {
      trip_id: "13"
      start_time: "10:02:12"
      schedule_relationship: UNSCHEDULED
      route_id: "F"
    } ...
    vehicle {
      id: "3003"
    }
  }
}
entity {
  id: "19"
  trip_update {
    trip {
      trip_id: "13"
      start_time: "09:11:42"
      schedule_relationship: UNSCHEDULED
      route_id: "F"
    } ...
    vehicle {
      id: "3003"
    }
  }
}

However, we still don't update the trips in OneBusAway (OBA, https://github.com/OneBusAway/onebusaway-application-modules/tree/develop-freq). We already proposed and implemented a new flow for trip updates in frequency-based systems (issue https://github.com/OneBusAway/onebusaway-application-modules/issues/128 and PR https://github.com/OneBusAway/onebusaway-application-modules/pull/129). Since both trip updates have the same trip and vehicle ID, we still skip the next trip in OBA:

2015-04-09 09:59:13,591 INFO  [GtfsRealtimeSource.java:232] : refreshing http://mobullity.forest.usf.edu:8088/trip-updates
2015-04-09 09:59:13,607 WARN  [GtfsRealtimeTripLibrary.java:139] : Multiple TripUpdates for vehicle and trip 3003_13; taking newest.
2015-04-09 09:59:13,607 WARN  [GtfsRealtimeTripLibrary.java:139] : Multiple TripUpdates for vehicle and trip 1329_13; taking newest.
2015-04-09 09:59:13,607 WARN  [GtfsRealtimeTripLibrary.java:139] : Multiple TripUpdates for vehicle and trip 1123_8; taking newest.
2015-04-09 09:59:13,607 WARN  [GtfsRealtimeTripLibrary.java:139] : Multiple TripUpdates for vehicle and trip 4009_8; taking newest.
2015-04-09 09:59:13,607 WARN  [GtfsRealtimeTripLibrary.java:139] : Multiple TripUpdates for vehicle and trip 1979_11; taking newest.
2015-04-09 09:59:13,607 WARN  [GtfsRealtimeTripLibrary.java:139] : Multiple TripUpdates for vehicle and trip 2102_1; taking newest.
2015-04-09 09:59:13,607 WARN  [GtfsRealtimeTripLibrary.java:139] : Multiple TripUpdates for vehicle and trip 1980_11; taking newest.
2015-04-09 09:59:13,607 WARN  [GtfsRealtimeTripLibrary.java:139] : Multiple TripUpdates for vehicle and trip 2252_1; taking newest.

So, if this is the correct behavior for the Bull Runner and other frequency-based systems, we might want to update the OneBusAway project (issue https://github.com/OneBusAway/onebusaway-application-modules/issues/128 and PR https://github.com/OneBusAway/onebusaway-application-modules/pull/129). In this case, we need to concatenate three parameters: trip_id + vehicle_id + start_time.
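To make the difference concrete, here is a small Python sketch comparing a two-part key with the three-part key. It does not reproduce OBA's code: `latest_by_key` is a hypothetical helper, "later entries overwrite earlier ones" is only a stand-in for the "taking newest" behavior in the log, and the field order in the key is arbitrary.

```python
def latest_by_key(trip_updates, key_fields):
    """Keep one trip update per key; later entries overwrite earlier ones."""
    kept = {}
    for update in trip_updates:
        key = "_".join(update[field] for field in key_fields)
        kept[key] = update
    return kept


# The two updates for vehicle 3003 on trip 13 shown above.
updates = [
    {"vehicle_id": "3003", "trip_id": "13", "start_time": "10:02:12"},
    {"vehicle_id": "3003", "trip_id": "13", "start_time": "09:11:42"},
]

two_part = latest_by_key(updates, ["vehicle_id", "trip_id"])
three_part = latest_by_key(updates, ["vehicle_id", "trip_id", "start_time"])
print(sorted(two_part))
print(sorted(three_part))
```

With the two-part key both updates collapse into the single entry 3003_13, so one loop instance is dropped. Adding start_time to the key keeps both.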

If this is not the desired behavior, we need to change the bullrunner-gtfs-realtime-generator. (I personally think that this is the correct behavior, and we need to update the OBA).

cc'd @barbeau

jmfield2 commented 9 years ago

So, I checked this tonight and noticed an issue that I'll need to investigate further:

entity { id: "4" trip_update { trip { trip_id: "1" start_time: "21:27:13" schedule_relationship: UNSCHEDULED route_id: "A" }

trip_update { trip { trip_id: "1" start_time: "21:30:43" schedule_relationship: UNSCHEDULED route_id: "A" }

The same vehicle on route A had two start times only about 3 minutes apart (21:27:13 and 21:30:43). I'm guessing this could be from bad Syncromatics data or gaps, but I'm not sure yet.
