OneBusAway / onebusaway-application-modules

The core OneBusAway application suite.
https://github.com/OneBusAway/onebusaway-application-modules/wiki
Other
206 stars 133 forks source link

transit-data-federation-builder throws error when sequential stop IDs are identical #150

Open caywood opened 9 years ago

caywood commented 9 years ago

Originally reported: https://groups.google.com/forum/#!topic/onebusaway-developers/u5Fb1F33iIk

Related: #99

In Vancouver Translink GTFS: http://ns.translink.ca/gtfs/google_transit.zip

We get "arrival time is less than previous departure time for stop time with trip_id=CMBC_XXXX stop_sequence=YY" when the arrival time is, in actuality, greater.

The issue is the following. TransLink is putting pairs of the same stop_id in their trips, with different stop_sequence numbers. The first stop has pickup and dropoff, the second does not.

trip_id,arrival_time,departure_time,stop_id,stop_sequence,stop_headsign,pickup_type,drop_off_type,shape_dist_traveled
...
7006031,14:20:43,14:20:43,74,8,,0,0,1.2278
7006031,14:21:57,14:21:57,11,9,,0,0,1.4251
7006031,14:22:00,14:22:00,11,10,,1,1,1.4251
7006031,14:23:22,14:23:22,11429,11,,0,0,1.5715

Note the arrival time is 14:22:00 which is more than the previous departure time 14:21:57 (3 seconds).

@barbeau says:

technically this GTFS data violates the GTFS spec, as the GTFS spec says that unless you're including the "timepoint" field, there shouldn't be interpolated times in arrival_time and departure_time (which I assume these times are). That being said, a lot of agencies do that, so that alone shouldn't prevent us from building a bundle.

From a quick look at the data I'd definitely think its a data error on Vancouver's end of things. One issue is that the bus doesn't move - shape_dist_traveled stays the same, which does violate the GTFS spec:

The values used for shape_dist_traveled must increase along with stop_sequence

This may be the root of the error that OBA is throwing, the error description may just not be descriptive of the actual problem.

If we decide to tolerate this type of data in OBA, one question is how OBA should behave when processing this type of data - it messes with things like the propagation of real-time delay info down a trip. Off the top of my head it may be best for OBA to just remove duplicate sequential stops with the same shape_dis_traveled.

barbeau commented 9 years ago

@caywood Would you be able to raise the duplicate shape_dist_traveled issue with Vancouver and see if they'd be willing to fix their data?

barbeau commented 9 years ago

I also wanted to make sure we had a copy of the data with this issue for any tests - looks like GTFS Data Exchange for Vancouver: http://www.gtfs-data-exchange.com/agency/translink/

...is archiving from a different URL than the above?

http://mapexport.translink.bc.ca/current/google_transit.zip

@caywood any idea which is preferred?

caywood commented 9 years ago

@barbeau according to their Google Group, both of those have been replaced by http://ns.translink.ca/gtfs/google_transit.zip

https://groups.google.com/forum/?hl=en&fromgroups#!topic/translink-developers/g38yIJqSP2E

I've asked GTFS-DX to update their link

barbeau commented 9 years ago

@caywood Ok, thanks. I'll update GTFS Data Exchange.

caywood commented 9 years ago

Just reported to TransLink. Link to thread:

https://groups.google.com/forum/?hl=en&fromgroups#!topic/translink-developers/W6MKlGTpLTs

barbeau commented 9 years ago

Cool, thanks. Just updated GTFS Data Exchange, it should crawl the new URL tonight.

kurtraschke commented 9 years ago

Here's a particularly odd case, from the same GTFS:

trip_id arrival_time departure_time stop_id stop_sequence stop_headsign pickup_type drop_off_type shape_dist_traveled
7004872 24:08:00 24:08:00 4881 3 1 0 0.7701
7004872 24:10:00 24:10:00 4881 4 1 1 0.7701
7004872 24:10:00 24:10:00 4881 5 0 0 0.7701

It's not at all clear what the intended real-world effect is (and I haven't checked an actual timetable to see what this bus really does). So, I don't think this is the best thing for a GTFS producer to do, although it probably is legal.

Anyway, OneBusAway does remove duplicate stoptimes (in StopTimeEntriesFactory::removeDuplicateStopTimes), but only in the case that they have identical arrival or departure times–and even then the second (or third, etc.) consecutive stoptime isn't actually removed from the trip, but rather it has its times cleared so that they will be replaced during interpolation (which is part of what triggers this bug). In the GTFS excerpted above, this results in the stoptime with stop_sequence=5 having its times cleared.

The root of this bug appears to lie in StopTimeEntriesFactory::populateArrivalAndDepartureTimesByDistanceTravelledForStopTimes, where we create a TreeMap mapping distance-along-trip to times. Quoting a comment in that method:

/**
* For StopTime's that have the same distance travelled, we keep the min
* arrival time and max departure time
*/

In the trip quoted above, the minimum arrival time at shape_dist_traveled=0.7701 is 24:08:00, and the maximum departure time is 24:10:00. So long as interpolation is not required, nothing untoward happens. But interpolation is required, for the stoptime with stop_sequence=5 (because its times had been cleared previously in removeDuplicateStopTimes). Now we interpolate based on shape_dist_traveled=0.7701, and this results in an interpolated arrival time for stop_sequence=5 of 24:08:00, which is prior to the (actual) departure time of the preceding stoptime, which was 24:10:00.

caywood commented 8 years ago

Per the translink-dev thread above, this is a vendor issue they're not planning to fix. Probably the best option in such cases is to delete the optional shape_dist_traveled field.

kevinsafon commented 8 years ago

@barbeau what would happen if we just ignore these errors and comment out the part of the code that throws these errors (using Eclipse etc)? Could the bundle still be of good quality or would that mess it up completely and make it inaccurate?

barbeau commented 8 years ago

@kevinsafon If its failing on an optional field like shape_dist_traveled, I agree with @caywood that the best workaround solution is probably to remove the field from your GTFS data and rebuild the bundle. You won't get certain features in OBA REST API responses like estimated distance traveled along the shape, but I believe most everything else related to real-time estimates should work correctly. I'd avoid commenting out code to avoid errors as this could have unexpected consequences.

Another similar option is to fix the GTFS data yourself, if its easy to understand what the field should be.

For editing the GTFS data, I'd suggest checking out the OBA GTFS Transformer, which makes this much simpler.

sheldonabrown commented 8 years ago

I believe shape_dist_traveled will be re-calculated in fact if not present. So stripping from GTFS would be my recommendation as well.

Sheldon

On Mon, Oct 19, 2015 at 9:18 AM, Sean Barbeau notifications@github.com wrote:

@kevinsafon https://github.com/kevinsafon If its failing on an optional field like shape_dist_traveled, I agree with @caywood https://github.com/caywood that the best workaround solution is probably to remove the field from your GTFS data and rebuild the bundle. You won't get certain features in OBA REST API responses like estimated distance traveled along the shape, but I believe most everything else related to real-time estimates should work correctly. I'd avoid commenting out code to avoid errors as this could have unexpected problems.

Another similar option is to fix the GTFS data yourself, if its easy to understand what the field should be.

For editing the GTFS data, I'd suggest checking out the OBA GTFS Transformer http://developer.onebusaway.org/modules/onebusaway-gtfs-modules/1.3.3/onebusaway-gtfs-transformer-cli.html, which makes this much simpler.

— Reply to this email directly or view it on GitHub https://github.com/OneBusAway/onebusaway-application-modules/issues/150#issuecomment-149210838 .

kevinsafon commented 8 years ago

Thanks guys, we ended up writing our own program to clean the stop_times file. We could build the bundle successfully!

barbeau commented 8 years ago

@kevinsafon Cool! Glad you got it working. We'll leave this issue open for a potential fix in the future.

BTW, if you want to see one effect of bad shape data, see the Android screenshot here. The interpolated position data when real-time isn't available is....not right. :)