OneBusAway / onebusaway-application-modules

The core OneBusAway application suite.
https://github.com/OneBusAway/onebusaway-application-modules/wiki

transit-data-federation-builder throws error when sequential stop times are the same #99

Open jrharshath opened 9 years ago

jrharshath commented 9 years ago

While compiling the new marta bundle I got this error:

arrival time is less than previous departure time for stop time with trip_id=MARTA_4149045 stop_sequence=10

I looked at the stop_times.txt file and found that the departure time for stop sequence 9 is the same as the arrival time for stop sequence 10. The module should not fail on this condition, since the arrival time is not less than the previous departure time.
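To make the distinction concrete, here is a minimal sketch of the two possible checks. It is not the builder's actual code; the function names and the sample times are hypothetical, and rows are assumed to be sorted by stop_sequence within one trip. With strict=False only a genuinely earlier arrival is flagged, while strict=True also flags equal times, which is what the reported error suggests is happening.

```python
def parse_gtfs_time(value):
    """Convert a GTFS HH:MM:SS string to seconds since midnight (hours may exceed 23)."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def find_time_violations(stop_times, strict=False):
    """Return the stop_sequence values that fail the ordering check.

    stop_times: list of (stop_sequence, arrival_time, departure_time) tuples
    for a single trip, already sorted by stop_sequence.
    strict=False flags only arrival < previous departure.
    strict=True additionally flags arrival == previous departure.
    """
    violations = []
    for previous, current in zip(stop_times, stop_times[1:]):
        prev_departure = parse_gtfs_time(previous[2])
        arrival = parse_gtfs_time(current[1])
        if arrival < prev_departure or (strict and arrival == prev_departure):
            violations.append(current[0])
    return violations


# Hypothetical rows; the times are made up for illustration.
trip = [
    (9, "08:15:00", "08:15:00"),
    (10, "08:15:00", "08:15:00"),
    (11, "08:17:00", "08:17:00"),
]
print(find_time_violations(trip))               # prints []
print(find_time_violations(trip, strict=True))  # prints [10]
```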

barbeau commented 9 years ago

@jrharshath IIRC, I hit this same issue with HART's GTFS data in Tampa. I think the builder is correct to flag this as a data error, although the error message should be refined. departure_time and arrival_time should have a resolution in seconds (i.e., HH:MM:SS), and I doubt that it takes less than a second in the real world to travel from stop_sequence=9 to stop_sequence=10. IIRC, HART's GTFS feed at the time had a resolution in minutes for departure_time and arrival_time, which violated the GTFS spec (even though their GTFS data was approved and used by Google). They updated their data to include seconds, and that fixed the issue for us (since departure_time and arrival_time were no longer equal).

barbeau commented 9 years ago

@jrharshath From a quick look at MARTA's data they do have HH:MM:SS format for departure_time and arrival_time, so IMO this is likely a data issue on their end. If they don't want to fix it you could always tweak the data yourself to bump the arrival_time at stop_sequence=10 by a second. I don't think this should have any adverse effects and should be fairly straightforward, assuming there is no cascading issue with times (which would be harder to edit by hand).

bdferris commented 9 years ago

This is definitely kind of a grey area in the spec. Google has historically run into a number of agencies who weren't able to specify times at the second level, so we have some logic to deal with feeds where the second portion of HH:MM:SS is always 00, even if it means sequential stop times have the same times. You'll notice the open-source validator doesn't reject these feeds either.
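As a rough illustration of how a feed with minute-level resolution might be detected, the sketch below checks whether the seconds portion of every time is 00. It is an assumption about what such a check could look like, not the logic Google or the validator actually uses, and the function name is hypothetical.

```python
def seconds_always_zero(times):
    """Return True if every non-empty HH:MM:SS value has 00 as its seconds part.

    Empty strings (stops without times) are ignored. A feed with no times
    at all returns False, since nothing can be concluded from it.
    """
    values = [t.strip() for t in times if t and t.strip()]
    if not values:
        return False
    for value in values:
        parts = value.split(":")
        if len(parts) != 3 or parts[2] != "00":
            return False
    return True


print(seconds_always_zero(["08:15:00", "08:17:00", ""]))  # prints True
print(seconds_always_zero(["08:15:00", "08:17:30"]))      # prints False
```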

jrharshath commented 9 years ago

At the moment, I have a perl script that adds an extra second to the arrival time at the next stop to get around this problem.
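The Perl script itself is not shown in this thread. A minimal Python sketch of the same idea might look like the following; the function names are hypothetical, and the rows are assumed to be dicts (for example from csv.DictReader) already sorted by trip_id and numeric stop_sequence. It pushes an arrival one second past the previous departure when the two coincide, carries the shift forward through runs of identical times, and refuses to touch a trip whose published times actually decrease.

```python
def to_seconds(value):
    """Convert HH:MM:SS to seconds since midnight (hours may exceed 23)."""
    hours, minutes, seconds = (int(part) for part in value.strip().split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_gtfs_time(total):
    """Convert seconds since midnight back to HH:MM:SS."""
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def bump_equal_times(rows):
    """Edit rows in place so each arrival is after the previous departure."""
    last_trip = None
    published_prev = None  # previous departure as published in the feed
    adjusted_prev = None   # previous departure after any bump applied here
    for row in rows:
        if row["trip_id"] != last_trip:
            last_trip, published_prev, adjusted_prev = row["trip_id"], None, None
        if not row["arrival_time"].strip() or not row["departure_time"].strip():
            continue  # stop without times: leave it blank
        arrival = to_seconds(row["arrival_time"])
        departure = to_seconds(row["departure_time"])
        published_departure = departure
        if published_prev is not None:
            if arrival < published_prev:
                # A genuinely decreasing time is a data error, not a tie.
                raise ValueError(
                    f"times decrease in trip {row['trip_id']} "
                    f"at stop_sequence {row['stop_sequence']}"
                )
            if arrival <= adjusted_prev:
                arrival = adjusted_prev + 1
                departure = max(departure, arrival)
                row["arrival_time"] = to_gtfs_time(arrival)
                row["departure_time"] = to_gtfs_time(departure)
        published_prev, adjusted_prev = published_departure, departure
    return rows
```

Three consecutive stops that all share one time therefore end up one and two seconds later, which is the cascading case that makes hand edits tedious.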

I think since the GTFS validator considers this valid, we shouldn't fail to compile such a case either - thoughts?

(Meanwhile, I'll try to reach out to MARTA and see if they would be willing to fix this on their end.)

laidig commented 9 years ago

I don't think they'll see it as a data issue. While the GTFS FeedValidator may throw a warning for these, they're not outside of the GTFS spec.

We had a similar issue in NYC here at the very beginning, but after a search of my email I can't find out exactly what the particular problem was. IIRC, this actually was masking a problem with the shape itself.

barbeau commented 9 years ago

"This is definitely kind of a grey area in the spec"

Just to clarify: the spec clearly says that HH:MM:SS is required. The grey area is whether departure_time and arrival_time must be strictly increasing within a trip. A student recently asked me about this, and after reading through the spec again, the only explicit reference I could find is in the stops.txt stop_timezone field description, which says (emphasis is mine):

"...the times in stop_times.txt should continue to be specified as time since midnight in the timezone specified by agency_timezone in agency.txt. This ensures that the time values in a trip always increase over the course of a trip, regardless of which timezones the trip crosses."

(If I missed another explicit reference in the spec that says more on this matter, please let me know.)

I'd argue there is also an implicit reference. A contributing issue here is that MARTA's data (along with that of many other agencies) is out of spec because it includes times for non-timepoint entries in stop_times.txt, despite the spec clearly indicating otherwise:

"If this stop isn't a time point, use an empty string value for the arrival_time and departure_time fields...To ensure accurate routing, please provide arrival and departure times for all stops that are time points. Do not interpolate stops."

Otherwise, they'd only have stop_times.txt entries that were strictly increasing in time (I assume there aren't any cases where two sequential timepoints in a trip would have identical times?). I look forward to the day when the timepoint field proposal is adopted to bring the spec in line with this common practice. While it legitimizes non-timepoint times in stop_times.txt, this proposal doesn't, however, further clarify whether such times within a trip should be strictly increasing.