MobilityData / gtfs-validator

Canonical GTFS Validator project for schedule (static) files.
https://gtfs-validator.mobilitydata.org/
Apache License 2.0

MBTA fails validation due to extension fields in transfers.txt #361

Closed barbeau closed 3 years ago

barbeau commented 4 years ago

Describe the bug We may need to reconsider using composite keys to enforce row uniqueness for some tables where that requirement isn't explicitly defined in the spec.

An example is MBTA's current GTFS, which is attached - input.zip.

When running the validator I get the output:

Notice{filename='transfers.txt', level='ERROR', code='20', title='Duplicate entity', description='Entity must be unique in file: `transfers.txt` found other entity with same value for field: from_stop_id;to_stop_id`', extra='{fieldName=from_stop_id;to_stop_id}'}]

transfers.txt looks like:

from_stop_id,to_stop_id,transfer_type,min_transfer_time,min_walk_time,min_wheelchair_time,suggested_buffer_time,wheelchair_transfer,from_trip_id,to_trip_id
102,1060,0,,,,,1,,
102,1123,0,,,,,1,,
102,72,0,,,,,1,,
102,place-cntsq,0,,,,,1,,
1060,102,0,,,,,1,,
...

MBTA has added two fields that aren't in the GTFS spec, from_trip_id and to_trip_id, to their data to allow specifying transfers between stops on specific trips. Adding additional fields like this is allowed per the GTFS spec.

The situation here is that the validator enforces uniqueness of the combination of from_stop_id and to_stop_id, and throws an error when it encounters multiple rows with the same values. But in MBTA's case, multiple rows with the same from_stop_id and to_stop_id would be allowed, because they could apply to different from/to trip IDs. The spec doesn't explicitly say that the combination of from_stop_id and to_stop_id must be unique, from what I can tell.
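To make the conflict concrete, here is a minimal Python sketch (not the validator's actual code) of a composite-key duplicate check. The rows use MBTA's column layout, but the repeated stop pair and the trip IDs (trip-A, trip-B, and so on) are invented for illustration, and `find_duplicate_keys` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# Hypothetical rows in MBTA's column layout; the trip IDs are made up.
SAMPLE = """\
from_stop_id,to_stop_id,transfer_type,min_transfer_time,min_walk_time,min_wheelchair_time,suggested_buffer_time,wheelchair_transfer,from_trip_id,to_trip_id
102,1060,0,,,,,1,,
102,1060,1,,,,,1,trip-A,trip-B
102,1060,1,,,,,1,trip-C,trip-D
102,1123,0,,,,,1,,
"""


def find_duplicate_keys(csv_text, key_fields):
    """Return the sorted key tuples that appear on more than one row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(tuple(row[field] for field in key_fields) for row in reader)
    return sorted(key for key, count in counts.items() if count > 1)


# Key that the reported error message names.
stop_key = ("from_stop_id", "to_stop_id")
# Same key, widened with MBTA's extension columns.
extended_key = stop_key + ("from_trip_id", "to_trip_id")

print(find_duplicate_keys(SAMPLE, stop_key))      # [('102', '1060')]
print(find_duplicate_keys(SAMPLE, extended_key))  # []
```

With the two-field key, the stop pair 102 to 1060 is reported as a duplicate. Once the trip columns are part of the key, every row is distinct.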

We need to decide whether MBTA's use case is valid. If it is, we should potentially change validation so that we don't throw errors on this type of data. This would apply to any table where we're creating a composite key, where similar use cases could arise, and where the spec doesn't explicitly say that the combination of fields must be unique.
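One possible direction, sketched in Python under stated assumptions, is to widen the composite key only when a feed's header contains known extension columns. This is an illustration of the idea, not how the validator is implemented. `KEY_EXTENSIONS`, `effective_key`, and `duplicate_rows` are hypothetical names, and the feeds below are invented.

```python
import csv
import io

# Base key taken from the reported error message.
BASE_KEY = ("from_stop_id", "to_stop_id")
# Assumption: extension columns that should refine the key when present.
KEY_EXTENSIONS = ("from_trip_id", "to_trip_id")


def effective_key(header):
    """Base key plus any extension columns that the header actually contains."""
    return BASE_KEY + tuple(col for col in KEY_EXTENSIONS if col in header)


def duplicate_rows(csv_text):
    """Return (first_line, repeated_line, key) for every row repeating an earlier key."""
    reader = csv.DictReader(io.StringIO(csv_text))
    key_fields = effective_key(reader.fieldnames or [])
    first_seen = {}
    duplicates = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        key = tuple(row[field] for field in key_fields)
        if key in first_seen:
            duplicates.append((first_seen[key], line_no, key))
        else:
            first_seen[key] = line_no
    return duplicates


# Hypothetical feed without extension columns: the repeat is still flagged.
PLAIN = "from_stop_id,to_stop_id,transfer_type\n102,1060,0\n102,1060,0\n"
# Hypothetical feed with trip columns: same stops, different trips, no error.
EXTENDED = (
    "from_stop_id,to_stop_id,transfer_type,from_trip_id,to_trip_id\n"
    "102,1060,1,trip-A,trip-B\n"
    "102,1060,1,trip-C,trip-D\n"
)

print(duplicate_rows(PLAIN))     # [(2, 3, ('102', '1060'))]
print(duplicate_rows(EXTENDED))  # []
```

The trade-off is that the validator then needs a list of recognized extension columns per table. The alternative is to downgrade the notice from an error whenever unknown columns are present.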

To Reproduce Run validator with -u https://cdn.mbta.com/MBTA_GTFS.zip now, or use the attached GTFS as input - input.zip

Expected behavior Don't throw errors on valid data. Is MBTA's use case valid?

Witnessed behavior An error is thrown for MBTA's spec extension, which allows specifying transfers between stops for specific trips.

Environment used

barbeau commented 3 years ago

Closing, fixed with v2 architecture