Describe the bug
We may need to reconsider using composite keys to enforce row uniqueness for some tables where that requirement isn't explicitly defined in the spec.
An example is MBTA's current GTFS, which is attached - input.zip.
When running the validator I get the output:
Notice{filename='transfers.txt', level='ERROR', code='20', title='Duplicate entity', description='Entity must be unique in file: `transfers.txt` found other entity with same value for field: from_stop_id;to_stop_id`', extra='{fieldName=from_stop_id;to_stop_id}'}]
MBTA has added fields that aren't in the GTFS spec, from_trip_id and to_trip_id, to their data to allow specifying transfers between stops on specific trips. Adding fields like this is allowed per the GTFS spec.
The situation here is that the validator enforces the combination of from_stop_id and to_stop_id to be unique and throws an error when it encounters multiple rows with the same values. But in MBTA's case, multiple rows with the same from_stop_id and to_stop_id would be allowed, because they could be for different from/to trip IDs. From what I can tell, the spec doesn't explicitly say that the combination of from_stop_id and to_stop_id must be unique.
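To illustrate, here is a minimal Python sketch of the check. This is not the validator's actual code: the rows are made up (not taken from MBTA's feed) and the duplicate_keys helper is hypothetical. It only shows how a key on from_stop_id and to_stop_id flags rows that a key including the trip fields would treat as distinct.

```python
from collections import Counter

# Hypothetical transfers.txt rows for illustration only (not MBTA's real data).
rows = [
    {"from_stop_id": "A", "to_stop_id": "B", "from_trip_id": "t1", "to_trip_id": "t2"},
    {"from_stop_id": "A", "to_stop_id": "B", "from_trip_id": "t3", "to_trip_id": "t4"},
    {"from_stop_id": "A", "to_stop_id": "C", "from_trip_id": "", "to_trip_id": ""},
]


def duplicate_keys(rows, key_fields):
    """Return the composite key values that occur on more than one row."""
    counts = Counter(tuple(row.get(f, "") for f in key_fields) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Key on the two stop fields only: the first two rows collide.
stop_only = duplicate_keys(rows, ["from_stop_id", "to_stop_id"])

# Key extended with the trip fields (an assumption about how the extension
# could be handled): every row is distinct.
with_trips = duplicate_keys(
    rows, ["from_stop_id", "to_stop_id", "from_trip_id", "to_trip_id"]
)

print(stop_only)   # [('A', 'B')]
print(with_trips)  # []
```

Whether the key should be extended like this, or the uniqueness check relaxed for this table, depends on whether we consider MBTA's use case valid.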
We need to decide whether MBTA's use case is valid. If it is, we may need to change validation so that it doesn't throw errors on this type of data. The same question applies to any other table where we create a composite key, where similar use cases could arise, and where the spec doesn't explicitly say that the combination of fields must be unique.
To Reproduce
Run validator with -u https://cdn.mbta.com/MBTA_GTFS.zip now, or use the attached GTFS as input - input.zip
Expected behavior
Don't throw errors on valid data. Is MBTA's use case valid?
Witnessed behavior
An error is thrown for MBTA's spec extension, which allows specifying transfers between stops for specific trips.
Environment used