etalab / transport-validator

GTFS validator
https://transport.data.gouv.fr/validation/
MIT License

Review error levels for duplicates #178

Closed AntoineAugusti closed 9 months ago

AntoineAugusti commented 10 months ago

From the GTFS specification, "Field types" section, for ID:

An ID field value is an internal ID, not intended to be shown to riders, and is a sequence of any UTF-8 characters. Using only printable ASCII characters is recommended. An ID is labeled "unique ID" when it must be unique within a file.

When a field's type is marked as Unique ID, I think a duplicate value should be reported with ERROR severity instead of WARNING.

The relevant code path seems to be here:

https://github.com/etalab/transport-validator/blob/0d120e25e68a4741d156f085f7f04affe51810a7/src/validators/raw_gtfs.rs#L23-L40

For example, a duplicate ID for stops.id should be an ERROR. Other fields should be reviewed; we could check for more duplicates as well.
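
To make the proposal concrete, here is a minimal sketch of what I have in mind. It is not the validator's actual code: `Severity`, `DuplicateIssue`, `find_duplicates` and the `is_unique_id` flag are hypothetical names used only for illustration, and the real types in `raw_gtfs.rs` may look different. The idea is simply that the severity depends on whether the spec labels the field as a unique ID.

```rust
use std::collections::HashMap;

/// Hypothetical severity levels; the validator's real enum may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Warning,
    Error,
}

/// Hypothetical issue record, for illustration only.
#[derive(Debug, PartialEq, Eq)]
pub struct DuplicateIssue {
    pub severity: Severity,
    pub file: String,
    pub id: String,
    /// Number of times the ID appears in the file.
    pub count: usize,
}

/// Report every ID that appears more than once in `ids`.
/// `is_unique_id` is an assumed flag saying whether the GTFS spec
/// labels this field "unique ID"; if so, duplicates are errors.
pub fn find_duplicates(file: &str, ids: &[&str], is_unique_id: bool) -> Vec<DuplicateIssue> {
    // Count occurrences of each ID.
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for id in ids {
        *counts.entry(*id).or_insert(0) += 1;
    }

    let severity = if is_unique_id {
        Severity::Error
    } else {
        Severity::Warning
    };

    // Keep only IDs seen more than once.
    let mut issues: Vec<DuplicateIssue> = counts
        .into_iter()
        .filter(|(_, count)| *count > 1)
        .map(|(id, count)| DuplicateIssue {
            severity,
            file: file.to_string(),
            id: id.to_string(),
            count,
        })
        .collect();

    // Sort by ID so the output order is deterministic.
    issues.sort_by(|a, b| a.id.cmp(&b.id));
    issues
}
```

With something like this, `find_duplicates("stops.txt", &ids, true)` would yield one ERROR issue per duplicated stop ID, while fields that are not labeled unique ID could keep WARNING by passing `false`.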