MobilityData / gtfs-validator

Canonical GTFS Validator project for schedule (static) files.
https://gtfs-validator.mobilitydata.org/
Apache License 2.0

Validator Accepts Replacement Character in `stop_name` Field #1840

Open YuliaTom opened 4 days ago

YuliaTom commented 4 days ago

Describe the problem

Mobility Data Validator currently accepts the Unicode replacement character () in the stop_name field. This character typically indicates a decoding error and should not be present in valid data. While I've specifically noticed the acceptance of this replacement character, I am unsure if other similar problematic characters are also being accepted. Ideally, the validator should flag or reject stop_name fields containing the replacement character and potentially other similar invalid characters, as was done by the older GTFS validator, FeedValidator - https://github.com/google/transitfeed/tree/master, which raised a Unicode error (E33) for invalid values in the stop_name field. Implementing similar checks in future releases would help prevent data issues caused by improperly decoded or malformed input. Could this behavior be reviewed and corrected in future releases please? Thank you for your consideration and for your continued work on this project!
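To illustrate how U+FFFD usually ends up in a feed, here is a minimal Python sketch. It assumes a producer wrote text as Latin-1 and a later step decoded it as UTF-8 with a lenient error handler; the sample string is made up and does not come from the attached dataset.

```python
# "Café" encoded as Latin-1: the "é" becomes the single byte 0xE9.
raw = "Café".encode("latin-1")

# 0xE9 on its own is not valid UTF-8, so a lenient decoder
# substitutes U+FFFD (the replacement character) for it.
decoded = raw.decode("utf-8", errors="replace")

print(repr(decoded))
print("\uFFFD" in decoded)
```

Once this substitution has happened the original byte is lost, which is why the character is a useful signal that the data was damaged upstream.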

Describe the new validation rule

replacement_char = "\uFFFD"
if replacement_char in stop_name:
    trigger_error("invalid_character_in_stop_name", stop_name)
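The pseudocode above could be fleshed out roughly as follows. This is only a standalone Python sketch of the proposed check, not how the validator (a Java project) would implement it; the function name `find_replacement_chars`, the tuple layout of the results, and the sample rows are all hypothetical.

```python
import csv
import io

REPLACEMENT_CHAR = "\uFFFD"  # U+FFFD, substituted by decoders for undecodable bytes


def find_replacement_chars(stops_csv_text, field="stop_name"):
    """Return (csv_row_number, stop_id, value) for rows whose field contains U+FFFD."""
    notices = []
    reader = csv.DictReader(io.StringIO(stops_csv_text))
    # Row 1 is the header, so data rows are numbered from 2.
    for row_number, row in enumerate(reader, start=2):
        value = row.get(field) or ""
        if REPLACEMENT_CHAR in value:
            notices.append((row_number, row.get("stop_id"), value))
    return notices


# Hypothetical stops.txt content; the second stop has a damaged name.
sample = (
    "stop_id,stop_name\n"
    "1,Queen Street\n"
    "2,Caf\uFFFD Stop\n"
)

for row_number, stop_id, value in find_replacement_chars(sample):
    print(f"row {row_number}, stop_id {stop_id}: {value!r}")
```

The `field` parameter is there because the same check could plausibly apply to other free-text fields, which relates to the open question about where else such characters are accepted.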

Sample GTFS datasets

auckland_gtfs.zip

Severity

No response

Additional context

No response

welcome[bot] commented 4 days ago

Thanks for opening your first issue in this project! If you haven't already, you can join our Slack and the #gtfs-validators channel to meet our awesome community. Come say hi :wave:!

Welcome to the community and thank you for your engagement in open source! :tada: