Open YuliaTom opened 4 days ago
Describe the problem
Mobility Data Validator currently accepts the Unicode replacement character (�, U+FFFD) in the stop_name field. This character typically indicates a decoding error and should not be present in valid data. I have only noticed this for the replacement character, and I am unsure whether other similarly problematic characters are also accepted.
Ideally, the validator should flag or reject stop_name fields containing the replacement character, and potentially other invalid characters. The older GTFS validator, FeedValidator (https://github.com/google/transitfeed/tree/master), did this by raising a Unicode error (E33) for invalid values in the stop_name field. Implementing similar checks would help prevent data issues caused by improperly decoded or malformed input. Could this behavior be reviewed and corrected in a future release, please? Thank you for your consideration and for your continued work on this project!
Describe the new validation rule
replacement_char = "\uFFFD"
if replacement_char in stop_name:
    trigger_error("invalid_character_in_stop_name", stop_name)
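To make the pseudocode above concrete, here is a minimal, self-contained Python sketch of the check. It is only an illustration of the proposed rule, not the validator's actual implementation: the function name `find_replacement_chars` is hypothetical, and it assumes the contents of stops.txt have already been decoded to a string with a header row.

```python
import csv
import io

# U+FFFD, the Unicode replacement character, usually left behind by a failed decode.
REPLACEMENT_CHAR = "\uFFFD"


def find_replacement_chars(stops_csv_text, field="stop_name"):
    """Return (csv_row_number, value) pairs whose `field` contains U+FFFD.

    Hypothetical helper for illustration; row numbers count the header as row 1.
    """
    reader = csv.DictReader(io.StringIO(stops_csv_text))
    hits = []
    for row_number, row in enumerate(reader, start=2):
        value = row.get(field) or ""  # tolerate a missing or empty field
        if REPLACEMENT_CHAR in value:
            hits.append((row_number, value))
    return hits


sample = "stop_id,stop_name\n1,Main St\n2,P\ufffdnui Station\n"
for row_number, value in find_replacement_chars(sample):
    print(f"invalid_character_in_stop_name at row {row_number}: {value!r}")
```

The same check could be applied to other free-text fields by passing a different `field` argument, if the rule is extended beyond stop_name.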
Sample GTFS datasets
auckland_gtfs.zip
Severity
No response
Additional context
No response