MobilityData / gtfs-validator

Canonical GTFS Validator project for schedule (static) files.
https://gtfs-validator.mobilitydata.org/
Apache License 2.0

Smarter check for if route_long_name contains route_short_name #267

Closed lionel-nj closed 1 year ago

lionel-nj commented 4 years ago

Is your feature request related to a problem? Please describe.

For routes.txt, the validator needs to check whether route_long_name contains route_short_name.

However, when route_short_name is only a character or two long, it can appear in route_long_name by coincidence. For example: route_short_name is C, and route_long_name is Cxxxxxxxxxxx. Here, route_short_name is just the first character of the string defining route_long_name. Yet our current implementation generates a notice, although it should not.
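To make the false positive concrete, here is a minimal sketch. It assumes the existing check behaves like a plain substring test; `naive_contains` is a hypothetical name for illustration, not the validator's actual code.

```python
def naive_contains(short_name: str, long_name: str) -> bool:
    """Hypothetical plain substring check (an assumption about the current rule).

    Flags the route whenever a non-empty short name occurs anywhere
    in the long name.
    """
    return bool(short_name) and short_name in long_name


# "C" is only the first letter of the long name, yet the route is flagged.
print(naive_contains("C", "Cxxxxxxxxxxx"))
```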

Describe the solution you'd like

Implement a regex or a threshold-based algorithm to decide whether route_long_name contains route_short_name.

barbeau commented 4 years ago

A simple solution may be to only check for this rule if route_short_name is at least 2 or 3 characters long.
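A rough sketch of that length-threshold idea, assuming a plain substring test for the containment check. `should_flag` and `MIN_SHORT_NAME_LENGTH` are hypothetical names, and the value 2 is just one of the two thresholds suggested above.

```python
MIN_SHORT_NAME_LENGTH = 2  # assumed threshold; the suggestion was 2 or 3


def should_flag(short_name: str, long_name: str,
                min_length: int = MIN_SHORT_NAME_LENGTH) -> bool:
    """Return True if the long name contains the short name.

    Short names below min_length are skipped, since they are likely
    to match by coincidence.
    """
    short_name = short_name.strip()
    if len(short_name) < min_length:
        return False  # too short to be a meaningful match
    return short_name in long_name
```

With this threshold, a short name of C is never flagged, while a longer one such as C12 still is when it appears in the long name.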

briandonahue commented 1 year ago

@barbeau @isabelle-dr I am looking at this issue. Is the preferred solution what @aababilov mentioned here:

check that long name does not start from the short name followed by ' ', '-' or '('.
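One way to read that rule, as a minimal sketch: flag the route only when the long name starts with the short name and the next character is one of the listed separators. The function name, the `separators` parameter, and the case-sensitive comparison are all assumptions made for illustration, not the validator's actual implementation.

```python
DEFAULT_SEPARATORS = (" ", "-", "(")  # the characters named in the quoted rule


def long_name_starts_with_short_name(short_name: str, long_name: str,
                                     separators=DEFAULT_SEPARATORS) -> bool:
    """Return True if long_name is short_name immediately followed by a separator."""
    if not short_name or not long_name.startswith(short_name):
        return False
    # Whatever follows the short name must begin with a separator.
    rest = long_name[len(short_name):]
    return rest.startswith(tuple(separators))
```

Under these assumptions, "C - Crosstown" and "C (Express)" are flagged for short name C, while "Crosstown" is not. A long name identical to the short name is not flagged by this sketch either, because nothing follows the prefix.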

barbeau commented 1 year ago

Seems reasonable to me

isabelle-dr commented 1 year ago

Sounds good. Hello @barbeau 👋😊 This issue was initially opened to replace route_short_and_long_name_equal, and the reference is in the GTFS Best Practices, not the spec.

briandonahue commented 1 year ago

Linking to @julianharty's question on the associated PR: should we add any more explanation to the documentation about how this rule works?

Also, are the characters ' ', '-' and '(' as mentioned here sufficient, or are there other characters, like ':' or ')', that we should consider in this PR?