MobilityData / gtfs-validator

Canonical GTFS Validator project for schedule (static) files.
https://gtfs-validator.mobilitydata.org/
Apache License 2.0

Question: Have you thought about these validation rules? #1729

Open dancesWithCycles opened 3 months ago

dancesWithCycles commented 3 months ago

Describe the problem

Hi folks, Thank you so much for maintaining this repository and this rule overview!

I came across the Duplicate Route Name Rule and wondered:

Has anyone already given a Duplicate Trip ID Rule and a Duplicate Trip Name Rule any thought?

According to trips.txt, trip_id and trip_short_name shall be unique (at least on a per-service-day basis). If a GTFS feed is the result of merging many different sources of public transport schedule data from many different providers, I commonly observe that trip_id and trip_short_name values are unique within a single source but no longer unique in the resulting overall GTFS feed. Looking at the first and last departure stop and time, I can tell that several stop_times.txt entries should belong to different trips but have the same trip_id or trip_short_name. Any idea how to tackle this observation with the GTFS validator?
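For illustration, one way to spot such collisions outside the validator is to look for a trip_id whose stop_times.txt rows repeat the same stop_sequence, which suggests that two source trips were merged under one ID. This is only a minimal sketch under that assumption, not a validator rule; the function names `load_rows` and `find_colliding_trip_ids` are hypothetical.

```python
import csv
from collections import Counter


def load_rows(path):
    """Read a GTFS text file into a list of dicts (utf-8-sig strips a BOM if present)."""
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))


def find_colliding_trip_ids(stop_time_rows):
    """Return trip_ids for which some stop_sequence value occurs more than once.

    Assumption: a repeated (trip_id, stop_sequence) pair means that rows from
    two different source trips ended up under the same trip_id.
    """
    pair_counts = Counter(
        (row["trip_id"], row["stop_sequence"]) for row in stop_time_rows
    )
    return sorted({trip_id for (trip_id, _), n in pair_counts.items() if n > 1})


# Hypothetical usage against an unzipped feed:
# print(find_colliding_trip_ids(load_rows("feed/stop_times.txt")))
```

The same counting approach could be applied to trip_short_name in trips.txt, grouped per service_id, if uniqueness per service day is what matters.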

How about vice versa? Has anyone already given a Duplicate Departure Stop Time Rule any thought? I am observing stop_times.txt entries that differ in agency_id, route_id, trip_id and trip_short_name but have the same first and last departure_time and stop_id. I cannot imagine several trips with identical first and last departure_time and stop_id but different agency_id, route_id and trip_id. Can anyone relate to this observation and suggest how to tackle it with the GTFS validator?
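To make the observation concrete, the check could be sketched as grouping trips by their first and last stop_id and departure_time and reporting every group with more than one trip_id. This is a rough sketch, not an existing validator rule; the function name is hypothetical, and it deliberately ignores service_id, so trips that run on different service days would also be reported.

```python
from collections import defaultdict


def find_trips_with_same_endpoints(stop_time_rows):
    """Group trip_ids by (first stop, first departure, last stop, last departure).

    Returns a dict mapping each endpoint signature shared by two or more
    trips to the sorted list of trip_ids. Service days are not considered.
    """
    by_trip = defaultdict(list)
    for row in stop_time_rows:
        by_trip[row["trip_id"]].append(row)

    by_signature = defaultdict(list)
    for trip_id, rows in by_trip.items():
        # stop_sequence is an integer in GTFS, so sort numerically.
        rows.sort(key=lambda r: int(r["stop_sequence"]))
        first, last = rows[0], rows[-1]
        signature = (
            first["stop_id"],
            first["departure_time"],
            last["stop_id"],
            last["departure_time"],
        )
        by_signature[signature].append(trip_id)

    return {sig: sorted(ids) for sig, ids in by_signature.items() if len(ids) > 1}
```

Joining the reported trip_ids against trips.txt and routes.txt would then show whether they really differ in route_id and agency_id.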

I stumbled over the Trip Coverage Next Days Rule. I like this rule very much. Kudos! I could make even more use of this rule if the number of days were an argument that I could supply to the GTFS validator as a parameter on a feed-specific or customer-specific basis. Any idea if a dynamic rule like this is possible with the current architecture of the GTFS validator?

I am also wondering if we can derive an Agency Coverage Trip Count Rule from the Trip Coverage Next Days Rule. I have observed many times that agencies provide public transport schedule data for only a subset of their trips. In other words, the data delivery is missing the remaining trips. As a consequence, the resulting GTFS feed contains only a subset of trips per agency. If I provide the GTFS validator with a list of minimum trip counts per agency in a CSV-like file, do you think this observation could be tackled by a validation rule? The validator should tell me which agencies have trip counts below their minimum trip count threshold.
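As a sketch of what I have in mind, the check could count trips per agency by joining trips.txt to routes.txt on route_id and comparing the result with the supplied minimums. Everything here is an assumption for illustration: the function name is hypothetical, the thresholds are passed as a plain dict (for example loaded from a CSV with hypothetical columns agency_id and min_trips), and routes without an agency_id are counted under an empty string.

```python
from collections import Counter


def agencies_below_minimum(routes_rows, trips_rows, minimum_trips):
    """Return {agency_id: actual_trip_count} for agencies below their minimum.

    minimum_trips maps agency_id to the minimum expected number of trips.
    Agencies listed there but absent from the feed are reported with 0.
    """
    # agency_id may be omitted in routes.txt for single-agency feeds.
    route_to_agency = {
        r["route_id"]: (r.get("agency_id") or "") for r in routes_rows
    }

    counts = Counter()
    for trip in trips_rows:
        agency_id = route_to_agency.get(trip["route_id"])
        if agency_id is not None:  # skip trips that reference an unknown route
            counts[agency_id] += 1

    return {
        agency_id: counts.get(agency_id, 0)
        for agency_id, minimum in minimum_trips.items()
        if counts.get(agency_id, 0) < minimum
    }
```

Reporting the actual count next to each agency makes it easy to see how far a delivery falls short of its threshold.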

Cheers!

Describe the new validation rule

Please see above.

Sample GTFS datasets

Please see above.

Severity

Please see above.

Additional context

Please see above.

emmambd commented 1 month ago

Hi @dancesWithCycles! Thanks for your patience - our team had several different discussions about your proposed rules here.

Looking at the first and last departure stop and time, I can tell that several stop_times.txt entries should belong to different trips but have the same trip_id or trip_short_name. Any idea how to tackle this observation with the GTFS validator?

Could you share some more context for how you know that the stop_times.txt entries should belong to different trips but are associated with the same trip_id? We assume you're deriving this from the same stop being serviced at different times that are extremely close together, like 8am and 8:10am on the same day. But curious to know more. It would be very helpful if you had a feed example with trip rows to include as well.

Has anyone already given a Duplicate Departure Stop Time Rule any thought? I am observing stop_times.txt entries that differ in agency_id, route_id, trip_id and trip_short_name but have the same first and last departure_time and stop_id. I cannot imagine several trips with identical first and last departure_time and stop_id but different agency_id, route_id and trip_id.

Could you share examples of feeds where you're seeing this use case? It may warrant an INFO notice in the validator to flag that something looks strange, but we're wondering if there are cases of aggregate feeds where it might be legitimate.

I stumbled over the Trip Coverage Next Days Rule. I like this rule very much. Kudos! I could make even more use of this rule if the number of days were an argument that I could supply to the GTFS validator as a parameter on a feed-specific or customer-specific basis. Any idea if a dynamic rule like this is possible with the current architecture of the GTFS validator?

Currently, making this rule dynamic is outside the scope of what's possible with the GTFS validator. However, providing custom validation in the validator has been a long-standing feature request that we intend to address in the future (not within the next year though). If you'd like to share your thoughts or needs on this feature, there's an issue for it here: https://github.com/MobilityData/gtfs-validator/issues/1067.

If I provide the GTFS validator with a list of minimum trip counts per agency in a CSV-like file, do you think this observation could be tackled by a validation rule? The validator should tell me which agencies have trip counts below their minimum trip count threshold.

Similar to the above question, this is out of scope at present because it requires dynamic inputs. However, Transport Data Gouv has a great GTFS diff tool that can help compare two different feed versions and see if a trip count looks dramatically different from how it did previously. We also provide a trip count in the summary of the validation report.

Let me know if you have any other questions!

dancesWithCycles commented 1 month ago

I stumbled over the Trip Coverage Next Days Rule. I like this rule very much. Kudos! I could make even more use of this rule if the number of days were an argument that I could supply to the GTFS validator as a parameter on a feed-specific or customer-specific basis. Any idea if a dynamic rule like this is possible with the current architecture of the GTFS validator?

Currently, making this rule dynamic is outside the scope of what's possible with the GTFS validator. However, providing custom validation in the validator has been a long-standing feature request that we intend to address in the future (not within the next year though). If you'd like to share your thoughts or needs on this feature, there's an issue for it here: https://github.com/MobilityData/gtfs-validator/issues/1067.

Hi there, I still like the trip_coverage_not_active_for_next7_days rule very much. Currently, I cannot use this rule as much as I would like. In everyday life, the 7-day window is too short for me to react to missing trip coverage in a production GTFS archive, and I am not aware of the definition of "the majority service window". That is why I would like to ask the following.

Cheers!