MobilityData / gtfs-validator

Canonical GTFS Validator project for schedule (static) files.
https://gtfs-validator.mobilitydata.org/
Apache License 2.0
267 stars, 100 forks

Flag if a URL listed inside the GTFS dataset doesn't respond/exist #1521

Open isabelle-dr opened 1 year ago

isabelle-dr commented 1 year ago

Describe the problem

A user has asked whether this validator could check that the URLs provided in the GTFS dataset (e.g. agency_url, stop_url, etc.) work as intended.

The specification says:

URL - A fully qualified URL that includes http:// or https://, and any special characters in the URL must be correctly escaped. See the following http://www.w3.org/Addressing/URL/4_URI_Recommentations.html for a description of how to create fully qualified URL values.

Although there is no explicit mention that the URL must not return a 404 error, this seems like a very useful addition to this validator that is in line with "fully qualified URL".

Describe the new validation rule

If one of the URL fields in the GTFS dataset returns a 404 error, generate a warning.
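A minimal sketch of what such a rule could look like, written in Python purely for illustration (the validator itself is a separate codebase, and none of these names come from it). `fetch_status`, `url_warnings`, and the shape of the warning dictionary are all hypothetical. The status lookup is passed in as a parameter so the rule can be exercised without network access.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if no response was received."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 404.
        return error.code
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, or malformed URL.
        return None


def url_warnings(
    rows: Iterable[Tuple[str, str, str]],
    fetch: Callable[[str], Optional[int]] = fetch_status,
) -> List[Dict[str, object]]:
    """Build one warning per (filename, field, url) row whose URL returns 404."""
    warnings = []
    for filename, field, url in rows:
        if fetch(url) == 404:
            warnings.append(
                {
                    "severity": "WARNING",
                    "filename": filename,
                    "field": field,
                    "url": url,
                    "status": 404,
                }
            )
    return warnings
```

This sketch flags only a 404 status, as the rule above describes. Whether URLs that do not respond at all should also be flagged is left open here.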

Sample GTFS datasets

No response

Severity

WARNING

Additional context

No response

cka-y commented 1 year ago

I created a PR a few days ago, but the acceptance tests are failing. After analyzing the failures, I found that they are due to the added time it takes to validate the URLs. Some of our datasets have thousands of URLs, and it takes approximately 3-4 seconds to validate each one (for 3000 URL entries it adds at least 5 min to the validation time). I don't think we can do any better on validation time. After consulting @davidgamez, we believe we might need to push back this issue until we have the custom validation profile (also mentioned in #1441), i.e. the URL accessibility check would be an optional notice/validation. We believe this is essential, as the validation is highly dependent on the user's network and can affect the user experience. Thoughts?
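To illustrate the "optional validation" idea, here is a rough Python sketch of an opt-in check. It is not taken from the PR or from the validator: the check name `url_accessibility`, the `enabled_checks` set, and `run_url_check` are all hypothetical. The sketch also caches results per distinct URL, on the assumption that the same URL can appear in many rows. That does not reduce the per-request cost described above, only the number of requests.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set

# Hypothetical identifier a user would list in a validation profile to opt in.
URL_CHECK = "url_accessibility"


def run_url_check(
    urls: Iterable[str],
    enabled_checks: Set[str],
    fetch: Callable[[str], Optional[int]],
) -> List[str]:
    """Return the URLs that answered 404, or nothing if the check is disabled."""
    if URL_CHECK not in enabled_checks:
        # Check is off by default: no network traffic, no added validation time.
        return []

    status_by_url: Dict[str, Optional[int]] = {}
    broken = []
    for url in urls:
        if url not in status_by_url:
            # Each distinct URL is requested at most once.
            status_by_url[url] = fetch(url)
        if status_by_url[url] == 404:
            broken.append(url)
    return broken
```

With an empty `enabled_checks`, the function returns immediately and never calls `fetch`, so users on slow or restricted networks would see no change in validation time.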

davidgamez commented 1 year ago

I support delaying the issue until consumers can skip a validation notice. A few points to support this: