MobilityData / gtfs-validator

Canonical GTFS Validator project for schedule (static) files.
https://gtfs-validator.mobilitydata.org/
Apache License 2.0

Validation report: Show the list of notices that could not have been checked because of a parsing problem #1485

Open isabelle-dr opened 1 year ago

isabelle-dr commented 1 year ago

⚠️ Need to do https://github.com/MobilityData/gtfs-validator/issues/1537 and https://github.com/MobilityData/gtfs-validator/issues/1536 before this issue.

Context about the issue

When there is a parsing problem, some of the validators don't run and a portion of the notices can't be checked. This creates confusion for users (see https://github.com/MobilityData/gtfs-validator/issues/1167). We want to add a new section to the validation report that gives the user the list of notices that could not be checked because of a parsing problem.

The list should only include notices from validators that were not run because of a parsing problem, -not- because a component was missing from the dataset. (E.g., duplicate_fare_media should not be included in the list of notices if there is no fares data, and pathway_to_wrong_location_type should not be included in the list of notices if there is no pathway data.)
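The distinction above could be sketched roughly as follows. This is only an illustration in Python, not the validator's actual code: `ValidatorInfo`, the validator names, and the mapping from validators to required files are all hypothetical, and the sketch assumes a validator counts as "missing a component" whenever any file it needs is absent from the dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidatorInfo:
    """Hypothetical description of a validator; not the gtfs-validator API."""
    name: str
    required_files: frozenset  # files the validator needs to read
    notices: tuple             # notice codes the validator can emit


def notices_unchecked_due_to_parsing(validators, files_in_dataset, files_with_parse_errors):
    """Return the notice codes that could not be checked because of a parsing problem."""
    unchecked = set()
    for validator in validators:
        # A required file is absent from the dataset: the component is missing,
        # so the validator's notices are NOT reported as unchecked.
        if not validator.required_files <= files_in_dataset:
            continue
        # All required files are present, but at least one failed to parse.
        if validator.required_files & files_with_parse_errors:
            unchecked.update(validator.notices)
    return sorted(unchecked)


# Assumed validator-to-file mapping, for illustration only.
VALIDATORS = [
    ValidatorInfo("FareMediaCheck", frozenset({"fare_media.txt"}),
                  ("duplicate_fare_media",)),
    ValidatorInfo("PathwayCheck", frozenset({"pathways.txt", "stops.txt"}),
                  ("pathway_to_wrong_location_type",)),
]

# No fares data in the dataset, and stops.txt could not be parsed.
print(notices_unchecked_due_to_parsing(
    VALIDATORS, {"stops.txt", "pathways.txt"}, {"stops.txt"}))
# prints ['pathway_to_wrong_location_type']
```

Here duplicate_fare_media is left out because there is no fares data at all, while pathway_to_wrong_location_type is listed because the pathway data is present but one of its input files failed to parse.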

What this issue is for

This issue is for adding the section in the validation report that gives the user a list of notices that could not be checked because of a parsing problem. This informs the user that these notices might still be present in the data.

Not doing in this issue

Optimizing the list of validators that didn't run because of a parsing problem, so that we don't stop validators from running when they actually can. We have an issue open for this: https://github.com/MobilityData/gtfs-validator/issues/1484.

isabelle-dr commented 1 year ago

Related to https://github.com/MobilityData/gtfs-validator/issues/1484 Might be related: https://github.com/MobilityData/gtfs-validator/issues/1089

briandonahue commented 1 year ago

@isabelle-dr would this list be shown on the HTML report, and/or do you want it included in the JSON report?

isabelle-dr commented 1 year ago

@briandonahue I am not sure. I see that when we added the Metadata (or Summary) in the HTML report, we didn't add it to the JSON report. Why did we decide not to, in this case?

I thought it would be more straightforward to have all the results in the JSON report and then use this to display the HTML page, but I may be over-simplifying things. I also think that, from a user perspective, it's nice to have parity so that people who use the JSON report get the additional value as well, although we should be careful with breaking changes.

briandonahue commented 1 year ago

@isabelle-dr My understanding (perhaps incorrect) was that the metadata section was only requested for the HTML report. The two reports are currently generated separately. I don't necessarily think it's a bad idea to use the JSON as the source for the HTML report, but it would require significant changes.

briandonahue commented 1 year ago

Initial work in #1496 is gathering the list of validators skipped due to parsing errors, but the further details required to collect the resulting skipped rules/notices are not readily available; deriving them from the skipped validators will require more discussion and effort. Additionally, we may want to capture and report contextual information on why certain rules could not be validated, such as which files could not be parsed and which rules could not be validated as a result.
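To make the idea of contextual information more concrete, here is a rough sketch in Python of what such a report section might carry. It is not taken from #1496 or from the existing JSON report: the function name, the input shape, and every field name (`noticesNotChecked`, `validator`, `unparsableFiles`, `notices`) are assumptions for discussion only.

```python
import json


def build_unchecked_section(skipped_validators, unparsable_files):
    """Group unchecked notices by skipped validator, with the files that blocked it.

    skipped_validators: dict mapping a validator name to
        {"files": [...], "notices": [...]} (hypothetical shape).
    unparsable_files: iterable of file names that could not be parsed.
    """
    entries = []
    for name in sorted(skipped_validators):
        info = skipped_validators[name]
        # Files this validator needs that failed to parse.
        blocking = sorted(set(info["files"]) & set(unparsable_files))
        if not blocking:
            # Skipped for some other reason; not part of this section.
            continue
        entries.append({
            "validator": name,
            "unparsableFiles": blocking,
            "notices": sorted(info["notices"]),
        })
    return {"noticesNotChecked": entries}


# Hypothetical input: one validator blocked by a file that failed to parse.
section = build_unchecked_section(
    {"PathwayCheck": {"files": ["pathways.txt", "stops.txt"],
                      "notices": ["pathway_to_wrong_location_type"]}},
    ["stops.txt"],
)
print(json.dumps(section, indent=2))
```

A structure along these lines could be rendered in the HTML report and, if parity is wanted, serialized into the JSON report as an additive field.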