Open isabelle-dr opened 1 year ago
Hey, this issue has recently become more urgent for me. I'd be interested in contributing some cycles to finding a solution here, especially one that allows a maximal number of validators to run.
@davidgamez and I had discussed in the past the idea of conditionally running more multi-file validators if their underlying data dependencies don't have parse errors. I've got an initial implementation of that approach in PR #1496.
Here, if a FileValidator has an injected dependency on a GTFS table that has parse errors, then the validator still wouldn't run, because the underlying table might be missing data that would cause spurious additional errors (e.g. foreign key reference validation). However, if all the injected dependencies are ok, then we can still run the validator.
I think this approach strikes a reasonable balance between running more validators without having to do potentially significant engineering to run all validators (e.g. making each validator resilient to invalid underlying data).
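To make the idea concrete, here is a rough sketch of the selection rule described above. It is not the code from the PR: `ValidatorSpec`, `runnable`, and the two validator names are hypothetical, and tables are modeled as plain file-name strings rather than the validator's real table containers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

class ValidatorSelection {

  /** Hypothetical stand-in for a multi-file validator and the tables injected into it. */
  static final class ValidatorSpec {
    final String name;
    final Set<String> tableDependencies;

    ValidatorSpec(String name, Set<String> tableDependencies) {
      this.name = name;
      this.tableDependencies = tableDependencies;
    }
  }

  /**
   * Keeps only the validators whose injected tables all parsed cleanly.
   * A validator is skipped as soon as one of its dependencies has parse errors.
   */
  static List<ValidatorSpec> runnable(
      List<ValidatorSpec> validators, Set<String> tablesWithParseErrors) {
    List<ValidatorSpec> result = new ArrayList<>();
    for (ValidatorSpec validator : validators) {
      // disjoint == true means no dependency is in the set of broken tables.
      if (Collections.disjoint(validator.tableDependencies, tablesWithParseErrors)) {
        result.add(validator);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Hypothetical validators, for illustration only.
    List<ValidatorSpec> validators =
        List.of(
            new ValidatorSpec("TripStopTimesValidator", Set.of("trips.txt", "stop_times.txt")),
            new ValidatorSpec("RouteAgencyValidator", Set.of("routes.txt", "agency.txt")));

    // Assume only stop_times.txt failed to parse.
    Set<String> tablesWithParseErrors = Set.of("stop_times.txt");

    for (ValidatorSpec validator : runnable(validators, tablesWithParseErrors)) {
      System.out.println(validator.name); // prints "RouteAgencyValidator"
    }
  }
}
```

With this rule, a parse error in `stop_times.txt` only blocks the validators that read `stop_times.txt`; everything else still runs.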
Thoughts?
Currently, when there is even one parsing problem in the data, none of the multi-file validators run.
This creates issues such as https://github.com/MobilityData/gtfs-validator/issues/1096 or https://github.com/MobilityData/gtfs-validator/issues/1167.
We want to optimize this logic so that only the validators that are dependent on the data being properly formatted don't run. For example, route_color_contrast is dependent on the color being properly formatted (or on invalid_color not being triggered). If a color is not properly formatted, we only want the validator that triggers route_color_contrast to not run, as opposed to all the multi-file validators.