Closed: lauriemerrell closed this issue 10 months ago
I think that more structured data of this type would also be useful for the web-based validator UI, which currently just links to RULES.md to explain what each rule means. If the rule descriptions were in the rule code, presumably the web UI could display them more easily. But I defer to @KClough et al. on those considerations.
The index.js file in this PR contained some improved language describing a lot of errors. It's been in the collective mental backlog to capture those somewhere. If there's an effort to capture the rules in a more machine-readable format, that might be a good moment to gobble those up.
I've got two updates here:
- I wrote up some more thoughts on the potential design options at https://bit.ly/gtfs-validator-notice-documentation. Feedback appreciated!
I'm not familiar with Java tooling at all, but maybe it could also be done the other way around:
- Checked-in JSON files act as the "source of truth" for the notices; they are machine-readable already anyway.
- A notice implementation (Java file) would read the corresponding JSON file and use its severity, summary, description, etc.
- The RULES.md file could be generated from the JSON files using a simple script.
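To make the last step concrete, here is a minimal sketch of such a script. It is written in Python rather than Java, and everything about the layout is an assumption for illustration: one JSON file per notice in a hypothetical notices/ directory, with hypothetical keys code, severity, summary and description. It is not the validator's actual format.

```python
import json
from pathlib import Path

# ASSUMPTION: each notice lives in its own JSON file with the keys
# "code", "severity", "summary" and "description". These names are
# made up for this sketch and are not the validator's real schema.
SEVERITIES = ("ERROR", "WARNING", "INFO")


def load_notices(notice_dir):
    """Read every *.json file in notice_dir and return the notices sorted by code."""
    notices = [
        json.loads(path.read_text(encoding="utf-8"))
        for path in Path(notice_dir).glob("*.json")
    ]
    return sorted(notices, key=lambda n: n["code"])


def render_rules_md(notices):
    """Render one summary table per severity, then one detail section per notice."""
    lines = ["# Rules", ""]
    for severity in SEVERITIES:
        group = [n for n in notices if n["severity"] == severity]
        if not group:
            continue  # skip severities that have no notices
        lines += [f"## {severity}", "", "| Code | Summary |", "| --- | --- |"]
        lines += [f"| `{n['code']}` | {n['summary']} |" for n in group]
        lines.append("")
    for n in notices:
        lines += [f"### {n['code']}", "", n["description"], ""]
    return "\n".join(lines)
```

Writing the file would then be something like `Path("RULES.md").write_text(render_rules_md(load_notices("notices")), encoding="utf-8")`. Because the severity sits next to the description in each record, nothing has to be inferred from table titles.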
@derhuerst I'm a little worried about the separation between the documentation and code in that case. Specifically, for the individual fields of the notice, which we'd need to define in two places. If we really did go with a JSON representation, I'd vote to generate source code for the Notices from the JSON itself. However, I wonder if we'll run into roadblocks there (e.g. we need custom Java methods on our notice to assist in type-conversion + construction that are awkward to encode/generate from JSON). My hunch is that it's easier to go from code => JSON, but I'd be interested in hearing other arguments for and against.
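For comparison, a rough sketch of the code => JSON direction. It uses Python only to illustrate the idea; the validator itself is Java, and the base class, the registry, the docstring convention and the example notice below are all hypothetical. The point is that fields and documentation are defined once, in the notice class, and the JSON is derived from it.

```python
import inspect
import json
import re
from dataclasses import dataclass, fields


class Notice:
    """Hypothetical base class that records every notice subclass."""

    severity = "ERROR"
    registry = []

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        Notice.registry.append(cls)


@dataclass
class ExampleMissingValueNotice(Notice):
    """A required value is missing.

    The row has no value for a field that this example treats as required.
    """

    severity = "ERROR"  # class attribute, not a dataclass field
    filename: str
    csv_row_number: int
    field_name: str


def describe(cls):
    """Derive a JSON-ready record from a notice class and its docstring."""
    doc = inspect.cleandoc(cls.__doc__ or "")
    # First paragraph is the summary, the remainder is the description.
    summary, _, details = doc.partition("\n\n")
    name = cls.__name__.removesuffix("Notice")
    return {
        "code": re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower(),
        "severity": cls.severity,
        "summary": summary,
        "description": " ".join(details.split()),
        "fields": [
            {"name": f.name, "type": getattr(f.type, "__name__", str(f.type))}
            for f in fields(cls)
        ],
    }


def export_notice_docs():
    """Serialize every registered notice class to a JSON string."""
    return json.dumps([describe(cls) for cls in Notice.registry], indent=2)
```

With this shape, custom construction or type-conversion helpers can stay as ordinary methods on the class, since nothing needs to be generated back from the JSON.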
@bdferris-v2 I agree.
From my point of view, as someone who wants to use gtfs-validator and interpret the results in an automated way, I don't mind which approach is chosen as long as there's a reasonably easy way to generate a JSON artefact (or anything machine-readable, really) containing the rules.
Not actually done yet.
Resolved in v4.2: https://github.com/MobilityData/gtfs-validator/releases/tag/v4.2.0. All PRs are referenced under the "Generate documentation automatically" heading.
Describe the problem
Cal-ITP produces https://reports.calitp.org/, where we report on various aspects of GTFS data quality. One of the things we currently display on the site is a grid of validator notices output for a given feed in a given month. We like to display a human-readable notice description so that the notice can be understood by agencies and the general public, who may not be familiar with validator code names.
Currently, to update those human readable descriptions, we have to manually scrape the data from RULES.md for each validator version and turn it into a CSV that we can import through our pipeline.
To make the CSV, I:
code, description, severity in one place (the severity is just indicated in the title of the table, which makes it harder to scrape)

This also opens up issues like #1322 because RULES.md is maintained separately as a text file and not related to the actual validator code.
It would be nice if the human readable description about rule implementation were available as structured data (CSV or JSON) and could be output by the validator itself, rather than requiring reference to the RULES.md file (analogous to the new notice_schema.json file that can be output by the JAR).

Proposed solution
Rule descriptions could be attributes within the rule implementation itself, and then RULES.md could be programmatically generated based on those attributes, rather than RULES.md being the source of truth but maintained separately.
Alternatives you've considered
No response
Additional context
It would be really nice to have something like code, severity, short_desc, detailed_desc, formatted_desc where formatted could contain Markdown (RULES.md has a shorter rule description in the tables at the top and then a slightly longer description below.)
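As an illustration of that shape, here is a minimal Python sketch that writes records with those five columns to CSV. The column names come from the list above; the function name and the idea of passing plain dicts are assumptions made for the example, not anything the validator provides.

```python
import csv
import io

# Column names as proposed above.
COLUMNS = ["code", "severity", "short_desc", "detailed_desc", "formatted_desc"]


def notices_to_csv(notices):
    """Serialize an iterable of notice dicts to a CSV string.

    The csv module quotes fields that contain commas or newlines, so
    multi-paragraph Markdown in formatted_desc survives a round trip.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(notices)
    return buffer.getvalue()
```

A consumer can read the result back with `csv.DictReader` and get the same five keys per row, which is the kind of import a reporting pipeline would need.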