GSA / data.gov

Main repository for the data.gov service
https://data.gov

DCAT-US data.json Validator - display Dataset id or title to help identify which dataset has a validation error #4427

Open leebrian opened 1 year ago

leebrian commented 1 year ago

Would you please update the Validation Results to display the dataset id and name rather than the number to help users identify which dataset has a validation error?

Currently, when I validate my agency's data.json, the results display only the dataset number, which is very difficult to match to the specific dataset that has an error. Previous versions displayed the particular dataset record that had the problem, so it was easier to see which dataset had errors.

Thank you for providing this tool. It helps data stewards, who typically do not use schema validation tools and are unfamiliar with JSON, find and fix errors in their metadata so that the public may better find and use our data.

How to reproduce

  1. Visit https://catalog.data.gov/dcat-us/validator
  2. Enter https://data.cdc.gov/data.json
  3. View the Validation Results, which don't have a good UX for figuring out which dataset has the issue (e.g., "Dataset ➡ 9 ➡ contactPoint ➡ hasEmail has a problem" is not easily matched to the particular dataset, "https://data.cdc.gov/api/views/29hc-w46k", that has the error)

Expected behavior

Display a more functional error message that reduces the burden on data stewards of identifying their dataset and correcting errors, thus improving the quality of catalog data on data.gov.

Perhaps something like this: Dataset ➡ 9, https://data.cdc.gov/api/views/29hc-w46k, Weekly Rates of Laboratory-Confirmed RSV Hospitalizations from the RSV-NET Surveillance System ➡ contactPoint ➡ hasEmail has a problem
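As a rough illustration of the message format suggested above, here is a minimal Python sketch. It is not the validator's actual code: the function name `describe_error`, the list-style error path, and the zero-based dataset index are all assumptions made for the example. The only things taken from DCAT-US are the top-level `dataset` array and each record's `identifier` and `title` fields.

```python
ARROW = " ➡ "


def describe_error(catalog, path):
    """Format an error path, adding identifier and title after the dataset index.

    `catalog` is a parsed data.json document; `path` is a hypothetical error
    path such as ["dataset", 9, "contactPoint", "hasEmail"].
    Assumption: the index is a zero-based position in the `dataset` array.
    """
    datasets = catalog.get("dataset", [])
    labels = []
    for pos, step in enumerate(path):
        if pos == 0 and step == "dataset":
            labels.append("Dataset")
        elif pos == 1 and path[0] == "dataset" and isinstance(step, int):
            # Look up the record so the reader sees which dataset is meant.
            record = datasets[step] if 0 <= step < len(datasets) else {}
            ident = record.get("identifier", "(no identifier)")
            title = record.get("title", "(no title)")
            labels.append(f"{step}, {ident}, {title}")
        else:
            labels.append(str(step))
    return ARROW.join(labels) + " has a problem"


# Made-up sample catalog, not real CDC data.
sample = {
    "dataset": [
        {
            "identifier": "https://example.gov/api/views/abcd-1234",
            "title": "Example Dataset",
            "contactPoint": {"fn": "Jane Doe", "hasEmail": "jane@example.gov"},
        }
    ]
}
print(describe_error(sample, ["dataset", 0, "contactPoint", "hasEmail"]))
```

Falling back to placeholder text when the identifier or title is missing keeps the message usable even when those fields are themselves the source of the validation error.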

Actual behavior

The validation results display only the dataset index, e.g., Dataset ➡ 9 ➡ contactPoint ➡ hasEmail has a problem

This means it is hard for a particular data steward to validate their dataset, as they would first need to learn the index of each of their datasets and then check it against the error report. Including the id and title in the report would make it more useful.
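Until the report includes the id and title, a data steward could build the index lookup by hand. The sketch below is one way to do that in Python against a locally saved copy of data.json. The helper names `index_datasets` and `find_indexes` are hypothetical, and it assumes the validator's number is the zero-based position in the `dataset` array, which this issue does not confirm.

```python
def index_datasets(catalog):
    """Return (index, identifier, title) for each record in the `dataset` array.

    Assumption: the index shown by the validator is the zero-based position
    of the record in data.json.
    """
    rows = []
    for index, record in enumerate(catalog.get("dataset", [])):
        rows.append((index, record.get("identifier", ""), record.get("title", "")))
    return rows


def find_indexes(catalog, identifier):
    """Return every index whose record carries the given identifier."""
    return [i for i, ident, _title in index_datasets(catalog) if ident == identifier]


# Made-up catalog for illustration only.
example_catalog = {
    "dataset": [
        {"identifier": "https://example.gov/api/views/aaaa-0001", "title": "First"},
        {"identifier": "https://example.gov/api/views/bbbb-0002", "title": "Second"},
    ]
}
for index, ident, title in index_datasets(example_catalog):
    print(index, ident, title)
```

For a real catalog, load the file with `json.load` from the standard library and pass the result to `index_datasets`.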

gujral-rei commented 8 months ago

Recommend creating a different ticket. @jbrown-xentity

jbrown-xentity commented 8 months ago

Should probably implement https://github.com/GSA/data.gov/issues/4638 instead...

leebrian commented 8 months ago

I don’t see how this is related. My request is about making the validation output more usable, while 4638 seems to be about making it easier to run validation. Completing 4638 won’t fulfill this issue.

jbrown-xentity commented 8 months ago

Unfortunately the validation process is using outdated infrastructure, and updating the UI is extremely complex. This other issue will get the data details right, and some demo work that was just completed here provides a roadmap for how to build a UI for this to be much more robust.

gujral-rei commented 7 months ago

Consider folding some of these needs into H2.0