NCEAS / metadig-engine

MetaDig Engine: multi-dialect metadata assessment engine

Provide scores for individual checks #221

Open · gothub opened this issue 5 years ago

gothub commented 5 years ago

Requested by @JEDamerow (via NCEAS Slack):

One thing that would help in further evaluating some of the deployed checks is to get info on which datasets pass/fail each check. For example, I am thinking about the title length right now. 7-20 seems reasonable to me, but I wonder how many exceed 20 and what those titles look like.

Currently this type of summary info is not available from the quality engine. The quality reports are stored in a Postgres database running in a k8s pod (not publicly accessible).

One approach to making this data available is to have a 'checks' Solr core that tracks how many datasets pass/fail/skip/error for each check.
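As a rough sketch of what that could look like: if each document in a hypothetical 'quality_checks' core recorded one check result, a single JSON facet query could roll up pass/fail/skip/error counts per check. The core name, endpoint, and the check_id/status field names below are placeholders, not the engine's actual schema.

```python
import json
import requests

# Hypothetical Solr endpoint and core; the real deployment would differ.
SOLR_URL = "http://localhost:8983/solr/quality_checks/select"

# Nested terms facet: for every check_id, count documents per status value.
facet = {
    "checks": {
        "type": "terms",
        "field": "check_id",
        "limit": -1,
        "facet": {"statuses": {"type": "terms", "field": "status"}},
    }
}

resp = requests.get(
    SOLR_URL,
    params={"q": "*:*", "rows": 0, "json.facet": json.dumps(facet)},
)
resp.raise_for_status()

# Print the status breakdown for every check.
for bucket in resp.json()["facets"]["checks"]["buckets"]:
    statuses = {b["val"]: b["count"] for b in bucket["statuses"]["buckets"]}
    print(bucket["val"], statuses)
```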

Suggestions?

jeanetteclark commented 5 years ago

@gothub that approach would make sense to me. Would it be unreasonable to also be able to query for a particular pid to see more information about its score, or to get a list of datasets that fail a particular check? (Not sure which of those two formats would be easier to implement.) Regardless, it would be nice to get examples of datasets that pass/fail/skip/error a check, as the Slack comment above mentions.
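Both of those seem like they could come from the same core. Purely illustrative, and assuming the same hypothetical endpoint plus pid/check_id/status fields and a FAILURE status value:

```python
import requests

# Hypothetical endpoint, core, field names, and status values.
SOLR_URL = "http://localhost:8983/solr/quality_checks/select"

def datasets_failing_check(check_id, rows=100):
    """Return example pids whose recorded result for the given check is FAILURE."""
    params = {
        "q": f'check_id:"{check_id}" AND status:FAILURE',
        "fl": "pid",
        "rows": rows,
    }
    resp = requests.get(SOLR_URL, params=params)
    resp.raise_for_status()
    return [doc["pid"] for doc in resp.json()["response"]["docs"]]

def results_for_pid(pid):
    """Return all check results recorded for a single dataset pid."""
    params = {"q": f'pid:"{pid}"', "fl": "check_id,status", "rows": 1000}
    resp = requests.get(SOLR_URL, params=params)
    resp.raise_for_status()
    return resp.json()["response"]["docs"]
```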

mbjones commented 5 years ago

If it's already in Postgres, there may not be a reason to implement it in Solr, except to keep the query syntax consistent. If all that's needed is a simple REST API, direct access to Postgres might work fine.
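For example, the summary above is basically one GROUP BY over whatever table holds the per-check results. Sketched here against an assumed table and column layout (check_results, check_id, status), not the actual metadig schema, as something a thin REST endpoint could wrap:

```python
import psycopg2

# Hypothetical connection string and table/column names.
conn = psycopg2.connect("dbname=metadig host=localhost user=metadig")

SUMMARY_SQL = """
    SELECT check_id,
           status,
           count(*) AS n
      FROM check_results          -- assumed table of per-check results
     GROUP BY check_id, status
     ORDER BY check_id, status;
"""

# Run the aggregation and print counts per check and status.
with conn, conn.cursor() as cur:
    cur.execute(SUMMARY_SQL)
    for check_id, status, n in cur.fetchall():
        print(check_id, status, n)
```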