SuLab / GeneWikiCentral

GeneWiki Organization
MIT License

Automation of ShEx validation #64

Open andrawaag opened 6 years ago

andrawaag commented 6 years ago

We currently have an online ShEx validator where Wikidata items can be validated on an item-by-item basis (http://rawgit.com/shexSpec/shex.js/genewiki/doc/shex-simple.html?#). However, applying ShEx to large (or complete) sets of items requires an automation pipeline (e.g. Jenkins).

A pipeline needs to be defined and applied.

andrawaag commented 6 years ago

I made a first prototype that applies Shape Expressions in batches to a large set of items: https://github.com/andrawaag/ShExValidate. For now these are disposable scripts that work on Disease and Pathway items only. The script takes a SPARQL query that selects the set of Wikidata items to be validated. Then, for each item, the selected ShEx (Disease or Pathway) is applied. The result is an itemized list of successes and issues, e.g. for diseases: https://rawgit.com/andrawaag/ShEx-reports/master/disease_errors.html.
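
To illustrate the flow described above, here is a minimal sketch of the batch loop. It is not the code from the ShExValidate repository. The function names (`select_items`, `validate_batch`), the `?item` variable name, the User-Agent string, and the `validate` callable are all assumptions made for illustration; the actual ShEx check is left as a pluggable function so that any validator (shex.js, a Python ShEx library, ...) could be slotted in.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterable, List, Tuple

# Public Wikidata SPARQL endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def select_items(query: str, endpoint: str = WDQS_ENDPOINT) -> List[str]:
    """Run a SPARQL SELECT query and return the URIs bound to ?item.

    Assumes the query projects a variable named ?item (an assumption
    of this sketch, not a requirement of the original scripts).
    """
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            "User-Agent": "shex-batch-sketch/0.1",  # hypothetical agent name
        },
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return [row["item"]["value"] for row in data["results"]["bindings"]]


def validate_batch(
    items: Iterable[str],
    validate: Callable[[str], Tuple[bool, str]],
) -> Dict[str, list]:
    """Apply validate(item) -> (conforms, reason) to each item.

    Returns an itemized report: conforming items under "successes",
    and (item, reason) pairs under "issues". A validator that raises
    is recorded as an issue so one bad item does not stop the batch.
    """
    report: Dict[str, list] = {"successes": [], "issues": []}
    for item in items:
        try:
            conforms, reason = validate(item)
        except Exception as exc:
            conforms, reason = False, f"validator error: {exc}"
        if conforms:
            report["successes"].append(item)
        else:
            report["issues"].append((item, reason))
    return report
```

A run would then look like `validate_batch(select_items(query), my_validator)`, where `query` selects, say, all disease items and `my_validator` (hypothetical) wraps the chosen ShEx implementation and the Disease shape. The resulting dictionary can be rendered into an HTML report or checked by a Jenkins job.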

The ShEx currently used checks only the cardinality of triples; it does not check for the expected item patterns. The next step is to write the actual ShEx schemas to apply.

The reports also need work.