AKSW / RDFUnit

An RDF Unit Testing Suite
http://RDFUnit.aksw.org
Apache License 2.0

RDFUnit - RDF Unit Testing Suite

Homepage: http://rdfunit.aksw.org
Documentation: https://github.com/AKSW/RDFUnit/wiki
Slack #rdfunit: https://dbpedia-slack.herokuapp.com/
Mailing list: https://groups.google.com/d/forum/rdfunit (rdfunit [at] googlegroups.com)
Presentations: http://www.slideshare.net/jimkont
Brief Overview: https://github.com/AKSW/RDFUnit/wiki/Overview

RDFUnit is implemented on top of the Test-Driven Data Validation Ontology and is designed to read and produce only RDF that complies with that ontology. The main components that RDFUnit reads are TestCases (manual & automatic), TestSuites, Patterns & TestAutoGenerators. RDFUnit also strictly defines the results of a TestSuite execution, along with different levels of result granularity.

Basic usage

See "RDFUnit from Command Line" or run bin/rdfunit -h for many more options. The simplest invocation is:

$ bin/rdfunit -d <local-or-remote-location-URI>

What RDFUnit will do is:

  1. Get statistics about all properties & classes in the dataset
  2. Extract the namespaces from them and try to dereference every one that is listed in LOV
  3. Run our Test Generators on the schemas and generate RDFUnit Test cases
  4. Run the RDFUnit test cases on the dataset
  5. Produce a results report in HTML (by default); you can request RDF instead, or several serializations at once, e.g. -o html,turtle,jsonld
    • By default the results are aggregated with counts; you can request other levels of result detail using -r {status|aggregate|shacl|shacllite}. See here for more details.

You can also run:

$ bin/rdfunit -d <dataset-uri> -s <schema1,schema2,schema3,...>

Here you define your own schemas and RDFUnit picks up from step 3. You can also use prefixes directly (e.g. -s foaf,skos); anything that is defined in LOV can be retrieved this way.
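The flags described above can be combined in a single call. The following is a minimal sketch, not part of RDFUnit: the helper function rdfunit_cmd is hypothetical, the dataset URI is a placeholder, and the defaults (html, aggregate) are taken from the description above. It only prints the command line, so nothing runs until you copy it.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with RDFUnit): prints an rdfunit command
# line built from the flags described above, without executing it.
rdfunit_cmd() {
    dataset=$1                 # -d: dataset location (placeholder URI below)
    schemas=${2:-}             # -s: optional comma-separated schemas or LOV prefixes
    formats=${3:-html}         # -o: report serializations; html is the stated default
    level=${4:-aggregate}      # -r: result detail; aggregate is the stated default

    # Accept only the four result levels listed above.
    case $level in
        status|aggregate|shacl|shacllite) ;;
        *) echo "unknown result level: $level" >&2; return 1 ;;
    esac

    cmd="bin/rdfunit -d $dataset"
    # Add -s only when schemas were given; without it RDFUnit starts at step 1.
    if [ -n "$schemas" ]; then
        cmd="$cmd -s $schemas"
    fi
    printf '%s -o %s -r %s\n' "$cmd" "$formats" "$level"
}

# Example: two LOV prefixes, two report formats, SHACL-level results.
rdfunit_cmd "http://example.org/dataset.ttl" "foaf,skos" "html,turtle" "shacl"
```

Printing the command rather than running it keeps the sketch safe to try; once the line looks right, paste it into the shell from the RDFUnit directory.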

Using Docker

A Dockerfile is provided to create a Docker image of the CLI of RDFUnit.

To create the Docker image:

$ docker build -t rdfunit .

The image is meant to execute an rdfunit command and then shut down the container. If the output of rdfunit on stdout is not enough, or you want to make files available inside the container, you can mount a directory via Docker and use it for the results or the input files.

Here is an example of usage:

$ docker run --rm -it rdfunit -d https://awesome.url/file -r aggregate

This creates a temporary Docker container that runs the command, prints the results on stdout, then stops and removes itself. For further usage of the CLI, visit https://github.com/AKSW/RDFUnit/wiki/CLI.
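Mounting a directory, as mentioned above, might look like the sketch below. It is an illustration under assumptions rather than a documented RDFUnit recipe: the helper rdfunit_docker_cmd is hypothetical, the container path /data and the file name dataset.ttl are placeholders, and it assumes -d accepts the mounted file path as a local location. Where reports are written inside the container is not assumed here. The helper only prints the command so it can be checked first.

```shell
#!/bin/sh
# Hypothetical wrapper: prints a docker run command that bind-mounts a host
# directory into the container at /data (an assumed mount point).
rdfunit_docker_cmd() {
    host_dir=$1     # absolute host directory that holds the input file
    file=$2         # file name inside that directory
    printf 'docker run --rm -it -v %s:/data rdfunit -d /data/%s -r aggregate\n' \
        "$host_dir" "$file"
}

# Example: expose ./data from the current directory to the container.
rdfunit_docker_cmd "$PWD/data" "dataset.ttl"
```

Docker's -v option requires an absolute host path, which is why the example expands $PWD instead of passing a relative directory.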

Supported Schemas

RDFUnit supports the following types of schemas:

  1. OWL (using CWA, the closed-world assumption): We pick the most common OWL axioms as well as schema.org. (see [1],[2] for details)
  2. SHACL: Nearly all of SHACL is supported, with the exception of a few constructs. The constructs we support can also run directly on SPARQL endpoints
  3. IBM Resource Shapes: The progress is tracked here, but we will drop support for RS as soon as SHACL becomes stable
  4. DSP (Dublin Core Set Profiles): The progress is tracked here, but we will drop support for DSP as soon as SHACL becomes stable

Note that you can mix all of these constraints together and RDFUnit will validate the dataset against all of them.

Acknowledgements

The first version of RDFUnit (formerly known as Databugger) was developed by AKSW as part of the PhD thesis of Dimitris Kontokostas. A lot of additional work on improvements, requirements & refactoring was performed through the EU-funded project ALIGNED. Through the project, many partners provided feedback and contributed code, e.g. Wolters Kluwer Germany and Semantic Web Company, which are also users of RDFUnit.

There are also many code contributors, as well as people who submitted bug reports or provided constructive feedback.

In addition, RDFUnit used the Java profiler JProfiler for optimizations.