Let's think about a declarative language for quality metrics.
That is, large parts of a new metric's implementation would take the form of a dataset that is an instance of the daQ vocabulary.
In pseudocode, a declarative representation of the UndefinedClassesOrProperties metric could, for example, look like this:
```
IF TRIPLE MATCHES ?s rdf:type|rdfs:subClassOf|rdfs:domain|rdfs:range ?c
# ^^^ This would be a SPARQL graph pattern
THEN CHECK
# Here we could use a SPARQL FILTER expression:
  (dqf:DereferenceableAsLOD(?c)
   || dqf:ExistsLocallyInThisDataset(?c)
   || dqf:OtherwiseKnownToUs(?c))
  && dqf:QuerySucceeds(?c a owl:Class)
# ^^^ once more a SPARQL graph pattern (the argument of QuerySucceeds)
# The actual check is more complex,
# but I'll leave it like this for the example
```
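To make the IF/THEN idea a bit more concrete, here is a minimal Java sketch under a few assumptions: a rule is stored as a metric name, the IF graph pattern and the THEN filter expression, all as SPARQL text, and it is compiled into a SELECT query whose solutions are the matches that *fail* the check. `DeclarativeRule`, `RuleDemo` and the `http://example.org/dqf#` namespace are hypothetical placeholders. Since a SPARQL function call takes expressions rather than graph patterns as arguments, the sketch swaps `dqf:QuerySucceeds(?c a owl:Class)` for the standard SPARQL 1.1 `EXISTS { ?c a owl:Class }`; the other `dqf:` functions remain calls that the SPARQL engine would have to provide.

```java
import java.util.Objects;

// Hypothetical in-memory form of one declarative metric rule: the IF part is a
// SPARQL graph pattern and the THEN part a SPARQL filter expression, both as text.
record DeclarativeRule(String metricName, String graphPattern, String checkExpression) {

    DeclarativeRule {
        Objects.requireNonNull(metricName, "metricName");
        Objects.requireNonNull(graphPattern, "graphPattern");
        Objects.requireNonNull(checkExpression, "checkExpression");
    }

    // Builds a SELECT query whose solutions are exactly the matches of the
    // graph pattern that FAIL the check, i.e. candidate problem-report entries.
    String toViolationQuery(String prefixes) {
        return prefixes
                + "SELECT * WHERE {\n"
                + "  " + graphPattern + "\n"
                + "  FILTER (!(" + checkExpression + "))\n"
                + "}\n";
    }
}

class RuleDemo {
    static final String PREFIXES =
            "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
          + "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
          + "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
          + "PREFIX dqf: <http://example.org/dqf#>\n"; // placeholder namespace (assumption)

    // The UndefinedClassesOrProperties rule from the pseudocode, with
    // dqf:QuerySucceeds replaced by a plain SPARQL EXISTS.
    static DeclarativeRule undefinedClasses() {
        return new DeclarativeRule(
                "UndefinedClassesOrProperties",
                "?s rdf:type|rdfs:subClassOf|rdfs:domain|rdfs:range ?c .",
                "(dqf:DereferenceableAsLOD(?c) || dqf:ExistsLocallyInThisDataset(?c)"
                        + " || dqf:OtherwiseKnownToUs(?c))"
                        + " && EXISTS { ?c a owl:Class }");
    }

    public static void main(String[] args) {
        System.out.print(undefinedClasses().toViolationQuery(PREFIXES));
    }
}
```

Selecting the failing matches rather than the passing ones means each solution can feed straight into a problem report.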
Complex operators such as DereferenceableAsLOD, ExistsLocallyInThisDataset or QuerySucceeds would be realised as custom SPARQL functions implemented in Java, reusing code from methods we already have. (I used dqf as the prefix for our custom namespace of “data quality functions”.)
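A minimal sketch of the `dqf:` functions as pluggable Java code, assuming each one takes a single IRI and returns a boolean. It uses a hypothetical `DqfRegistry` in plain Java so that it runs without dependencies; an actual implementation would instead register the functions with the SPARQL engine's own extension mechanism (Apache Jena ARQ, for example, has a `FunctionRegistry` for this). `DereferenceableAsLOD` is only a stub here, the two sets stand in for whatever lookups our existing metric code performs, and `QuerySucceeds` is left out because its argument is a graph pattern rather than an IRI.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical registry mapping dqf: function IRIs to Java implementations.
final class DqfRegistry {
    static final String DQF = "http://example.org/dqf#"; // placeholder namespace IRI

    private final Map<String, Predicate<String>> functions = new HashMap<>();

    void register(String localName, Predicate<String> impl) {
        functions.put(DQF + localName, impl);
    }

    // Looks up the function by its full IRI and applies it to the given IRI.
    boolean call(String localName, String iri) {
        Predicate<String> f = functions.get(DQF + localName);
        if (f == null) {
            throw new IllegalArgumentException("Unknown function " + DQF + localName);
        }
        return f.test(iri);
    }
}

class DqfDemo {
    // Builds a registry over toy data; the sets stand in for real lookups.
    static DqfRegistry registryFor(Set<String> definedLocally, Set<String> knownToUs) {
        DqfRegistry r = new DqfRegistry();
        // Stub: a real version would dereference the IRI over HTTP and check for RDF.
        r.register("DereferenceableAsLOD", iri -> false);
        r.register("ExistsLocallyInThisDataset", definedLocally::contains);
        r.register("OtherwiseKnownToUs", knownToUs::contains);
        return r;
    }

    // The disjunction from the THEN CHECK part, without the owl:Class condition.
    static boolean isDefined(DqfRegistry r, String c) {
        return r.call("DereferenceableAsLOD", c)
                || r.call("ExistsLocallyInThisDataset", c)
                || r.call("OtherwiseKnownToUs", c);
    }

    public static void main(String[] args) {
        DqfRegistry r = registryFor(
                Set.of("http://example.org/data#Sensor"),
                Set.of("http://xmlns.com/foaf/0.1/Person"));
        System.out.println(isDefined(r, "http://example.org/data#Sensor")); // true
        System.out.println(isDefined(r, "http://example.org/data#Sensr"));  // false
    }
}
```

Keying the functions by full IRI mirrors how a SPARQL engine resolves `dqf:Foo(...)` after prefix expansion.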
Compare page 7 of http://svn.aksw.org/papers/2013/ISWC_LODStats/public.pdf (LODStats). They get by without complex operators, but their task is simpler than ours.
This language could include elements for generating problem reports, which we need for cleaning. (@jerdeb @nfriesen please edit this into "quality report" if that's the correct term)
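One possible shape for the problem-report part, again only a sketch: each failing solution of a metric's check becomes one report entry that records the metric, the subject and the offending term, serialised as Turtle. `ProblemReport` and the `ex:` terms (`ex:QualityProblem`, `ex:metric`, `ex:subject`, `ex:offendingTerm`) are hypothetical placeholders, not an agreed vocabulary, and the sketch assumes subjects and terms are IRIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical problem-report builder; the ex: vocabulary is a placeholder.
final class ProblemReport {
    record Problem(String metric, String subject, String offendingTerm) {}

    private final List<Problem> problems = new ArrayList<>();

    // Records one failing match; assumes subject and offendingTerm are IRIs.
    void add(String metric, String subject, String offendingTerm) {
        problems.add(new Problem(metric, subject, offendingTerm));
    }

    int size() {
        return problems.size();
    }

    // Renders the report as Turtle, one blank node per problem.
    String toTurtle() {
        StringBuilder sb = new StringBuilder("@prefix ex: <http://example.org/report#> .\n");
        for (Problem p : problems) {
            sb.append("[] a ex:QualityProblem ;\n")
              .append("   ex:metric ex:").append(p.metric()).append(" ;\n")
              .append("   ex:subject <").append(p.subject()).append("> ;\n")
              .append("   ex:offendingTerm <").append(p.offendingTerm()).append("> .\n");
        }
        return sb.toString();
    }
}

class ReportDemo {
    public static void main(String[] args) {
        ProblemReport report = new ProblemReport();
        // One illustrative failing match: ?s uses a misspelled, undefined class ?c.
        report.add("UndefinedClassesOrProperties",
                "http://example.org/data#s1", "http://example.org/data#Sensr");
        System.out.print(report.toTurtle());
    }
}
```

Because the report is itself RDF, a cleaning step could query it with the same SPARQL machinery as the metrics.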