diachron / quality

Dataset Quality Assessment (part of WP5 of the Diachron EU FP7 project)
MIT License

WhitespaceInAnnotation metric #33

Closed: clange closed this issue 10 years ago.

clange commented 10 years ago

Identify annotations (using the same properties as in #32) whose objects have leading or trailing whitespace, i.e. whose value matches the regular-expression whitespace class \s at its start or end (^\s|\s$), e.g.

<http://...> <http://www.w3.org/2000/01/rdf-schema#comment> " this is new   " .
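A minimal sketch of the leading/trailing whitespace test in Java. Java is an assumption here, suggested by the Maven-style src/main/resources layout mentioned later in the thread. The class and method names (WhitespaceCheck, hasLeadingOrTrailingWhitespace) are hypothetical, not taken from the project. Note that in java.util.regex, \s matches only [ \t\n\x0B\f\r] by default, so it can disagree with String.strip(), which uses Character.isWhitespace, for some non-ASCII characters.

```java
import java.util.regex.Pattern;

// Hypothetical helper: flags a literal value that starts or ends with whitespace.
final class WhitespaceCheck {
    // \s at the very start (^\s) or at the very end (\s$) of the value.
    // By default \s is ASCII-only: [ \t\n\x0B\f\r].
    private static final Pattern EDGE_WHITESPACE = Pattern.compile("^\\s|\\s$");

    private WhitespaceCheck() {}

    static boolean hasLeadingOrTrailingWhitespace(String lexicalForm) {
        // find() succeeds if either anchored alternative matches;
        // inner spaces such as "this is" do not match on their own.
        return EDGE_WHITESPACE.matcher(lexicalForm).find();
    }
}
```

For the example triple above, the object " this is new   " is flagged, while "this is new" is not.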

WhitespaceInAnnotation is a metric in the Representational category of dimensions, under the Understandability dimension.

The metric value is defined as the ratio of annotations with whitespace to all annotations (i.e. all triples having such properties).
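The ratio above can be sketched in Java as follows, assuming triples are already available as plain strings. The Triple record, the WhitespaceInAnnotationSketch class and its metricValue method are hypothetical illustrations, not the project's API. Returning 0 for a dataset with no annotations is also an assumption, since the issue does not define that case.

```java
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical minimal triple: the object is kept as its plain lexical form.
record Triple(String subject, String predicate, String object) {}

final class WhitespaceInAnnotationSketch {
    // Whitespace (\s) at the very start or the very end of the value.
    private static final Pattern EDGE_WHITESPACE = Pattern.compile("^\\s|\\s$");

    private WhitespaceInAnnotationSketch() {}

    // Ratio of annotation triples whose object has leading or trailing
    // whitespace to all triples whose predicate is an annotation property.
    static double metricValue(List<Triple> triples, Set<String> annotationProperties) {
        long annotations = 0;
        long withWhitespace = 0;
        for (Triple t : triples) {
            if (!annotationProperties.contains(t.predicate())) {
                continue; // not an annotation triple: excluded from both counts
            }
            annotations++;
            if (EDGE_WHITESPACE.matcher(t.object()).find()) {
                withWhitespace++;
            }
        }
        // Assumption: a dataset with no annotations scores the best value, 0.
        return annotations == 0 ? 0.0 : (double) withWhitespace / annotations;
    }
}
```

Triples whose predicate is not in the configured set are skipped entirely, so they affect neither the numerator nor the denominator.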

Some of the implementation may be shared with #32.

muhammadaliqasmi commented 10 years ago

The WhitespaceInAnnotation metric implementation considers widely used annotation properties such as labels, comments, and notes. It identifies triples whose property is in a pre-configured list of annotation properties and whose object value has leading or trailing whitespace. The list of annotation properties is stored in ..src/main/resources/AnnotationPropertiesList.txt.
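A sketch of how such a pre-configured list could be read, assuming (not confirmed by the thread) that AnnotationPropertiesList.txt holds one property URI per line, with blank lines and lines starting with '#' ignored. The AnnotationPropertyList class and its methods are hypothetical. In a standard Maven build, files under src/main/resources are copied to the classpath root, which is why the loader uses Class.getResourceAsStream.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical loader for the annotation property list.
final class AnnotationPropertyList {
    private AnnotationPropertyList() {}

    // Assumed format: one property URI per line; blank lines and
    // '#' comment lines are skipped. The real file may differ.
    static Set<String> parse(Reader reader) {
        Set<String> properties = new LinkedHashSet<>();
        try (BufferedReader br = new BufferedReader(reader)) {
            String line;
            while ((line = br.readLine()) != null) {
                String uri = line.strip();
                if (uri.isEmpty() || uri.startsWith("#")) {
                    continue;
                }
                properties.add(uri);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties;
    }

    // Reads the list from the classpath, e.g. "/AnnotationPropertiesList.txt".
    static Set<String> loadFromClasspath(String resourceName) {
        InputStream in = AnnotationPropertyList.class.getResourceAsStream(resourceName);
        if (in == null) {
            throw new IllegalArgumentException("Resource not found: " + resourceName);
        }
        // Closing the BufferedReader inside parse() also closes this stream.
        return parse(new InputStreamReader(in, StandardCharsets.UTF_8));
    }
}
```

Keeping the list in a resource file means new annotation properties can be added without changing the metric code.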

Metric value = number of annotation literals with leading or trailing whitespace / total number of annotation literals
Metric value range = [0, 1]; best case = 0, worst case = 1
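The same definition written as a formula, with notation introduced here for illustration: P is the configured set of annotation properties, A the annotation triples, and W the annotation triples whose object has leading or trailing whitespace. Because W is a subset of A, the value can never leave [0, 1]; it is undefined when the dataset contains no annotation triples.

```latex
A = \{\, (s, p, o) : p \in P \,\}, \qquad
W = \{\, (s, p, o) \in A : o \text{ has leading or trailing whitespace} \,\}

m = \frac{|W|}{|A|}, \qquad W \subseteq A \;\Rightarrow\; 0 \le m \le 1
```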

Implemented in the issue#33 branch; the issue#33 branch has been merged into master.