Interoperable-data / ERA_vocabulary

ERA vocabulary is an ontology defined by the European Union Agency for Railways (ERA) to describe the concepts and relationships related to the European railway infrastructure and the vehicles authorized to operate over it.
https://data-interop.era.europa.eu/era-vocabulary/
MIT License

some shapes use unanchored patterns #25

Open VladimirAlexiev opened 2 months ago

VladimirAlexiev commented 2 months ago

(Related to #21)

Some ERA shapes use sh:pattern with an unanchored regex, e.g. "foo". But:

https://w3c.github.io/data-shapes/shacl/#PatternConstraintComponent refers to https://www.w3.org/TR/sparql11-query/#func-regex which refers to https://www.w3.org/TR/xpath-functions/#func-matches which says:

"Unless the metacharacters ^ and $ are used as anchors, the string is considered to match the pattern if any substring matches the pattern."

You can check the same at https://shacl.org/playground/

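This means an unanchored pattern doesn't check the whole string: it passes as soon as any substring matches. So to validate the whole value you want an anchored regex like "^foo$". If the pattern has a top-level alternation, group it before anchoring, e.g. "^(foo|bar)$", because "^foo|bar$" anchors each alternative separately.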
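
To make the difference concrete, here is a minimal sketch in Python. It assumes that Python's `re.search` is a close enough stand-in for the substring semantics of XPath `fn:matches` for these simple patterns (the two regex dialects are not identical). The pattern is the one from line 130 of the grep output in this issue; the constant and function names are only illustrative.

```python
import re

# Pattern from era-shapes.ttl line 130, written as the regex itself
# (the Turtle string "\\d" unescapes to the regex \d).
UNANCHORED = r"[1-9]\d{0,5}|0"

# Naive anchoring: alternation binds loosest, so this parses as
# (^[1-9]\d{0,5}) | (0$) and is still too permissive.
NAIVE = r"^[1-9]\d{0,5}|0$"

# Group first, then anchor: the whole string must match.
ANCHORED = r"^([1-9]\d{0,5}|0)$"


def matches(pattern: str, value: str) -> bool:
    """Substring-style match, assumed comparable to fn:matches / SPARQL REGEX."""
    return re.search(pattern, value) is not None


if __name__ == "__main__":
    for value in ["123456", "0", "abc123def", "1234567", "007"]:
        print(value, matches(UNANCHORED, value), matches(NAIVE, value),
              matches(ANCHORED, value))
```

With the unanchored pattern every sample value passes, because each contains a digit somewhere; only the grouped and anchored form rejects "abc123def", "1234567" and "007".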

Count of such problems from a search in GitHub:

I guess that public/doc/era-shapes.ttl is a concatenation of all individual shape files. Here's a grep of such problems in that file, which confirms the number 4+3:

grep -n 'sh:pattern' era-shapes.ttl | grep -Pv '"\^.*?\$"'
130:    sh:pattern "[1-9]\\d{0,5}|0" ;
146:    sh:pattern "[1-9]\\d{0,3}|0" ;
162:    sh:pattern "[1-9]\\d{0,2}|0" ; #TODO check pattern, it is defined as double
1714:   sh:pattern "([1-9]\\d{3}|[1-9]\\d{2}|[1-9]\\d{1}|[0-9])" ;
1732:   sh:pattern "([1-9]\\d{1}|[0-9])\\.[0-9]" ;
1749:   #sh:pattern "([1-9]\\d{2}|[1-9]\\d{1}|[0-9])" ;
1896:   sh:pattern "([1-9]\\d{2}|[1-9]\\d{1}|[0-9])" ;
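
The same check can be sketched in Python, with a suggested replacement for each hit. This is only an illustration under a few assumptions: each sh:pattern sits on one line with a double-quoted string, as in the output above; "anchored" is judged by the same simple heuristic as the grep (starts with ^ and ends with $); and the helper names `is_anchored`, `anchor` and `find_unanchored` are hypothetical. Unlike the grep, it skips commented-out lines such as 1749.

```python
import re

# A non-commented sh:pattern line; group 1 is the quoted pattern,
# kept exactly as written in the Turtle source (escapes included).
PATTERN_LINE = re.compile(r'^\s*sh:pattern\s+"((?:[^"\\]|\\.)*)"')


def is_anchored(pattern: str) -> bool:
    # Heuristic only: does not detect "^a|b$" or an escaped trailing "$".
    return pattern.startswith("^") and pattern.endswith("$")


def anchor(pattern: str) -> str:
    """Wrap in a group so a top-level '|' cannot escape the anchors."""
    return pattern if is_anchored(pattern) else f"^({pattern})$"


def find_unanchored(lines):
    """Yield (line number, original pattern, suggested pattern)."""
    for number, line in enumerate(lines, start=1):
        m = PATTERN_LINE.match(line)
        if m and not is_anchored(m.group(1)):
            yield number, m.group(1), anchor(m.group(1))


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], encoding="utf-8") as ttl:
        for number, old, new in find_unanchored(ttl):
            print(f'{number}: "{old}" -> "{new}"')
```

Run as `python check_patterns.py era-shapes.ttl`, where the script name is arbitrary. The suggestions are text-level rewrites and should be reviewed before being applied to the shapes.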

ednaru commented 2 months ago

Thanks. We will add these indications to the ticket already created for issue #21.