dcmi / dctap

DC Tabular Application Profile
https://dcmi.github.io/dctap/

rename "pattern" valueConstraintType #60

Closed philbarker closed 2 years ago

philbarker commented 2 years ago

The recommended value "pattern" for valueConstraintType is too imprecise to be useful across different implementations (unless it is assumed to be a regex), and is awkward when we want to describe what shapes are in terms of being patterns for data.

We could replace "pattern" with "regex" in the recommended values for valueConstraintType.

This would have wide-reaching consequences for all our documentation and examples.

philbarker commented 2 years ago

Sub-issue: we may want to note somewhere that, if more precision is required, implementers may define valueConstraintTypes for different flavours of regex, e.g. perl-regex, xquery-regex, etc. [1]. It would be really nice if we could recommend an extension mechanism that keeps some hope of compatibility through graceful degradation: a format such as regex.flavour might allow tools to act on the "regex" part even if they don't understand the .flavour part.

  1. https://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines#Language_features
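
To make the graceful-degradation idea concrete, here is a minimal sketch of how a tool might treat a regex.flavour value. Everything in it is an assumption for illustration: the function names, the KNOWN_FLAVOURS set, and the choice to fall back to the tool's default regex engine are hypothetical and not part of DCTAP.

```python
import re

# Hypothetical: the regex flavours this particular tool claims to understand.
KNOWN_FLAVOURS = {"python"}


def split_constraint_type(constraint_type):
    """Split 'regex.flavour' into (base, flavour); flavour is None if absent."""
    base, _, flavour = constraint_type.strip().lower().partition(".")
    return base, flavour or None


def matches(value, constraint, constraint_type):
    """Return (result, degraded).

    result is None when the base type is not 'regex' (this sketch handles
    nothing else). degraded is True when a flavour was given but is not
    recognised, so the default engine was used instead.
    """
    base, flavour = split_constraint_type(constraint_type)
    if base != "regex":
        return None, False
    degraded = flavour is not None and flavour not in KNOWN_FLAVOURS
    # Fall back to Python's own regex engine whatever the flavour says.
    return re.search(constraint, value) is not None, degraded


if __name__ == "__main__":
    print(matches("2021-01-06", r"^\d{4}-\d{2}-\d{2}$", "regex.xquery"))
    print(matches("abc", r"^\d+$", "regex"))
```

A tool following this sketch would still evaluate the "regex" part of an unrecognised flavour, and could warn the user whenever the degraded flag is set.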

kcoyle commented 2 years ago

Some other uses of "pattern" for this:

XML schema:

pattern: "pattern is a constraint on the ·value space· of a datatype which is achieved by constraining the ·lexical space· to literals which match a specific pattern. The value of pattern ·must· be a ·regular expression·." Example: <xs:pattern value='123 (\d+\s)*456'/>

JSON schema:

"The pattern keyword is used to restrict a string to a particular regular expression.": { "type": "string", "pattern": "^(\([0-9]{3}\))?[0-9]{3}-[0-9]{4}$" }
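
As a quick illustration of "pattern" meaning "regex" here, the phone-number pattern from that JSON Schema snippet can be tried against a few strings. This is only a sketch: it uses Python's re module rather than the ECMA-262 dialect that JSON Schema recommends (the two agree for this simple expression), and the helper name and sample values are made up for the example.

```python
import re

# The pattern from the JSON Schema snippet quoted above.
PHONE_PATTERN = r"^(\([0-9]{3}\))?[0-9]{3}-[0-9]{4}$"


def satisfies_pattern(value, pattern=PHONE_PATTERN):
    """Return True if value matches pattern.

    JSON Schema patterns are not implicitly anchored, so re.search is used
    and the anchors come from the pattern itself.
    """
    return re.search(pattern, value) is not None


if __name__ == "__main__":
    for sample in ["555-1212", "(888)555-1212", "(888)555-1212 ext. 532", "(800)FLOWERS"]:
        print(sample, satisfies_pattern(sample))
```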

ShEx:

ShEx uses the XML schema pattern constraint. "valueExpr": { "type": "NodeConstraint", "pattern": "^/\t\\\uD835\uDCB8\?$" }

SHACL:

"4.4.3 sh:pattern - sh:pattern specifies a regular expression that each value node matches to satisfy the condition."

philbarker commented 2 years ago

Thanks @kcoyle. Is the message that "pattern" is widely used as a synonym for regex and so there is no problem? I would be happy to accept that and avoid a change that would require extensive rewrites.

kcoyle commented 2 years ago

@philbarker I think that's what we have to contemplate, but I'm not sure that clinches it. Thinking about our audience, it seems to me that a non-coder metadata manager would probably not be adding a regex to a TAP; that would be left to a more technical member of the development team. So the question is: what would best speak to both of these persons? Or is the regex function something that non-techie members would probably ignore or take on faith? I'm wavering between pattern and regex. I'll be putting this on the agenda for this week.

philbarker commented 2 years ago

In our call of Jan 6 we decided to keep pattern as it is widely used to include regexes, but also allows for other options.