Closed by kcoyle 3 years ago.
Someone (Ben? John?) suggested that we distinguish `constraint_type` and `constraint_value`. Without making such a distinction, it is impossible (or at any rate risky) to guess what the examples above really mean:
- `GT 13` could conceivably be interpreted as a literal with a space, a regular expression, or a pick list of two literals: `GT` and `13`.
- `>1900` could be a literal, a regular expression, or a mathematical expression ("greater than 1900").

Consider:

| value_type | constraint_value | constraint_type | annotation |
|---|---|---|---|
| Literal | GT 13 | | |
| Literal | GT 13 | Regex | requires all three |
| Literal | GT 13 | LiteralPicklist | |
| LiteralPicklist | GT 13 | | |
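To make the ambiguity concrete, here is a minimal Python sketch, assuming three hypothetical readings of a bare `constraint_value` (exact literal, regular expression, whitespace-separated pick list). The function names are illustrative only and are not part of any DCTAP specification. The same string accepts different instance values depending on which reading a processor happens to guess.

```python
import re

# Three hypothetical ways a processor might read a bare constraint_value.
# None of these function names come from DCTAP; they only illustrate the ambiguity.
def as_literal(constraint_value: str, value: str) -> bool:
    # The whole string, space included, must equal the value.
    return value == constraint_value

def as_regex(constraint_value: str, value: str) -> bool:
    # The string is a regular expression that must match the entire value.
    return re.fullmatch(constraint_value, value) is not None

def as_picklist(constraint_value: str, value: str) -> bool:
    # The string is split on whitespace into a list of allowed literals.
    return value in constraint_value.split()

READINGS = {"literal": as_literal, "regex": as_regex, "picklist": as_picklist}

def interpretations(constraint_value: str, value: str) -> dict[str, bool]:
    """Report whether `value` passes the constraint under each reading."""
    return {name: check(constraint_value, value) for name, check in READINGS.items()}

print(interpretations("GT 13", "13"))
# {'literal': False, 'regex': False, 'picklist': True}
print(interpretations("GT 13", "GT 13"))
# {'literal': True, 'regex': True, 'picklist': False}
```

The value `13` passes only if `GT 13` is read as a pick list, while `GT 13` itself passes only under the literal and regex readings, so a processor has to be told which reading is meant.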
The literal pick list could be handled with two columns if "value type" were not limited to URI, Literal, Non-Literal, and BNode, but the Regex could not.
If we wish to express a value type (which we may roughly equate with a datatype) in the column `value_type`, then it doesn't make sense for something like `LiteralPicklist` to appear there, as values in the instance data will not be literal picklists.
Option 1: One column, assess values by pattern
In this option, value constraints are identified using their characteristics.
Assume that we would choose one way to code each possibility.
Advantages:
Disadvantages:
Some questions:
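A rough Python sketch of what "assess values by pattern" could look like, assuming one hypothetical convention per constraint type (slashes for a regex, `<`/`>` terms for a formula, an `http(s)://` prefix for a URI stem, quoted strings for a pick list). These conventions are invented for illustration; a profile taking this approach would have to define and publish whichever syntax it actually chose.

```python
import re

# Hypothetical Option 1 conventions: the *shape* of constraint_value decides its type.
# Every rule here is an assumption for illustration; a profile using this approach
# would have to publish whatever conventions it actually chose.
def guess_constraint_type(constraint_value: str) -> str:
    v = constraint_value.strip()
    if len(v) >= 2 and v.startswith("/") and v.endswith("/"):
        return "regex"     # e.g. /^[0-9]{4}$/
    if re.fullmatch(r"(?:[<>]-?\d+(?:\.\d+)?\s*)+", v):
        return "formula"   # e.g. >1900 or <13 >2
    if re.match(r"https?://", v):
        return "uristem"   # e.g. https://id.loc.gov/subjects
    if re.fullmatch(r'(?:"[^"]*"\s*)+', v):
        return "picklist"  # e.g. "red" "blue" "green"
    return "literal"       # anything that matches no other pattern

for example in ["GT 13", '"GT" "13"', ">1900", "https://id.loc.gov/subjects"]:
    print(f"{example!r} -> {guess_constraint_type(example)}")
```

Under these particular conventions `GT 13` falls through to `literal`, and the pick-list reading is only available if it is written as `"GT" "13"`. The constraint type has not gone away; it is encoded in a syntax that every reader of the profile would need to know.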
Option 2: One column for constraint type, separate column for the pattern
In this option each constraint will have a type associated with it.
| type | example |
|---|---|
| pick list | "red" "blue" "green" |
| formula | <13 >2 |
| uristem | https://id.loc.gov/subjects |
| regex | `/[.*+-?^${}()\|[\]\\]/g` |
The "constraint type" could be expressed as actions, such as: "select one of" for "pick list"; "beginning with" for "uristem". I can't immediately think of more, but we could probably come up with them.
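A minimal Python sketch of Option 2, where the processor dispatches on an explicit constraint type instead of guessing. The four keywords follow the table above; the value syntaxes (shell-style quoting for pick lists, space-separated `<`/`>` terms for formulas, Python `re` patterns without `/.../` delimiters for regexes) are assumptions made for this sketch, not agreed conventions.

```python
import re
import shlex

# Sketch of Option 2: constraint_type is stated explicitly, so the processor never
# has to guess. The four keywords follow the table above; the value syntaxes
# (shell-style quoting for pick lists, Python `re` patterns without /.../ delimiters)
# are assumptions.
def check_picklist(constraint_value: str, value: str) -> bool:
    # "select one of": quoted, space-separated choices such as "red" "blue" "green"
    return value in shlex.split(constraint_value)

def check_formula(constraint_value: str, value: str) -> bool:
    # Assumed mini-syntax: terms such as "<13" or ">2", all of which must hold.
    try:
        number = float(value)
    except ValueError:
        return False
    for term in constraint_value.split():
        match = re.fullmatch(r"([<>])(-?\d+(?:\.\d+)?)", term)
        if match is None:
            raise ValueError(f"unrecognised formula term: {term!r}")
        op, bound = match.group(1), float(match.group(2))
        if (op == "<" and not number < bound) or (op == ">" and not number > bound):
            return False
    return True

def check_uristem(constraint_value: str, value: str) -> bool:
    # "beginning with": the value must start with the given stem.
    return value.startswith(constraint_value)

def check_regex(constraint_value: str, value: str) -> bool:
    # The whole value must match the pattern.
    return re.fullmatch(constraint_value, value) is not None

CHECKS = {
    "pick list": check_picklist,
    "formula": check_formula,
    "uristem": check_uristem,
    "regex": check_regex,
}

def check(constraint_type: str, constraint_value: str, value: str) -> bool:
    try:
        rule = CHECKS[constraint_type]
    except KeyError:
        raise ValueError(f"unknown constraint_type: {constraint_type!r}") from None
    return rule(constraint_value, value)

print(check("pick list", '"red" "blue" "green"', "blue"))  # True
print(check("formula", "<13 >2", "13"))                    # False
print(check("uristem", "https://id.loc.gov/subjects",
            "https://id.loc.gov/subjects/abc"))            # True
```

The dictionary keys could just as well be action phrases such as "select one of" or "beginning with"; the dispatch would not change. An unknown type fails loudly rather than being silently reinterpreted, which is the clarity argument for this option.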
Advantages:
Disadvantages:
I'm in the Option 2 camp at this time, because I think it offers more clarity*.
But the disadvantages that @kcoyle points out are worth consideration!
You could easily imagine how this could make a simple model significantly more complex.
*To be clear, what I'm thinking of at the current time is something like:

| value_type | constraint_type | constraint_value |
|---|---|---|
| URI | URI stem | https://id.loc.gov/subjects |
| literal | pick list | 'red' 'blue' 'green' |

etc.
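One way to read the three-column layout is as a two-stage check: the value must first have the stated `value_type`, and the constraint is then applied as an additional rule. The sketch below assumes a hypothetical `ValueRule` record mirroring the column headers, a crude URI test (a scheme plus a host, via `urllib.parse.urlparse`), and shell-style quoting for pick lists; none of this is taken from an existing DCTAP implementation.

```python
import shlex
from dataclasses import dataclass
from urllib.parse import urlparse

@dataclass(frozen=True)
class ValueRule:
    # Hypothetical record; field names mirror the column headers above.
    value_type: str        # "URI" or "literal"
    constraint_type: str   # "URI stem" or "pick list"
    constraint_value: str

def has_value_type(value_type: str, value: str) -> bool:
    # Crude assumption: a URI has a scheme and a host; a literal is any string.
    if value_type == "URI":
        parts = urlparse(value)
        return bool(parts.scheme and parts.netloc)
    if value_type == "literal":
        return True
    raise ValueError(f"unsupported value_type: {value_type!r}")

def meets_constraint(rule: ValueRule, value: str) -> bool:
    if rule.constraint_type == "URI stem":
        return value.startswith(rule.constraint_value)
    if rule.constraint_type == "pick list":
        return value in shlex.split(rule.constraint_value)
    raise ValueError(f"unsupported constraint_type: {rule.constraint_type!r}")

def validate(rule: ValueRule, value: str) -> bool:
    # The constraint is applied only after the value type check passes.
    return has_value_type(rule.value_type, value) and meets_constraint(rule, value)

subjects = ValueRule("URI", "URI stem", "https://id.loc.gov/subjects")
colours = ValueRule("literal", "pick list", "'red' 'blue' 'green'")

print(validate(colours, "blue"))   # True
print(validate(subjects, "red"))   # False
```

Keeping `value_type` limited to datatype-like values (URI, literal) and moving "pick list" into `constraint_type` means `LiteralPicklist` never has to appear in the `value_type` column.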
It seems to me that Option 1 may not avoid the need to define constraint types, or the problem of their proliferation, since it merely replaces that need with a need to define syntaxes (and to circulate those definitions). That makes Option 2 more appealing to me.
This is now being discussed at https://github.com/dcmi/dctap/issues/5, which links back to here.
We have said that value constraints are additional rules to be applied to the value datatype. We have also considered that the value constraint could be a regex or ShExC (or ShExJ) code. Other constraints could be simple "formulas" like
Can we define rules for value constraints?
Keep in mind that one role for profiles is to convey the information about instance metadata to "foreign" users so ideally anyone should be able to understand the constraints without having inside information.