dcmi / dcap

DC Tabular Application Profile - supporting materials

Defining value constraints #63

Closed: kcoyle closed this issue 3 years ago

kcoyle commented 3 years ago

We have said that value constraints are additional rules to be applied to the value datatype. We have also considered that the value constraint could be a regex, or ShExC (or ShExJ) code. Other constraints could be simple "formulas" like

Can we define rules for value constraints?

Keep in mind that one role for profiles is to convey the information about instance metadata to "foreign" users so ideally anyone should be able to understand the constraints without having inside information.

tombaker commented 3 years ago

Someone (Ben? John?) suggested that we distinguish constraint_type and constraint_value. Without making such a distinction, it is impossible (or at any rate risky) to guess what the examples above really mean:

Consider:

| value_type | constraint_value | constraint_type | annotation |
|---|---|---|---|
| Literal | GT 13 | | |
| Literal | GT 13 | Regex | requires all three |
| Literal | GT 13 | LiteralPicklist | |
| LiteralPicklist | GT 13 | | |

The literal pick list could be handled with two columns if "value type" were not limited to URI, Literal, Non-Literal, and BNode, but the Regex could not.
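
To make the ambiguity concrete, here is a rough Python sketch of three ways a processor might read the same cell, `GT 13`. The function names and the three candidate values are invented for illustration, and the "formula" reading assumes that `GT` means "greater than"; none of this is agreed syntax.

```python
import re

def as_regex(cell: str, value: str) -> bool:
    # Read the cell as a regular expression that must match the whole value.
    return re.fullmatch(cell, value) is not None

def as_picklist(cell: str, value: str) -> bool:
    # Read the cell as a whitespace-separated list of allowed literals.
    return value in cell.split()

def as_formula(cell: str, value: str) -> bool:
    # Read the cell as "GT <number>": the value must be greater than <number>.
    keyword, threshold = cell.split()
    if keyword != "GT":
        raise ValueError(f"only GT is sketched here, got {keyword!r}")
    limit = float(threshold)
    try:
        return float(value) > limit
    except ValueError:
        return False  # a non-numeric value cannot satisfy a numeric comparison

cell = "GT 13"
candidates = ["GT 13", "GT", "20"]
results = {
    reading.__name__: [reading(cell, value) for value in candidates]
    for reading in (as_regex, as_picklist, as_formula)
}
print(results)
```

Each reading accepts exactly one of the three candidate values, and a different one each time, so a processor cannot pick the right reading from the cell alone.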

briesenberg07 commented 3 years ago

> The literal pick list could be handled with two columns if "value type" were not limited to URI, Literal, Non-Literal, and BNode, but the Regex could not.

If we wish to express a value type--we may roughly equate this with a datatype--in the column value_type, then it doesn't make sense for something like LiteralPicklist to appear there, as values in the instance data will not be literal picklists.

kcoyle commented 3 years ago

Option 1: One column, assess values by pattern

In this option, value constraints are identified using their characteristics:

Assume that we would choose one way to code each possibility

Advantages:

  1. one column
  2. users can learn patterns but don't have to know how to name them

Disadvantages:

  1. could be complicated to code for correctly
  2. will be difficult to detect badly expressed patterns vs. just a complicated pattern
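
As a rough illustration of what "assess values by pattern" could involve, here is a minimal Python sketch. The detection rules (leading quote, leading `<` or `>`, leading `http://` or `https://`, enclosing slashes) and the type names are assumptions made up for this example, not rules we have agreed on.

```python
def guess_constraint_type(cell: str) -> str:
    """Guess what kind of constraint a cell holds from its surface form."""
    text = cell.strip()
    if text.startswith(("http://", "https://")):
        return "uristem"
    if text.startswith(('"', "'")):
        return "picklist"
    if text.startswith(("<", ">")):
        return "formula"
    if len(text) >= 2 and text.startswith("/") and text.endswith("/"):
        return "regex"
    return "unknown"

print(guess_constraint_type('"red" "blue" "green"'))
print(guess_constraint_type("<13 >2"))
print(guess_constraint_type("https://id.loc.gov/subjects"))
# A regex that is not wrapped in slashes and happens to start like a URI:
print(guess_constraint_type(r"https://example\.org/\d+"))
```

The last call shows the second disadvantage: a regex written without the enclosing slashes is silently taken for a URI stem, and the guesser has no way to tell a mistake from an intentional pattern.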

Some questions:

  1. which regex?!
  2. are there other types of constraints that we need to add here, e.g. language tag lists?

kcoyle commented 3 years ago

Option 2: One column for constraint type, separate column for the pattern

In this option each constraint will have a type associated with it.

| type | example |
|---|---|
| pick list | "red" "blue" "green" |
| formula | `<13 >2` |
| uristem | https://id.loc.gov/subjects |
| regex | `/[.*+-?^${}() []\]/g, '\$&'); // $&` |

The "constraint type" could be expressed as actions, such as: "select one of" for "pick list"; "beginning with" for "uristem". I can't immediately think of more, but we could probably come up with them.
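
A minimal Python sketch of how an application might use an explicit constraint type, for illustration only. The type keys, the function names, and the reading of `<13 >2` as "less than 13 and greater than 2" are all assumptions, and the regex check uses Python's `re` syntax simply because the sketch is in Python.

```python
import operator
import re
import shlex

def check_picklist(constraint: str, value: str) -> bool:
    # Quoted items separated by spaces, e.g. '"red" "blue" "green"'.
    return value in shlex.split(constraint)

def check_uristem(constraint: str, value: str) -> bool:
    # "beginning with": the value must start with the stem.
    return value.startswith(constraint)

def check_regex(constraint: str, value: str) -> bool:
    return re.search(constraint, value) is not None

def check_formula(constraint: str, value: str) -> bool:
    # Assumes every "<n" or ">n" term must hold, so "<13 >2" means 2 < value < 13.
    comparisons = {"<": operator.lt, ">": operator.gt}
    number = float(value)
    terms = re.findall(r"([<>])\s*(-?\d+(?:\.\d+)?)", constraint)
    return bool(terms) and all(
        comparisons[symbol](number, float(bound)) for symbol, bound in terms
    )

VALIDATORS = {
    "picklist": check_picklist,
    "uristem": check_uristem,
    "regex": check_regex,
    "formula": check_formula,
}

def validate(constraint_type: str, constraint_value: str, value: str) -> bool:
    try:
        validator = VALIDATORS[constraint_type]
    except KeyError:
        raise ValueError(f"unsupported constraint type: {constraint_type!r}") from None
    return validator(constraint_value, value)

print(validate("picklist", '"red" "blue" "green"', "blue"))
print(validate("formula", "<13 >2", "7"))
```

With the type stated, an unknown or misspelled type fails loudly instead of being guessed at, which is the second advantage listed below. The cost is that `VALIDATORS` has to grow with every new constraint type.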

Advantages:

  1. clear designation of type of constraint
  2. easier to determine if constraint is expressed in a valid way
  3. profile creators can specify constraints that their applications understand

Disadvantages:

  1. We have to determine a specific set of constraints that all applications need to understand
  2. The number of types of constraints may grow very large in actual usage

briesenberg07 commented 3 years ago

I'm in the Option 2 camp at this time, because I think it offers more clarity*.

But the disadvantages that @kcoyle points out are worth consideration!

You could easily imagine how this could make a simple model significantly more complex.

*To be clear, what I'm thinking of at the current time is something like:

| value_type | constraint_type | constraint_value |
|---|---|---|
| URI | URI stem | https://id.loc.gov/subjects |
| literal | pick list | 'red' 'blue' 'green' |

etc.
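
As a sketch of how the three columns might be checked when a profile is loaded, here is a short Python example. The CSV layout simply mirrors the rows above plus one deliberately mismatched row. The `ALLOWED_VALUE_TYPES` pairing is a hypothetical rule invented for this example, not something the group has decided.

```python
import csv
import io

PROFILE_CSV = """value_type,constraint_type,constraint_value
URI,URI stem,https://id.loc.gov/subjects
literal,pick list,'red' 'blue' 'green'
literal,URI stem,https://id.loc.gov/subjects
"""

# Hypothetical pairing of each constraint type with the value types it fits.
ALLOWED_VALUE_TYPES = {
    "URI stem": {"URI"},
    "pick list": {"literal"},
}

def find_problems(csv_text: str) -> list[tuple[int, str]]:
    """Return (line number, message) pairs for rows that look inconsistent."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        allowed = ALLOWED_VALUE_TYPES.get(row["constraint_type"])
        if allowed is None:
            problems.append((line_number, "unknown constraint type"))
        elif row["value_type"] not in allowed:
            problems.append((line_number, "constraint type does not fit value type"))
    return problems

problems = find_problems(PROFILE_CSV)
print(problems)
```

Only the third data row is flagged, because it puts a URI stem on a literal. A check like this is only possible when the constraint type is stated in its own column.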

johnhuck commented 3 years ago

It seems to me that Option 1 may not avoid the need to define constraint types, or the problem of their proliferation, since it only replaces that need with a need to define syntaxes (and circulate those definitions). That makes Option 2 more appealing to me.

kcoyle commented 3 years ago

This is now being discussed at https://github.com/dcmi/dctap/issues/5, which links back to here.