kcoyle closed this issue 3 years ago.
Question: How many columns are needed? That is, do we need to designate separate columns for:
- valueType type: the type of the value type. For example, if the value type is xsd:date, the type of the value type would be URI, whereas the expected instance data value is a formatted date ("2020-06-02").
- pick lists: do these need their own column?
- concept schemes, aka URI stems: there is no standard RDF value type for URI stems, although this does exist in both SHACL and ShEx; does this need its own column? Or can we designate sx:URIstem as a valueType even though that is not standard?
- node type: RDF has nodeKind: ("iri" | "bnode" | "nonliteral" | "literal"). Are these needed in the template, and if so how are they to be used?
(Note: feel free to add other questions below - these are the ones from my meeting notes.)
Where does a statement that the value should meet a definition of an entity shape (or one from a list of entity shapes) defined in the AP fit?
(My initial thought was that these constraints on entity-like values are similar to constraints on literal values such as xsd:date, and so could go in the same column.)
I don't understand how one would define an entity shape other than what we have in the table. Can you give an example of what you are looking for?
yes, the entity shape is defined in the table.
We have examples that refer to entity shapes as constraints on values in the Value Space column, e.g. in bookclub
| ID | URI | Label | Type | Value Space |
|---|---|---|---|---|
| | sdo:author | author | URI | @author |
| | wdt:P127 | owner | URI | @owner |
When processing the AP, such a constraint needs different treatment to a pick list of literal or URI values (I think). That doesn't necessarily mean we need a different column, but it does require consideration.
OK, I think I get what you mean. The Type = URI would mean that the type of the value in instance data is a URI, thus @author would need to be a URI (or a bnode). Maybe what we need to do is play with some pseudo code or simply natural language statements of what these types and values mean to clarify our thinking. I also think that we should create some instance data examples to make this all more concrete, maybe even working back from some plausible instance data to a profile that would define it.
Better late than never, I am chiming in after this group has been running for some time now. The background being that I picked up reading about APs and this group when preparing a talk about an LRMI profile we are developing for German-speaking implementors. See the slides to the talk (in German) at http://slides.lobid.org/kim-ws-2020/. I am also contributing to the LRMI task group that is chaired by @philbarker.
I also think that we should create some instance data examples to make this all more concrete, maybe even working back from some plausible instance data to a profile that would define it.
I recommend also gathering examples that are a bit off and thus invalid, so that you have to make sure the AP catches them. For the above-mentioned profile, which is currently embodied in a JSON(-LD) schema, we gather valid and invalid examples and the schema is automatically tested against them with every commit (see the test.sh, which is executed by Travis). This setup (which we copied from https://github.com/reconciliation-api/specs/commit/6b5985df4e37bd45bb50dd2dcda80b9c7014f561) makes it easy to develop the schema iteratively and make it more and more restrictive/verbose. Whenever I encounter or think of an invalid example that is not caught by the profile, I add it to the invalid folder and subsequently adjust the schema so that it catches the error.
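The valid/invalid testing loop described above can be sketched in a few lines of Python. This is not the actual test.sh; it assumes a folder layout with `valid/` and `invalid/` subdirectories and a hypothetical `validate` callable (for example a wrapper around a JSON Schema or ShEx validator) that returns True or False for a document's text.

```python
from pathlib import Path
from typing import Callable, List


def check_examples(root: Path, validate: Callable[[str], bool]) -> List[str]:
    """Run `validate` over root/valid/* and root/invalid/* and report mismatches.

    Every file under valid/ must pass and every file under invalid/ must fail;
    the returned list names the files that behaved unexpectedly.
    """
    failures: List[str] = []
    for expected, folder in ((True, "valid"), (False, "invalid")):
        for path in sorted((root / folder).glob("*")):
            if not path.is_file():
                continue
            if validate(path.read_text(encoding="utf-8")) != expected:
                failures.append(f"{folder}/{path.name}")
    return failures
```

An empty result means the profile accepts everything it should and rejects everything it should; each new invalid example that slips through shows up in the list until the schema is tightened.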
There are now instance examples for bookclub (data.ttl) and recipe (guided_recipe.json).
Neither is meant to illustrate anything specific, they were just what I had to hand. Both should conform to the relevant AP.
The recipe example is interesting in that, as is typical of schema.org instances for Google, it's bnodes from top to bottom.
Might some use-case-based requirements also help? E.g. "I need to know whether something is a BNode or a URI because ...." "If you give me the information that something is a Literal and that its datatype is xsd:date separately it will allow me to ...."
In terms of Bnode v URI, I think that happens this way:
BNODE
Instance data with BNODE: using bookclub.csv
ex:book1 a sdo:Book ; sdo:name "Moby Dick" ; sdo:author _:author1 .
_:author1 a sdo:Person ; sdo:givenName "Herman" ; sdo:familyName "Melville" .
URI
Instance data with a URI, using this profile [Tom: corrected format]:
| ID | URI | label | M | O | VT | V | Comment |
|---|---|---|---|---|---|---|---|
| @book | | Book | y | y | | | |
| | rdf:type | instance of | y | n | URI | sdo:Book | must be schema.org/Book |
| | rdf:type | instance of | y | n | URI | wd:Q571 | must be wikidata Book |
| | sdo:name | title | y | n | Literal | xsd:string | |
| | sdo:author | author | y | y | URIstem | http://viaf.org | must be a VIAF URI |
Instance:
ex:book1 a sdo:Book ; sdo:name "Moby Dick" ; sdo:author <http://viaf.org/viaf/27068555/> .
In natural language, the internal links using "@" are BNODES in RDF; if the value is to be a URI then you either do:
the internal links using "@" are BNODES in RDF
oof, that's a big extra assumption, and is not in line with the example that I provided, e.g.:
book:002 a sdo:Book, wd:Q571 ;
sdo:name "The Comedians" ;
sdo:author author:001 ;
wdt:P127 member:002 . # owned by
author:001 a sdo:Person ;
sdo:givenName "Graham" ;
sdo:familyName "Greene" .
I would think it is a quite common case that you would want to maintain your own data for more than one entity type, and would want to allow people to submit data as graphs covering all the relevant entities.
Phil, in your case, then "author:" is a prefix that has to be defined in your prefix declarations as standing in for some URI, right? Which would mean that author:001 is an entity that has been defined with a URI prior to the creation of the instance data. Or are you saying that the local data mints URIs "on the fly" for entities?
Yup, author: is in the prefixes. The URI might be minted as part of the submission process (a workflow such as: when entering data, the user enters the name; the system checks whether an author with this name exists and, if not, provides a new URI for the author).
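The look-up-or-mint workflow described above might look like the following minimal sketch. The base URI `http://example.org/author/`, the zero-padded numbering and the in-memory name index are all illustrative assumptions, not part of any profile discussed here; a real system would check a persistent store.

```python
import itertools
from typing import Dict


class AuthorRegistry:
    """Hypothetical registry that reuses an author's URI or mints a new one."""

    def __init__(self, base: str = "http://example.org/author/") -> None:
        self.base = base
        self.by_name: Dict[str, str] = {}
        self._counter = itertools.count(1)

    def uri_for(self, name: str) -> str:
        # Normalise whitespace and case so trivially different spellings match.
        key = " ".join(name.split()).casefold()
        if key not in self.by_name:
            self.by_name[key] = f"{self.base}{next(self._counter):03d}"
        return self.by_name[key]
```

Whatever the minting scheme, the instance data then contains ordinary URIs (like author:001), so nothing in the profile needs to know whether a URI was minted on the fly or existed beforehand.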
Another case is where I want to use data from a service (something like wikidata) but need to check that it is sufficiently complete in order to decide whether to use it as is or to supplement it.
Sorry, but the assertion that "@"-referenced node is always a BNode seems sudden and arbitrary.
Last night I realized that my example is actually perfect for why a separate entity link may be needed. And it's a real use case.
The library world has URIs for agents of various types, but they are all under a single URI scheme. If you want only one type of agent, say "Person", you have to look beyond the URI to the data associated with that URI. However, the "link" of a URI stem in the profile could be ambiguous. Here's an attempt to map this [Tom: corrected format]:
| ID | URI | label | M | O | VT | V | Comment |
|---|---|---|---|---|---|---|---|
| @book | | Book | y | y | | | |
| | rdf:type | instance of | y | n | URI | sdo:Book | must be schema.org/Book |
| | rdf:type | instance of | y | n | URI | wd:Q571 | must be wikidata Book |
| | sdo:name | title | y | n | Literal | xsd:string | |
| | sdo:author | author | y | y | URIstem | http://viaf.org | must be a VIAF URI |
| http://viaf.org | | | | | | | |
| | rdf:type | instance of | Y | n | URI | bf:Person | Must be a bf Person class |
What this says to me is that the URI stem used as a value might not work when linking entities within the profile. If your value is a URI stem, how can you reuse that within the profile? Should you? Can anyone think of a way to work around this? Also, should entities be exclusively "@" names? Are there instances where a URI could be used for an entity name?
@philbarker What you are describing to me is a URI and you show it as a URI (author:001), not an "@" entity. As a URI it seems fine. The "@" entities, as I have seen them, are internal and we haven't developed a way to associate them with URIs. It seems that if you are making use of URIs then you have a defined URI scheme in your prefix declaration. Your example above does not use "@" notation. Where do you see "@" notation fitting into your example? If it represents a URI, how would that work?
@kcoyle That example above is instance data; why would there be any @ notation in the instance data?
The example shows that the value for sdo:author is provided by a URI that identifies an entity, the description of which is provided and conforms to the constraints defined by the AP entity shape @author.
I see the following two things as fundamentally different (even "disjoint"):
Taking @philbarker's example:
| ID | URI | Label | Type | Value Space |
|---|---|---|---|---|
| | sdo:author | author | URI | @author |
| | wdt:P127 | owner | URI | @owner |
One might read this as meaning that we should expect to find an entity shape ID as a value in the instance data, whereas I think the intent is to say two quite different things: that the value of sdo:author must be a URI that identifies that author, and that the description of that author must conform to @author. In other words, the value (a URI) associated with sdo:author does not actually identify the construct in the profile that we are calling an entity or entity shape. Until now I had pictured that we might say: "Type: Entity, Value space: @author", as in the Wikidata "painting" example, but the example above makes it clear that this will not do. I hesitate to propose an extra column, but I think this distinction could be made a lot more cleanly with something like the following:
| Entity Shape ID | URI | Label | Type | Value Space | Entity Shape Ref |
|---|---|---|---|---|---|
| @book | | | | | |
| | sdo:author | author | URI | | @author |
| | wdt:P127 | owner | URI | | @owner |
| @author | | | | | |
| | foaf:name | name | Literal | | |
which I intend to mean:
- The value of the sdo:author statement is a URI (which identifies an author), and the fact that its Value Space is empty means that the URI is not further constrained (for example, to a specific URI or to a URI stem such as http://viaf.org).
- The value of the sdo:author statement is described using a set of properties and constraints as specified by the entity shape @author.

@kcoyle I took the liberty of fixing the format in two of your examples above (the mandatory and repeatable columns did not align). If I correctly understand what you intend the book example to mean, one might express it in the template as follows:
| ESID | URI | Label | Type | Value Space | ESRef | Comment |
|---|---|---|---|---|---|---|
| @book | | | | | | |
| | sdo:author | author | URIStem | http://viaf.org | @author | |
| @author | | | | | | |
| | rdf:type | is instance of | URI | bf:Person | | Must be a BIBFRAME person |
I think @tombaker and I are saying the same thing.
I would go one step further, and suggest that saying a literal used as a value must conform to xsd:string is very similar to saying that an entity [description] used as a value must conform to @author. I would (& have in my example) put them in the same column. I'm happy for a value space to be defined by conformance to a standard/spec/profile, but if that's not what you have in mind, maybe it's a distinct column.
I would (& have in my example) put them in the same column.
I too believe that it makes sense to put it in the same column unless there is a use case where filling out both columns makes sense. @tombaker provides one in https://github.com/dcmi/dcap/issues/61#issuecomment-640521364 but I am not sure whether this is necessary.
Thinking about this, I realize I see two – somehow related – problems in the current profile examples:
1. using URI as type for nodes that can also be blank nodes
2. not being able to define a URI stem for the subject URI (or do I overlook something?) but only for object URIs/nodes

Re. 1.) I would rather see something like Node as value type, which basically means something like: "another entity for which at least one statement is included in the data". Together with 2.), the result could look something like this (where I use @id for defining the subject URI and make it optional, so that it basically means: bnodes are OK, but if you provide a URI it should be from VIAF):
| ID | URI | Label | Mandatory | Repeatable | Type | Value Space | Comment |
|---|---|---|---|---|---|---|---|
| sdo: | http://schema.org/ | schema.org | | | | | |
| xsd: | http://www.w3.org/2001/XMLSchema# | XML Schema | | | | | |
| rdf: | http://www.w3.org/1999/02/22-rdf-syntax-ns# | RDF | | | | | |
| @book | | Book | | | | | |
| | sdo:author | author | y | n | Node | @author | |
| @author | | Author | | | | | |
| | @id | URI | n | n | URIStem | http://viaf.org | |
| | rdf:type | instance of | y | n | URI | sdo:Person sdo:Organization | must be person or organization |
| | sdo:givenName | given name | y | n | Literal | xsd:string | |
| | sdo:familyName | family name | n | n | Literal | xsd:string | |
@acka47
Thinking about this, I realize I see two – somehow related – problems in the current profile examples:
- using URI as type for nodes that can also be blank nodes
That's why I like the node kinds mentioned by Karen (who I think meant "ShEx", not "RDF": RDF has just three kinds of node, and the fourth in ShEx, "nonliteral", means, in effect, "iri or bnode").
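To make the node-kind distinction concrete, here is a small sketch of how the four ShEx node kinds partition RDF terms; a value type like Node, as proposed above, would correspond to "nonliteral". The `IRI`, `BNode` and `Literal` classes are hypothetical stand-ins written for this example, not any RDF library's API.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str = "http://www.w3.org/2001/XMLSchema#string"


Term = Union[IRI, BNode, Literal]

# The four ShEx node kinds; "nonliteral" accepts either an IRI or a blank node.
NODE_KINDS = {
    "iri": (IRI,),
    "bnode": (BNode,),
    "nonliteral": (IRI, BNode),
    "literal": (Literal,),
}


def satisfies_node_kind(term: Term, kind: str) -> bool:
    """True if `term` is of one of the classes allowed by `kind`."""
    return isinstance(term, NODE_KINDS[kind])
```

With "nonliteral", both the bnode-based instance data (_:author1) and the VIAF URI version would satisfy the sdo:author constraint, whereas "iri" would reject the bnode.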
the example shows the value for sdo:author is provided by a URI that indentifies an entity, the description for which is provided and conforms to the constraints defined by the AP entity shape @author.
Yes, sorry, I got the wrong meta level! But I do wonder what the advantage is of using an "@" node in the profile rather than the entity URI, since one exists. This gets us back to what our value column represents, which I think we need to hash out. I'll try to create some examples.
(I also sent this as an email to the list, in slightly different formatting)
Tom and I had an impromptu brainstorming session this morning, based on the GitHub comment that I made where I realized that the use of URI stems was problematic for our table structure. Tom responded with the thought that we might be using the value space for significantly different things. (Tom, speak up if I mischaracterize your view!) Adrian also weighed in on this. During our chat Tom and I fiddled around with a Google spreadsheet, trying out various ideas. We did NOT solve the problem but at least agreed on what we thought was unresolved. Here are some things you may notice about the spreadsheet (which I based on a section of Phil's example):
Problems remain in this table. My main concern is that it might seem odd to profile developers to have the key information about the author (line 6) separate from the author entity information on lines 8 and 9. We tried to come up with a way to have an author entity that has all of the author information, somehow moving the value type of the author entity and the constraint to the @author entity node of the table. What we came up with was an unattractive kludge. If you have ideas on how to solve this, please speak up and/or copy what is here and make the modifications that you think will work.
kc
Here's a nearly-readable screen shot of the table:
@kcoyle Excellent summary! Naming issues aside, I think this iteration makes three key improvements:
One other detail: Instead of using xsd:string to mean "string", it seems just a bit more user-friendly to call it simply String and rely on the template conversion script to map this to xsd:string if so desired.
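A conversion script could do that mapping with a small lookup table, as in this sketch. The friendly names (String, Date, Integer, Boolean) and the pass-through rules are assumptions for illustration; the discussion has not settled a starter vocabulary.

```python
from typing import Dict

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical friendly names; anything not listed is passed through, so a
# profile author can still write a prefixed name or a full datatype URI.
FRIENDLY_DATATYPES: Dict[str, str] = {
    "String": XSD + "string",
    "Date": XSD + "date",
    "Integer": XSD + "integer",
    "Boolean": XSD + "boolean",
}


def expand_value_type(cell: str, prefixes: Dict[str, str]) -> str:
    """Turn a Value Type cell into a full URI where possible."""
    cell = cell.strip()
    if cell in FRIENDLY_DATATYPES:
        return FRIENDLY_DATATYPES[cell]
    prefix, sep, local = cell.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return cell  # already a full URI, or a keyword such as URIStem
```

Because unknown cells fall through unchanged, the friendly names stay an optional convenience rather than a restriction on which datatypes a profile may use.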
@acka47
- not being able to define a URI stem for the subject URI (or do I oversee something?) but only for object URIs/nodes
I agree that this cannot be expressed in any of the variants of the CSV model we are discussing, but in general, I think we should not try to overload the model with too many features and, specifically in this case, I think the omission is justified.
In the current model, constraints on nodes can only be expressed for nodes in the object position, but to support the specification of a URI stem for the subject URI, one would need to add another two columns to the model: Subject Type (because one of those types would need to be URI stem) and Subject Constraints (which is where one would put http://viaf.org). But would the gain in expressivity be worth the loss of simplicity? If the CSV model were that complex, why not just learn ShEx (which distinguishes between node constraints and triple constraints)?
The "unattractive kludge" to which Karen alludes above would be to use, say, dc:identifier to put the subject URI into the object position, where it could be constrained, as in the triple <http://viaf.org/1234> dc:identifier <http://viaf.org/1234>. This is not really a solution...
I think we should not try to overload the model with too many features
I agree with keeping it simple.
specifically in this case, I think the omission is justified.
I am not convinced. Don't you think there will be use cases where people want to add constraints on the URI of the top-level node (which in the example is @book)? Maybe there won't be, and I am arguing for something people don't need, but I am not sure about this.
to support the specification of a URI stem for the subject URI, one would need to add another two columns to the model
I am not sure it has to be this way. Maybe we could find a way to put it in the current model. In https://github.com/dcmi/dcap/issues/61#issuecomment-640572492, I used the @id key – as JSON-LD does – for the subject URI. In the book example, this could look like this:
| Subject Shape | Property | Display Label | Mandatory | Repeatable | Value Type | Value Constraints | Value Shape ID | Prefix | Namespace |
|---|---|---|---|---|---|---|---|---|---|
| @book | | Book | | | | | | | |
| | @id | | y | n | URIStem | http://openlibrary.org/ | | | |
| | rdf:type | instance of | y | n | URI | sdo:Book | | | |
| | rdf:type | instance of | y | n | URI | wd:Q571 | | | |
| | sdo:name | title | y | n | String | | | | |
| | sdo:author | author | y | y | URIStem | http://viaf.org | @author | | |
This could simply mean that the URI for an instance node of @book must be in the http://openlibrary.org/ namespace. Note that I renamed the column "Property" instead of "PropertyURI" to take this case into account. This leads to another question I have (sorry, I have not followed the whole process until now): Is DCAP aimed at RDF data only, i.e. at data that identifies keys/properties with a URI? Or does it also cover non-RDF data such as plain CSV or JSON?
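Read this way, an @id row constrains the node's own URI rather than the object of some triple. A minimal sketch of that interpretation, assuming each profile row is a dict keyed by the column headers in the table above (a representation chosen for this example only):

```python
from typing import Dict, List


def check_subject_stems(rows: List[Dict[str, str]], subject_uri: str) -> List[str]:
    """Apply the @id rows of a shape to the subject URI of an instance node.

    Treating "@id" as "the node's own URI" is the interpretation proposed in
    this discussion, not an established DCAP convention.
    """
    errors: List[str] = []
    for row in rows:
        if row.get("Property") != "@id":
            continue  # ordinary rows constrain objects, not the subject
        if row.get("Value Type") == "URIStem":
            stem = row.get("Value Constraints", "")
            if not subject_uri.startswith(stem):
                errors.append(f"{subject_uri} is not under {stem}")
    return errors
```

Since the @id row in the table is not mandatory, a validator following this sketch would only call the check when the instance node actually has a URI; blank nodes would skip it.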
@acka47 That's an interesting twist! In this case, I find it confusing that @id and @book both use the @ prefix with different meanings, but I'm sure there are ways around that.
We did decide awhile ago to "develop an RDF-specific model, back it up with a variety of example profiles, each with some instance data that we can validate with ShEx. Once that is solid, we can go back and see if the same template can work with other data types, like XML, JSON".
My first reaction is that treating the identifier as a property, as you suggest above, should fit nicely into the model (apart from the question of punctuation) without compromising the use of the CSV model for RDF-based profiles. Is there anything else in the way you use JSON-LD that is not already supported by the CSV model and that you think should be?
If the table structure makes the use of URI stems difficult, fine: don't do string hacks on URIs; treat them as opaque identifiers with no internal semantics. But I guess that boat has sailed.
I still don't understand why the emphasis is on APs being about properties. I know that a focus on properties is in the DCMI heritage, but an Application Profile mixes and matches existing Classes, Properties and value encoding schemes, and I believe we should treat them equally.
I think the table should primarily identify which existing vocabularies are being used and which terms from these vocabularies are being used. So it should include namespaces, with local and global identifiers; classes being used, with local and global identifiers; properties being used, with local and global identifiers; and encoding schemes (concept schemes, syntax encoding schemes), with local and global identifiers. (The local identifiers are only needed if we want to make cross references within the AP.)
Is there anything else in the way you use JSON-LD that is not already supported by the CSV model and that you think should be?
Directly to my mind come the following two:
1. constraining literals by regular expression: I guess that you have it already covered but could not find it quickly
2. define property value as ordered list: In which way do I define usage of an ordered list (in JSON-LD "@container": "@list", in RDF rdf:List) for a specific property?

(Re. 2.) From a JSON-LD view, you have to distinguish a simple array ("@container": "@set") from an ordered list ("@container": "@list"). Re. an array, it is often important for a JSON representation to know which properties can generally have many values (=array), and it often makes sense to then coerce those values to an array even if only one value exists. I can already derive this behaviour from repeatable: yes, but I do not see a solution for the ordered list.)
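Deriving array behaviour from "repeatable: yes" might look like the sketch below: repeatable properties get "@container": "@set" in a JSON-LD context, and single values of those properties are wrapped in a list. The (term, IRI, repeatable) row format is an assumption for this example; an ordered list would need "@container": "@list" instead, which a yes/no Repeatable column cannot signal on its own.

```python
from typing import Any, Dict, Iterable, Tuple


def build_context(rows: Iterable[Tuple[str, str, bool]]) -> Dict[str, Any]:
    """rows: (term, property IRI, repeatable) taken from a profile."""
    context: Dict[str, Any] = {}
    for term, iri, repeatable in rows:
        # Repeatable properties become unordered sets; others map to plain IRIs.
        context[term] = {"@id": iri, "@container": "@set"} if repeatable else iri
    return context


def coerce_arrays(node: Dict[str, Any], context: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap single values of repeatable (@set) properties in a list."""
    out = dict(node)
    for term, definition in context.items():
        if isinstance(definition, dict) and definition.get("@container") == "@set":
            if term in out and not isinstance(out[term], list):
                out[term] = [out[term]]
    return out
```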
define property value as ordered list: In which way do I define usage of an ordered list (in JSON-LD "@container": "@list", in RDF rdf:List) for a specific property?
Thinking about this, you can easily define a value shape @list with rdf:type rdf:List, so that does not seem to be a general problem. If I want to define the nodes that are items of this list, it becomes more complex, as I have to add another value shape. Example:
| Subject Shape | Property | Display Label | Mand. | Rep. | Value Type | Value Constraints | Value Shape ID | Prefix | Namespace |
|---|---|---|---|---|---|---|---|---|---|
| @book | | Book | | | | | | | |
| | rdf:type | instance of | y | n | URI | sdo:Book | | | |
| | ex:chapters | chapters | n | n | | | @list | | |
| @chapterList | | chapter list | | | | | | | |
| | rdf:first | | y | n | | | @chapter | | |
| | rdf:rest | | n | n | | | @chapterList | | |
| @chapter | | Chapter | | | | | | | |
| | rdf:type | | y | n | URI | sdo:Chapter | | | |
So it might make sense to add some syntactic sugar for ordered lists.
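The recursive @chapterList shape amounts to walking rdf:first/rdf:rest until rdf:nil. This sketch shows that traversal over plain (subject, predicate, object) string tuples, a representation assumed here for brevity rather than any library's graph type; each returned item would then be checked against @chapter.

```python
from typing import Dict, List, Set, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
Triple = Tuple[str, str, str]


def list_items(triples: List[Triple], head: str) -> List[str]:
    """Walk an RDF collection from `head` to rdf:nil and return its items.

    Each list node needs one rdf:first (an item) and one rdf:rest (the next
    node). If a node has several of either, only one is kept by the index.
    """
    index: Dict[Tuple[str, str], str] = {(s, p): o for s, p, o in triples}
    items: List[str] = []
    seen: Set[str] = set()
    node = head
    while node != RDF + "nil":
        if node in seen:
            raise ValueError(f"cycle at list node {node}")
        seen.add(node)
        try:
            items.append(index[(node, RDF + "first")])
            node = index[(node, RDF + "rest")]
        except KeyError:
            raise ValueError(f"list node {node} lacks rdf:first or rdf:rest") from None
    return items
```

The two-shape recursion in the table is needed only because a flat table has no direct way to say "an ordered list of @chapter"; syntactic sugar for that would hide exactly this loop.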
@philbarker
I think the table ... should include namespaces, with local and global identifiers;
Check.
classes being used, with local and global identifiers;
Check (I think), as in the example above: rdf:type sdo:Chapter. Not that rdf:type is the only way to express class membership, but I cannot think of any way to do this that could not in principle be accommodated in the model, especially if we were to relax the requirement that properties be URIs, as per Adrian's example from JSON-LD.
properties being used, with local and global identifiers;
Check.
and encoding schemes (concept schemes, syntax encoding schemes),
Taking the two separately:
1. Could syntax encoding schemes, such as xsd:string, be accommodated as Value Types? Perhaps the user guide could encourage the use of String or Date, but I see no obvious reason not to use any arbitrary datatype URI as a value type. Use of a datatype URI from the xsd namespace would of course imply that the value is a literal; I'm not sure it would be safe to assume that any URI used as a Value Type would be a datatype.
2. I am assuming that the more common way to use a concept scheme in modern metadata would be simply to use the URI of a concept as value URI. I dunno; maybe there could be a Value Type like, say, Value URI Source, defined as a pointer to a list or set of value URIs? If a concept scheme consists of concepts that share a base URI, such as http://www.fao.org/aims/aos/agrovoc/, then URIStem could be used. (However, concept schemes do not necessarily have just one base URI.)
@acka47 The shape IDs in rows 6 and 7 were in the "prefix" column (and mandatory/repeatable were not aligned) so I edited your post to move them under Value Shape ID.
So it might make sense to add some syntactic sugar for ordered lists.
Maybe so. Is this a common use case?
@acka47
constraining literals by regular expression: I guess that you have it already covered but could not find it quickly
Good question - I'm not sure we have covered those. Maybe Regex Match could be a value type, the value constraints of which would be the regex?
Is this a common use case?
With regard to bibliographic data, the use cases are contributors (e.g. bf:contribution) and subjects (e.g. mads:componentList). At least, that's where we use RDF lists in lobid.
@philbarker I would love to find another solution to URI stems! One that doesn't disrupt our model. I would appreciate hearing more detail around your:
don't do string hacks on URIs, treat them as opaque identifiers with no internal semantics
To that end, here is an expression of the use cases.
1. My data has a property that will take as its object a member of an external vocabulary. The external vocabulary members have URIs. Any member is acceptable/valid.
2. My data has a property that will take as its object a member of an external vocabulary. This member must meet certain criteria to be valid; specifically, it must itself be a member of a specific class as defined in that vocabulary (e.g. a SKOS concept).
3. My data has a property that will take as its object ANY URI that is a member of a SKOS concept scheme.
The third one is an add-on that we haven't discussed but that I know is in use in at least one metadata application. If it doesn't fit with solutions to 1 and 2 we can discuss it another time.
I'll also note a possibly dangerous thought that came up in earlier discussions, which is to allow the value column to include regex-type formulas. This is not something that we would expect from our most beginner profile developers, but it might provide a passage from the simplest template to one that can express useful value rules like "date > 2000 but < 2021". URI stems could be "ex:www.something.org*". But then we'd have to have a way to indicate that the value field contains a formula, rather like the "SUM=" in spreadsheets.
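To see what a formula marker could involve, here is a sketch in which a leading "=" marks a Value Constraints cell as a formula, as spreadsheets do. The `=between(...)` and `=glob:` forms are invented for this illustration, and the date rule is interpreted as "year strictly between 2000 and 2021"; nothing here is an agreed syntax.

```python
import fnmatch
import re
from datetime import date

# Hypothetical formula syntax: a leading "=" marks the cell as a formula.
BETWEEN = re.compile(r"^=between\((\d{4}),\s*(\d{4})\)$")


def check_cell(cell: str, value: str) -> bool:
    """Check an instance value against a plain or formula Value Constraints cell."""
    if not cell.startswith("="):
        return value == cell  # plain cell: exact match
    m = BETWEEN.match(cell)
    if m:  # year of an ISO date strictly between the two bounds
        low, high = int(m.group(1)), int(m.group(2))
        return low < date.fromisoformat(value).year < high
    if cell.startswith("=glob:"):  # shell-style wildcard, e.g. a URI stem
        return fnmatch.fnmatchcase(value, cell[len("=glob:"):])
    raise ValueError(f"unknown formula {cell!r}")
```

Even this toy version needs its own parsing rules and error handling, which illustrates why such formulas might belong in an extension rather than in the simplest template.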
Thanks!
I added a second sheet to the google spreadsheet to show the "solution" that gathers all of the author information in a single "shape". I do like the idea of saying that PropertyX has as its object ShapeY and all of the value constraints are in the ShapeY rather than giving value constraints on the PropertyX row. (That'll be clearer when you look at the spreadsheet.) This solution is really unrelated to the URIstem problem, and you can see it in the @series shape as well. In short, this suggests:
A property either has a value as its object, or it has a shape as its object, but not both.
@kcoyle I think for those requirements it is better to resolve the identifier and check for statements like:
<whatever> rdf:type skos:Concept .
<whatever> skos:inScheme <http://example.org/requiredVocab> .
I think you could fit that into the model by defining the required properties in an @entityShape and referring to it.
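Once the identifier has been resolved, the check itself is simple. This sketch assumes the fetched description is available as a set of (subject, predicate, object) string tuples; dereferencing the URI is deliberately left out.

```python
from typing import Set, Tuple

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
Triple = Tuple[str, str, str]


def is_concept_in_scheme(graph: Set[Triple], node: str, scheme: str) -> bool:
    """True if the (already dereferenced) description of `node` says it is a
    skos:Concept in `scheme`."""
    return ((node, RDF_TYPE, SKOS + "Concept") in graph
            and (node, SKOS + "inScheme", scheme) in graph)
```

In profile terms these two conditions are exactly the two rows of an entity shape such as @subject: one constraining rdf:type and one constraining skos:inScheme.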
@philbarker Is this what you mean?
| SSID | Prop | VType | VConstraints | Value ShapeID |
|---|---|---|---|---|
| @book | dc:subject | URI | @subject | |
| @subject | rdf:type | URI | skos:Concept | |
| | skos:inScheme | URI | http://ex... | |
@kcoyle This would perhaps address:
- My data has a property that will take as its object ANY URI that is a member of a SKOS concept scheme.
@kcoyle
- My data has a property that will take as its object a member of an external vocabulary. The external vocabulary members have URIs. Any member is acceptable/valid.
How about ValueTypeSource as a value type (defined as a pointer to a list or set of value URIs)? I'm not convinced by my own suggestion but do not see any obvious problem with it.
@kcoyle @acka47 To summarize, I'm seeing two possible modeling patterns for recording the identifier of the subject described by a given shape, both of which put the URI in the object position:

One pattern uses dc:identifier and constrains it to be based on a URI stem. This is very readable and easy to understand, though it does not actually constrain the subject URI of the triples about (in this case) the author.

| SSID | Prop | VType | VConstraints | Value ShapeID |
|---|---|---|---|---|
| @thing | @id | URIStem | http://... | |
| | dc:identifier | URI | http://ex... | |
Aside from the unfortunate use of @ with two meanings, both patterns seem valid, and with known limitations: the first does not constrain the subject URI, while the second relies on a JSON-LD interpretation.
I would love to find another solution to URI stems! One that doesn't disrupt our model
Joining both the discussions on Regex and URIStems together: I think one could completely get along without the URIStem value type and replace it with a regex. I added a third table to the spreadsheet that defines a constraint on a URI by regular expression. Here is the relevant snippet.
| Subject Shape ID | PropertyURI | Display Label | Mandatory | Repeatable | Value Type | Value Constraints | Value Shape ID | Prefix | Namespace |
|---|---|---|---|---|---|---|---|---|---|
| @author | | Author | | | | | | | |
| | rdf:type | instance of | y | n | URI | bf:c_Person | | | |
| | @id | author | y | y | URI | regex(^http:\/\/viaf.org\/viaf\/[1-9]\d{0,21}) | | | |
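The regex from the table can be tried out with Python's `re` module. In this sketch the dot in viaf.org is escaped (unescaped, "." matches any character), and `stem_to_regex` shows why a regex can replace URIStem: a stem is just its escaped text anchored at the start. Both function names are hypothetical.

```python
import re

# Pattern from the table, with the dot in "viaf.org" escaped. "\/" is accepted
# by Python's re, though a plain "/" would work too. There is no trailing "$",
# so anything may follow the digits (e.g. a trailing slash).
VIAF_PERSON = re.compile(r"^http:\/\/viaf\.org\/viaf\/[1-9]\d{0,21}")


def matches_uri_regex(uri: str, pattern: re.Pattern = VIAF_PERSON) -> bool:
    """True if the URI matches the pattern from its first character."""
    return pattern.match(uri) is not None


def stem_to_regex(stem: str) -> re.Pattern:
    """Express a URIStem constraint as a regex: the stem, escaped and anchored."""
    return re.compile("^" + re.escape(stem))
```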
Aside from the unfortunate use of @ with two meanings, both patterns seem valid, and with known limitations: the first does not constrain the subject URI, while the second relies on a JSON-LD interpretation.
I completely understand that you don't want to use @s in two different senses. I used @id because it is known from JSON-LD and I did not see a better choice. I don't see it as a JSON-LD keyword in this context, though, but as a specific DCAP keyword (just as all the @ terms in JSON-LD are JSON-LD-specific keywords, all the @s in DCAP could be DCAP-specific).
I think it makes sense to use such a specific keyword that means "using this keyword means all statements are made about the subject URI", but I don't mind using another token for the keyword – as long as it is clearly distinguished from the RDF properties. That's where I see the problem with dct:identifier, as it cannot be used and interpreted like a specific DCAP keyword.
@acka47
I think it makes sense to use such a specific keyword that means "using this keyword means all statements are made about the subject URI", but I don't mind using another token for the keyword – as long as it is clearly distinguished from the RDF properties.
Interesting idea! So it could be something like SubjectID, with no prefix or http://, and perhaps uppercased?
Other than this keyword, and aside from a controlled "starter vocabulary" of Value Types (which we clearly need), can we think of other cases where such a keyword might provide some syntactic sugar for edge cases? I wouldn't want to see us get too fancy with special keywords, but I'd be curious if we do see any.
For example, how strong is the requirement for a profile (or its shapes) to be able to reference themselves (e.g., InProfile, analogously to skos:inScheme or rdfs:isDefinedBy)?
@tombaker yes, something like that
@philbarker you asked:
"I need to know whether something is a BNode or a URI because ...." "If you give me the information that something is a Literal and that its datatype is xsd:date separately it will allow me to ...."
and the answer is a resounding "yes!" - those use cases would be very helpful. Thanks.
@acka47
constraining literals by regular expression: I guess that you have it already covered but could not find it quickly
Adrian, we talked about this early on, before we decided to work first on the very simplest of cases. In that early thinking this would indeed be in the value space, and it could be fed pretty directly into a ShEx schema. We could "allow" this in our simple schema and give a few simple examples of common needs like constraining dates or numbers. I'm thinking that we may want to include features like this as extensions of the simpleAP model, not as part of the very simple base. And I am still hoping that we can go beyond the simplest model, at least as extensions and examples of more complex cases. Anyone else have comments on this?
@kcoyle I fully agree that we need to keep the model simple but think we have some wiggle room between saying that the Value Constraints column (naming issue aside) only holds things that are actually "in the value space", in a strict sense, as opposed to also holding things that meaningfully constrain the set of possible values with respect to a given Value Type in a looser and more flexible sense.
To follow the former (stricter) interpretation would be to limit ourselves to things such as:
| Value Type | Value Constraint |
|---|---|
| URI | sdo:Person |
| Literal | "confidential" |
because the URI sdo:Person (or http://schema.org/Person) and the string "confidential" are expected to appear in the instance data.
However, if we were to say that the nature of the value constraints is specific to a given value type, then we could accommodate things like:
| Value Type | Value Constraint |
|---|---|
| URIStem | http://schema.org |
| TypedLiteral | xsd:string |
| URIRegex | regex(^http:\/\/viaf.org\/viaf\/[1-9]\d{0,21}) |
| LiteralPicklist | ["animal" "vegetable" "mineral"] |
| URIPicklist | [http://purl.org/example http://schema.org] |
In our planned "starter" vocabulary of value types, then, we would need to clarify, for each value type, the nature of expected value constraints (eg, actual URIs and literals in the former examples; base URIs, datatype URIs, and regular expressions in the latter examples, along with any formatting rules such as "enclose lists with square brackets" or "enclose regular expressions in parentheses").
We would need to make a number of somewhat arbitrary decisions about such details (e.g., can any value type be turned into a picklist by enclosing the set of alternative value constraints in square brackets?). And I still see no elegant way to accommodate more complex expressions such as "URI or Literal lastname". Creeping featurism is a slippery slope, but such a model could accommodate quite a few of the list cases we have discussed.
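If each Value Type fixes how its Value Constraint cell is read, a processor reduces to a dispatch table. This sketch covers the example rows except TypedLiteral (which needs a literal's datatype, not just its string); the checker names and cell-parsing rules (optional square brackets, optional regex(...) wrapper, whitespace-separated items) are assumptions for illustration.

```python
import re
from typing import Callable, Dict, List


def _unquote(cell: str) -> str:
    return cell.strip().strip('"')


def _items(cell: str) -> List[str]:
    # Tolerates the square brackets used in the table; items split on whitespace.
    return [_unquote(item) for item in cell.strip().strip("[]").split()]


def _regex(cell: str, value: str) -> bool:
    # Accepts either a bare pattern or the regex(...) wrapper used in the table.
    wrapped = re.fullmatch(r"regex\((.*)\)", cell.strip())
    pattern = wrapped.group(1) if wrapped else cell.strip()
    return re.match(pattern, value) is not None


# Hypothetical starter vocabulary: the Value Type decides how its cell is read.
CHECKERS: Dict[str, Callable[[str, str], bool]] = {
    "URI": lambda cell, value: value == cell.strip(),
    "Literal": lambda cell, value: value == _unquote(cell),
    "URIStem": lambda cell, value: value.startswith(cell.strip()),
    "URIRegex": _regex,
    "LiteralPicklist": lambda cell, value: value in _items(cell),
    "URIPicklist": lambda cell, value: value in _items(cell),
}


def check_constraint(value_type: str, cell: str, value: str) -> bool:
    """Check one instance value against one Value Constraints cell."""
    return CHECKERS[value_type](cell, value)
```

Note that "URI or Literal lastname" has no natural home here: each row gets exactly one checker, which is the limitation mentioned above.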
@tombaker
a picklist by enclosing the set of alternative value constraints in square brackets
oh, I don't think that helps. I don't need the square brackets to tell me it is a picklist if the Value Type does that, and I don't want the square brackets if I am processing the Value Constraint string in Python, because without them I can just use str.split() on the value.
@philbarker
oh, I don't think that helps. I don't need the square brackets to tell me it is a picklist if the Value Type does that, and I don't want the square brackets if I am processing the Value Constraint string in Python, because without them I can just use str.split() on the value.
It's fine with me to separate multiple items in a Value Constraints cell with just whitespace. I guess that would preclude putting multiple regexes into a cell (because a regex might contain spaces), and the strings of a literal picklist would also break if they contained spaces, but perhaps those are small prices to pay for the simpler approach.
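The trade-off is easy to see in Python. Plain `str.split()` breaks a quoted multi-word picklist item apart, while the standard library's `shlex.split` (swapped in here as one possible alternative, not something the thread agreed on) keeps quoted items whole; single quotes also preserve backslashes, so regexes would survive if quoted.

```python
import shlex
from typing import List


def split_plain(cell: str) -> List[str]:
    """Whitespace-separated items, exactly as str.split() gives them."""
    return cell.split()


def split_quoted(cell: str) -> List[str]:
    """Shell-like splitting: quoted items may contain spaces."""
    return shlex.split(cell)
```

The cost of the second option is a quoting convention that profile authors would have to learn, which cuts against the simpler approach.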
Taken up in https://github.com/dcmi/dctap/issues/5, which links to here for discussion
One of the key things that we have to decide is how to define the rules or constraints for values that will be applied to instance data created according to the profile. Our current template has:
- value type: Akin to rdf:type, this designates the general data type expected for the value, such as xsd:date or xsd:anyURI.
- value: This column contains further constraints on the value itself. An example could be a pick list of literal values ("red" "blue" "green"). If there are no more specific constraints on values beyond the value type, this column is left blank.
Some questions follow.