dcmi / usage

DCMI Usage Board - meeting record and decisions

Use "URI" or "non-literal value" in comments? #41

Closed jneubert closed 6 years ago

jneubert commented 6 years ago

To me, it seems that both are used interchangeably, so we should harmonize this.

osma commented 6 years ago

In principle, in the RDF context "non-literal value" includes blank nodes as well as URIs/IRIs. But in practice bnode values seem to be extremely rare, so this may not matter much.

kcoyle commented 6 years ago

Unfortunately, BIBFRAME is full of bnodes, so they aren't as rare as they should be. :-(

This brings up the question of which terminology we are using. DCAM or RDF? Whichever it is, we need to provide the context for the terms we use. I prefer RDF because more people understand that than understand (or even know about) DCAM.

osma commented 6 years ago

Yes, I was referring to the value type statistics which count the usage of IRI vs bnode vs literal values for DCTerms properties. hasFormat is the only property which is used with blank nodes a significant number of times (1.2M occurrences in the Openlink data set).

Oh dear, DCAM vs RDF is another can of worms. Here's how DCAM defines non-literal values etc.:

Each value is a resource - the physical, digital or conceptual entity or literal that is associated with a property when a property-value pair is used to describe a resource. Therefore, each value is either a literal value or a non-literal value:

  • A literal value is a value which is a literal (as defined by RDF [RDF]).
  • A non-literal value is a value which is not a literal value.

I think using "non-literal value" consistently would be slightly better than talking about URIs or IRIs, since it follows DCAM and is a little bit more abstract than RDF. After all DC is used outside the RDF world a lot. I don't particularly like the expression itself though - I wish we could talk about entities or objects, but I'm not proposing to switch the terminology.
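For readers unfamiliar with the value type statistics mentioned above, here is a minimal sketch of how such a tally could work. It is not the script behind the actual statistics: the function names, the simplified N-Triples-style term check, and the sample triples are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def value_kind(obj: str) -> str:
    """Classify an N-Triples-style object term (simplified, not a full parser)."""
    if obj.startswith("<") and obj.endswith(">"):
        return "iri"
    if obj.startswith("_:"):
        return "bnode"
    if obj.startswith('"'):
        return "literal"
    raise ValueError(f"unrecognized term: {obj!r}")


def tally(triples):
    """Count IRI vs bnode vs literal values per property."""
    counts = defaultdict(Counter)
    for _subject, prop, obj in triples:
        counts[prop][value_kind(obj)] += 1
    return counts


# Made-up sample data, not taken from any real data set.
SAMPLE = [
    ("<http://example.org/a>", "dcterms:hasFormat", "_:b0"),
    ("<http://example.org/a>", "dcterms:hasFormat", "<http://example.org/a.pdf>"),
    ("<http://example.org/a>", "dcterms:title", '"A title"@en'),
    ("<http://example.org/b>", "dcterms:hasFormat", "_:b1"),
]

if __name__ == "__main__":
    for prop, kinds in tally(SAMPLE).items():
        print(prop, dict(kinds))
```

In RDF terms, the "iri" and "bnode" counts together make up the non-literal values; only the "literal" count falls outside.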

jneubert commented 6 years ago

If we have to stick to "non-literal value", we should not direct anybody who seeks practical guidance on what that means to DCAM. The definition "not a literal (as defined by RDF)", with a link to the RDF Spec, is plainly deterrent.

Is there instead a place where we could, within DCMI Metadata Terms, state something like:

non-literal value: Recommended practice is to use a URI or IRI.

If deemed necessary, that could be supplemented by:

(For formal background, please refer to [DCAM](http://dublincore.org/documents/2007/04/02/abstract-model/))

osma commented 6 years ago

To open yet another can of worms, should we use "IRI" instead of "URI"? I'm pretty ambivalent on this, but recent standards such as SPARQL 1.1 use IRI consistently.

I hate it that just when the use of "URI" was starting to become accepted, the SW/LD community came up with "IRI" to replace it. I wish that instead of coining yet another term, the definition of a URI had been adjusted to take into account i18n issues. There is no "IRL" or "IHTML" :)

kcoyle commented 6 years ago

I vaguely recall that I suggested IRI at one point and it would mean changing all of the instances of URI, which was considered untenable. However, URI looks really weird to me after all of this time using IRI.

osma commented 6 years ago

@kcoyle I think we mostly use "non-literal value" in current DCTerms comments? I see that "non-literal value" appears 15 times in the DCTerms document, while URI is only used in a technical sense, not in the comments. Since we have to (if we decide to) change all mentions of "non-literal value" anyway, we could just as easily use either "URI" or "IRI". Of course, that might still mean changing all the mentions of URI in the technical sense, but I would think that would be rather easy with the new publishing infrastructure that is based on templates (and a global search-and-replace on a bunch of static files is not very difficult anyway).

juhahakala commented 6 years ago

I am strongly opposed to the usage of IRIs in DC documents. The IRI specification (https://www.ietf.org/rfc/rfc3987.txt) is out of date, and the IETF established an IRI working group (https://datatracker.ietf.org/wg/iri/about/) which tried to update it. After 15 Internet drafts the WG had to give up, since there were just too many problems which could not be solved. The latest RFC3987bis draft can be found at https://tools.ietf.org/html/draft-ietf-iri-3987bis-13. Chapter 8 of that draft gives some idea of the additional challenges IRIs pose, as compared with URIs. So URIs should not be replaced by IRIs unless there are very good practical reasons to do that.

Although the SPARQL specification (https://www.w3.org/TR/sparql11-query/) speaks about IRIs, all the IRIs I was able to find in the document were actually URIs, and the specification does not specify how to process IRIs if they were actually used.

juhahakala commented 6 years ago

The ISO standard already defined the term literal. I extended its definition slightly and added the term non-literal value.

3.1.3 literal string of Unicode characters, typically letters or integers, combined with an optional language tag

3.1.4 non-literal value in RDF context, either a blank node or URI.

Note 1 to entry: In Dublin core context, non-literal value is a URI.

kcoyle commented 6 years ago

Isn't there also an optional datatype? e.g. for dates, number types, etc.

tombaker commented 6 years ago

Like @osma, I consider it an unhelpful distraction to worry about what term to use. We already had this discussion in relation to DCAM in 2007 - eleven years ago. I agree with @kcoyle that we should not cite DCAM as a source of definitions.

Juha has added 3.1.3 and 3.1.4 to the section "Terms, definitions and abbreviated terms". The solution above looks good enough for the purposes of the ISO draft, especially if there are actually good reasons to avoid "IRI".

tombaker commented 6 years ago

@jneubert Would you agree to closing this issue for now?

tombaker commented 6 years ago

@kcoyle you are right (FWIW, DCAM also mentions datatype). Better:

string of Unicode characters, typically letters or integers,
combined with an optional language tag or datatype

tombaker commented 6 years ago

Closing for now. @jneubert we can re-open if you think further discussion is needed.

tombaker commented 6 years ago

Reopening

tombaker commented 6 years ago

If nobody objects, will close in August

As noted above, Juha has put the following into the July 12 draft:

3.1.3
literal
string of Unicode characters, typically letters or integers, 
combined with an optional language tag

3.1.4
non-literal value
in RDF context, either a blank node or URI.

Note 1 to entry: In Dublin core context, non-literal value is a URI.

These definitions seem good enough for our purposes. 2018-07-19: with the addition of "or datatype", so:

literal
string of Unicode characters, typically letters or integers, 
combined with an optional language tag or datatype

kcoyle commented 6 years ago

I object if datatype is not also given for literal: "optional language tag or datatype"

tombaker commented 6 years ago

@kcoyle You are right - have added to the proposal - can you upvote now?

tombaker commented 6 years ago

APPROVED:

3.1.3 
literal
string of Unicode characters, typically letters or integers, 
combined with an optional language tag or datatype

3.1.4
non-literal value
in RDF context, either a blank node or URI.

Note 1 to entry: In Dublin core context, non-literal value is a URI.
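
As an illustration only, the approved definitions could be modelled roughly as follows. This is a sketch, not part of the ISO draft: the class and function names are invented here, and the choice to represent the language tag and datatype as optional string fields is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Literal:
    """3.1.3: string of Unicode characters with an optional language tag or datatype."""
    text: str
    language: Optional[str] = None   # e.g. "en" (assumed representation)
    datatype: Optional[str] = None   # e.g. an XSD datatype URI (assumed representation)


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


def is_non_literal(value) -> bool:
    """3.1.4: in RDF context, either a blank node or URI."""
    return isinstance(value, (URI, BlankNode))


def is_dc_non_literal(value) -> bool:
    """Note 1 to entry: in Dublin Core context, a non-literal value is a URI."""
    return isinstance(value, URI)


if __name__ == "__main__":
    date = Literal("2018-07-19", datatype="http://www.w3.org/2001/XMLSchema#date")
    print(is_non_literal(date))                # a literal is never a non-literal value
    print(is_non_literal(BlankNode("b0")))     # blank nodes count in RDF context
    print(is_dc_non_literal(BlankNode("b0")))  # but not under Note 1
```

The two predicates are kept separate so that the gap between the RDF reading and the narrower Dublin Core reading of "non-literal value" stays visible.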