dcmi / usage

DCMI Usage Board - meeting record and decisions

Coining and applying `dcam:rangeIncludes` #43

Open tombaker opened 5 years ago

tombaker commented 5 years ago

Issue 32, "Broken ranges?", has generated some great discussion, covering among other things the history of how rdfs:domain and rdfs:range ended up being defined as they currently are. This was also discussed in the June 26 telecon. It looks like we have converged on the following approach:

Note that the definition of dcam:rangeIncludes proposed here is only slightly different from that of rdfs:range ("class of which a value described by the term is an instance"). For comparison, see the Schema.org definitions [1,2].

Note that the definitions of rangeIncludes and domainIncludes would be included in ISO 15836, but as part of Section 3.1 ("Terms and definitions"), where they would be defined in words only (i.e., no URI), and not in Section 3.3 ("DCMI properties"), which lists properties of the /terms/ namespace with their URIs.

2018-07-20 edit: This posting conflates several issues that I would like to split out into separate questions below.

[1] https://meta.schema.org/rangeIncludes
[2] https://meta.schema.org/domainIncludes

kcoyle commented 5 years ago

Using the DCAM namespace essentially makes this a modification to DCAM. Should we look at what it does to the DCAM model? Should we update the DCAM documentation to include this?

tombaker commented 5 years ago

@kcoyle Instead of starting an entirely new namespace, I suggest we simply rename Section 8 of DCMIMT so that it does not imply that these are "Terms related to the DCMI Abstract Model", then annotate or even deprecate the DCAM specification itself, though that is a different discussion.

tombaker commented 5 years ago

STRAW POLL ON USING rangeIncludes

Do you agree with changing rdfs:range statements into rangeIncludes statements for properties that are intended to be used with non-literal values? (And rdfs:domain => domainIncludes.)

tombaker commented 5 years ago

STRAW POLL ON LITERAL RANGES

Do you agree with not changing rdfs:range rdfs:Literal statements?

tombaker commented 5 years ago

STRAW POLL ON DEFINITION OF rangeIncludes

Do you agree with definitions along the following lines (possibly modified in light of Antoine's suggestion):

tombaker commented 5 years ago

STRAW POLL ON ALTERNATIVE DEFINITION OF rangeIncludes

Do you agree with definitions based on Schema.org:

tombaker commented 5 years ago

STRAW POLL ON DEFINITION OF range

The current ISO draft defines range as:

class of which a value described by the term is an instance

Antoine suggests that rdfs:range actually means something more like:

class of which any object of the statement using the property is an
instance

Is Antoine's definition an improvement?

sruehle commented 5 years ago

rdfs:range with rdfs:Literal statements

As far as I can see, the properties used with rdfs:Literal are the date properties and the titles. In both cases I would prefer rangeIncludes, because with rangeIncludes we would recommend using a literal value but also allow people to use URIs where necessary. One example of dates expressed as URIs is geologic time scales: if I want to say that stones were formed in the Pleistocene, I would prefer to use a URI. And in some cases (e.g. art history or anthropology) you may need to say more about a title, such as the time it was valid or the context in which it was used. If you think that these examples are too sophisticated and out of scope for the dcterms vocabulary, I can live with rdfs:range. But if we want to allow more complex data modelling with dcterms, I think rangeIncludes makes more sense.

sruehle commented 5 years ago

Definition of rangeIncludes

I prefer Antoine's definition. So the definition will be "Class of which any object of the statement using the property may be an instance" ?

kcoyle commented 5 years ago

The statements that begin "class of which" are very hard to parse - I have to read them multiple times and still they don't stick. (I think this is an English problem - I can see how it might work ok in other languages.) We need something more like the schema.org statement which is direct, not contorted.

Other suggestions:

rangeIncludes: The value of this property is expected to be an instance of this class or one of these classes.

domainIncludes: Use of this property implies that the subject of the triple is expected to be an instance of this class or one of these classes.

aisaac commented 5 years ago

Maybe I'll have a position that will cut the discussion on "my suggestion" short...

I am still convinced that my suggestion is an improvement over the original ISO one, "class of which a value described by the term is an instance", which is quite messed up not only in its English but also in its metamodeling parlance.

But the ISO definition shouldn't be the basis for rangeIncludes, in my current view. I prefer the Schema.org definition (or @kcoyle's attempts), because it conveys that the property comes with a range expectation. Adapting the ISO definition (with the original wording or mine) fails at this because of the 'may' in it. This is essentially the reason why I'm down-voting the suggestion at https://github.com/dcmi/usage/issues/43#issuecomment-406489656

Note that I'm not sure the Schema.org definition would be my final choice at this stage. Ideally I would also like us to convey the expectation even when literals are used. This was the essence of the second part of my suggestion for dct:rights at https://github.com/dcmi/usage/issues/30#issuecomment-400371836. I am perhaps naive/idealistic, but if there were a way to convey this message I think it would be great in terms of implementation guidance.

As for my reluctance to agree with the blanket replacements or confirmations of rdfs:range statements at https://github.com/dcmi/usage/issues/43#issuecomment-406488954 and https://github.com/dcmi/usage/issues/43#issuecomment-406489079: this is because I would prefer that we go through all these individual properties and make the decision for each one. It shouldn't take too much time, we would be certain about what we do, and we might get some very valuable insight if an issue turns up for a property in that process.

kcoyle commented 5 years ago

Thanks, @aisaac. I'm going to add something here which we can decide to ignore... the use of rangeIncludes/domainIncludes to my mind is making a false equivalence or extension of rdfs:range/domain. In fact, what we are doing is the opposite of RDF:

RDF domain determines the class type of a subject in a triple via inference.

RDF range defines the class type of the value via inference.

Neither of these acts as a constraint on the domain or range of a property, and neither is advice to the metadata creator (although we almost always read it that way).

What schema.org does is it uses xIncludes to advise metadata creators on best practices for properties. It also treats those choices as OR rather than UNION. (I may have that wrong, my formal logic is minimal.) This is really unrelated to the semantics of RDF domain and range.

I say this because I think we want to be clear that we are providing something very different from RDF's domain and range. I would like to see the RDF world adopt something other than xIncludes, and I think we should think about this as we do more work on application profiles. I would like to have properties that have the meaning of "suggested value types" "suggested class inclusion" or something like that, and that are clearly differentiated from domain and range. Among other reasons, there may be cases in which you want to define both (dubious? thinking ...)

All of this to say that I do want us to use "suggested" or "expectation" in our definitions because that is the actual meaning of the property. We do not want to imply that any inferencing on these values would produce the kind of semantics that one would expect with domain and range, including their use in query (SPARQL).

sarahartmann commented 5 years ago

> All of this to say that I do want us to use "suggested" or "expectation" in our definitions because that is the actual meaning of the property.

+1

kcoyle commented 5 years ago

Relates a property to a class that is an expected type for the value of the property.

kcoyle commented 5 years ago

rangeIncludes: The value of this property may be an instance of this class

domainIncludes: Use of this property implies that the subject of the triple may be an instance of this class

It is recommended that values of this property be members of this class.

tombaker commented 5 years ago

rangeIncludes: Instances of this class are appropriate as values for the property.

aisaac commented 5 years ago

Trying to keep the notion of expectation in @kcoyle's comment above, but fitting it to the case of several rangeIncludes: Relates a property to a class that is one of the possible types expected for the value of the property.

osma commented 5 years ago

My rewording of @aisaac's above proposal:

Relates a property to a class whose instances are appropriate values for the property.

aisaac commented 5 years ago

@osma I would still fight for some idea of 'expectation' to be included. And 'appropriate' is maybe too strong. I mean, it sounds ok in the affirmative sense, but if one turns it into the negative form, the sentence then sounds very strong to my ears. I.e., instances of classes that are not in the rangeIncludes could then be said to be 'not appropriate', and that can be interpreted as 'not allowed'. In fact that would be even stronger than what our original rdfs:range statements formally amount to (since rdfs:range is formally not a very strong constraint).

osma commented 5 years ago

@aisaac I don't quite follow...you can't just turn the parts of a statement like this into their negatives and retain the meaning. Saying that "X is an appropriate value type for the property Y" does not entail that "everything not X is not an appropriate value type for Y", just like "Socrates is human" does not entail "everything that is not Socrates is not human".

I'm thinking about this from an Open World Assumption perspective, where each statement represents a part of the truth, but does not preclude other similar statements to be true as well (as long as they don't contradict each other). So each rangeIncludes statement should be true on its own, irrespective of other rangeIncludes statements that may apply to the same property. Maybe the wording I suggested does not convey this in an ideal way, but that's what I was aiming at.

Regarding rdfs:range, I consider it quite strong, despite it not being intended for validation. With "Y rdfs:range X", every value that is used with the property Y can be inferred to be an instance of X. This usually means that there can be only one value for rdfs:range for a particular property. For example, with dct:creator rdfs:range ex:Person, ex:Organization, this would mean that every creator is both a person and an organization. I would like to define rangeIncludes and domainIncludes in a way that permits them having multiple values (as separate, independent statements) without causing nonsensical situations like that.
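To make the inference concrete, here is a minimal Python sketch of the RDFS range rule applied to triples held as plain tuples. It is not an RDF library, only a toy; ex:book1, ex:alice and ex:bob are made-up resources, and the dcam:rangeIncludes statement on dct:contributor is there only for contrast.

```python
RDF_TYPE = "rdf:type"
RDFS_RANGE = "rdfs:range"

def infer_range_types(triples):
    """Apply the RDFS range rule: (p rdfs:range C), (s p o) => (o rdf:type C)."""
    # Collect every declared range class per property.
    ranges = {}
    for s, p, o in triples:
        if p == RDFS_RANGE:
            ranges.setdefault(s, set()).add(o)
    # Each value of a property is inferred to be an instance of ALL its range classes.
    inferred = set()
    for s, p, o in triples:
        for cls in ranges.get(p, ()):
            inferred.add((o, RDF_TYPE, cls))
    return inferred

triples = {
    ("dct:creator", RDFS_RANGE, "ex:Person"),
    ("dct:creator", RDFS_RANGE, "ex:Organization"),
    ("ex:book1", "dct:creator", "ex:alice"),          # hypothetical data
    ("dct:contributor", "dcam:rangeIncludes", "ex:Person"),
    ("ex:book1", "dct:contributor", "ex:bob"),        # hypothetical data
}

inferred = infer_range_types(triples)
for triple in sorted(inferred):
    print(triple)
```

With two rdfs:range statements, ex:alice comes out typed as both ex:Person and ex:Organization, while nothing at all is inferred about ex:bob, because this rule simply does not fire on rangeIncludes.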

aisaac commented 5 years ago

@osma I beg to disagree in this specific case. Having a definition reading "Here's a property P. Class C is appropriate for P" is quite different from "Here's Socrates. Socrates is a human". I agree with the open world reading, but intuitively the first sentence is going to cast a big shadow of doubt on the appropriateness of everything that is not C. And intuition matters here: see how the intuitive reading of the wording for rdfs:range misled many into thinking it is stronger than it is in reality.

osma commented 5 years ago

@aisaac Point taken. I guess we have to find another formulation that cannot be misinterpreted that way. How about:

Relates a property to a class whose instances may be used as values for the property.

aisaac commented 5 years ago

@osma I miss very much the notion of 'expectation' or 'suggestion' in this. 'May' could be anything... How about: Relates a property to one of the suggested classes whose instances may be used as values for the property. This would also capture the 'OR' semantics of several rangeIncludes statements for one property.

kcoyle commented 5 years ago

I like Tom's suggestion: rangeIncludes: Instances of this class are appropriate as values for the property

And we could substitute "suggested" for "appropriate". I agree with Antoine that "appropriate" has the sense that anything else is "inappropriate" which is very close to "not valid" in meaning. Not being suggested just means that it isn't preferred. In fact, we could use "preferred":

rangeIncludes: Instances of this class are preferred as values for the property (Or: Preferred values of this property are ones that are instances of this class.) (But I think Tom's is better, with "preferred" substituted.)

I'm not happy with the "relates" statements. Every property "relates" so that isn't necessary.

tombaker commented 5 years ago

    Relates a property to a class whose instances may be used as
    values for the property.

    Relates a property to one of the suggested classes whose
    instances may be used as values for the property.

    Instances of this class are appropriate as values for the property.

    Instances of this class are preferred as values for the property.

    Relates a property to a class that constitutes (one of) the expected
    type(s) for values of the property. (Schema.org)

    Instances of this class are suggested as values for the property.

    Instances of this class are one of the [appropriate|preferred|suggested]
    type(s) for values of the property.

tombaker commented 5 years ago

PROPOSED

    Suggested

tombaker commented 5 years ago

PROPOSED

    Instances of this class are one of the suggested
    types for values of the property.

tombaker commented 5 years ago

PROPOSED

    Values of the property may be instances of this class.

tombaker commented 5 years ago

PROPOSED

    It is suggested that values of this property be instances of this class.

tombaker commented 5 years ago

PROPOSED

    Instances of this class are suggested as values for the property.

tombaker commented 5 years ago

PROPOSED

    A suggested class for values of this property.

tombaker commented 5 years ago

PROPOSED

    A suggested class for subjects of this property.

tombaker commented 5 years ago

Resolved!

aisaac commented 5 years ago

This was not in fact closed, and I'm not sure it should be. Of the initial points, we have agreed on having domainIncludes and rangeIncludes, and we've agreed on a definition for them, but have we agreed to apply them to all properties that have a domain and a non-literal range, without exception? I have voiced some concerns about that third point and am reiterating them here. But for the sake of making progress on issues and clarifying the discussion, maybe we can continue in a separate issue, which I'm going to create.

tombaker commented 5 years ago

Closed - discussion continues in Issue #59

tombaker commented 6 months ago

@niklasl Am reopening this issue in light of https://github.com/schemaorg/schemaorg/issues/3442, where @hoijui suggests mapping schema:rangeIncludes to dcam:rangeIncludes (and ditto for schema:domainIncludes) via owl:equivalentProperty and @danbri points out that they have subtly different meanings.

kcoyle commented 6 months ago

I commented there, and in fact I don't think that the differences are all that subtle. schema.org defines domains and ranges for its xIncludes properties that are quite specific to schema. DC's properties have no domains or ranges defined, making them wide open in terms of those aspects. In RDF terms, that's really apples and oranges.

As I say there:

(I do think it unwise to adopt vocabulary property names into different vocabularies but with different meanings (even if subtle). The difference in namespace is obvious to machines but we humans tend to jump to conclusions based on human-understandable text.)

hoijui commented 6 months ago

My main motivation is to aid ontology designers (like me) as much as possible in figuring out which they should use, and secondarily, if possible, to connect the two in a machine-readable way that makes the most sense.

As I understand it right now (after @kcoyle's explanations), the dcmi versions make sense for pretty much anyone, while the schema versions make sense only for ontologies that use schema as their base ... right? So would it make sense for the schema versions to be defined as rdfs:subPropertyOf the dcmi versions? Or should they simply be marked as "unequal" in some machine-readable way, and then annotated with an extra comment (for humans), explaining when to use which?

niklasl commented 6 months ago

Yes, this difference is crucial. The DCAM variants are intentionally unconstrained, so they cannot be subproperties of the more constrained Schema.org variants.

It would be great if Schema.org defined those to be rdfs:subPropertyOf the DCAM variants, to aid in ontology interoperability (for both humans and machines).
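As a rough illustration of why the direction matters, here is a small Python sketch of the RDFS subproperty rule over tuples. The alignment triple (schema:rangeIncludes rdfs:subPropertyOf dcam:rangeIncludes) is only the hypothetical alignment discussed here, asserted by neither vocabulary, and the two data triples are included purely as sample input.

```python
RDFS_SUBPROPERTY_OF = "rdfs:subPropertyOf"

def entail_superproperties(triples):
    """Apply (p rdfs:subPropertyOf q), (s p o) => (s q o) until nothing new is added."""
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        # All (subproperty, superproperty) pairs currently known.
        pairs = {(s, o) for s, p, o in closure if p == RDFS_SUBPROPERTY_OF}
        for s, p, o in list(closure):
            for sub, sup in pairs:
                if p == sub and (s, sup, o) not in closure:
                    closure.add((s, sup, o))
                    changed = True
    return closure

triples = {
    # Hypothetical alignment: the constrained property under the unconstrained one.
    ("schema:rangeIncludes", RDFS_SUBPROPERTY_OF, "dcam:rangeIncludes"),
    # Sample input statements, one per vocabulary.
    ("schema:author", "schema:rangeIncludes", "schema:Person"),
    ("dct:creator", "dcam:rangeIncludes", "dct:Agent"),
}

closure = entail_superproperties(triples)
print(("schema:author", "dcam:rangeIncludes", "schema:Person") in closure)
print(("dct:creator", "schema:rangeIncludes", "dct:Agent") in closure)
```

The Schema.org statement is carried up to the DCAM property, but the DCAM statement is never pulled down into the Schema.org one, which is the one-way relationship intended.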

Process-wise it may be considered "backwards", but such is occasionally the case (and in the future more commonly so, I hope). An "application ontology" defines properties for specific needs. Sometimes those needs would be great to generalize just a tiny bit, defined in a "shared ontology", which the application-centric one can then, as a supportive act of the general case, formally align to. (I believe this is quite like what we hope for with openWEMI, for instance.)

kcoyle commented 6 months ago

Now I have to rebut my own comment because openWEMI is using the same class names as FRBR. I still think that will work because FRBR classes are not actually in use. The words, especially "Work", are common and not exclusively property names, like "rangeIncludes". However, I still think that thought must be given to how humans AND machines act on defined elements, especially since there is an explosion of terms being defined in the RDF space.

philbarker commented 6 months ago

> I commented there, and in fact I don't think that the differences are all that subtle. schema.org defines domains and ranges for its xIncludes properties that are quite specific to schema.

But the schema:Property and schema:Class classes that are the suggested domain and range of their ___Includes properties are declared as equivalent to rdf:Property and rdfs:Class, respectively. So

schema:rangeIncludes  
    schema:domainIncludes schema:Property ;
    schema:rangeIncludes schema:Class .

is not particularly limiting. I find it hard to think of uses of the dct: properties that wouldn't honour that suggestion.

That said, I wouldn't mind the superabundance of caution that the rdfs:subPropertyOf provides, because many things that I find hard to think of turn out to exist.

hoijui commented 6 months ago

Both rdfs:range and rdfs:domain already have rdfs:Class as their rdfs:range. I would assume it thus makes sense to have this limitation for their *Includes counterparts as well.

tombaker commented 6 months ago

@hoijui To use a turn of phrase from Alistair Miles, I think of rdfs:range and rdfs:domain as granting a "license to infer" that the object (or subject) of a triple using the given property is an instance of the given range class (or domain class). Defining schema:rangeIncludes and schema:domainIncludes as having an rdfs:range of rdfs:Class would likewise give "license to infer" that the object is an instance of a class.

But what if a URI used as the object of a rangeIncludes assertion were not explicitly defined (or perhaps even intended) by its owner to be a class? Would we want to license that inference anyway?

Put another way, would we really want to implicitly limit rangeIncludes to be used with classes? Could a SKOS concept serve as the object just as well?

I see utility in a notion of rangeIncludes defined as something like a type hint in Python. In Python, a type hint conveys intention in a way that is helpful to a reader of some code or user of an IDE. But while type hints can be enforced by tools, Python itself does not actually block a user from assigning an integer to a variable type-hinted to be a string. I like the idea of having such type hints for RDF too.
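For readers less familiar with Python, a minimal sketch of the analogy follows. The function set_title is a made-up example; the point is only that the annotation is visible to tools while the interpreter itself does not enforce it.

```python
from typing import get_type_hints

def set_title(title: str) -> str:
    """The annotation documents intent; the interpreter does not check it."""
    return title

# Passing an int where a str is hinted raises no error at run time.
result = set_title(42)
print(type(result).__name__)  # int

# The hint is still there for any tool that chooses to look at it.
hints = get_type_hints(set_title)
print(hints["title"] is str)  # True

# An optional, advisory check that a tool (not the language) might perform.
print(isinstance(result, hints["title"]))  # False
```

In this reading, a rangeIncludes statement plays the role of the annotation, and any validation is left to whatever tooling chooses to consult it.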

kcoyle commented 6 months ago

@tombaker @hoijui I can understand Tom's view that "limiting" to classes (aka URIs) is a limitation, although his argument could also be read as "no problem, because anything can be a class" and "this only matters when doing inferencing."

But I am still bothered by the use of the rangeIncludes/domainIncludes properties, which to me imply a kind of formality related to the RDF concept of range (which does require a class value). I think it is significant that the DCTerms definition for rangeIncludes is:

A suggested class for values of this property.

schema's is:

Relates a property to a class that constitutes (one of) the expected type(s) for values of the property.

The former reads like a note for humans, the latter is written as a formal statement. How does the human-facing definition interact with the RDF formal definition? I think the schema's definition can be read as potentially driving an application, whereas the DCTerms definition reads like a note for humans. It may be subtle, but there is a difference.

Schema is an "application vocabulary" as @niklasl calls it, and a vocabulary that is very self-contained, defining its own elements for Class and Property (as @philbarker shows). The "pseudo-formality" of schema:rangeIncludes and schema:domainIncludes may allow or imply some kind of processing on values for those creating schema metadata.

I see DCTerms as being less prescriptive. If, in most cases, DCTerms is silent on a formal constraint on values, then I would prefer that this be expressed in something that a person would read as a note without the baggage of the schema usage.

Given that most users of these vocabularies will not see the definitions for rangeIncludes and domainIncludes in schema and DCTerms, they would logically assume that they are the same. Because they are not I think that using the same property names is confusing. Actually, that is precisely why we are here.

danbri commented 6 months ago

> Given that most users of these vocabularies will not see the definitions for rangeIncludes and domainIncludes in schema and DCTerms, they would logically assume that they are the same. Because they are not I think that using the same property names is confusing.

Yes! This is why I was surprised. The design for schema.org was based on deployment experiences with RDFS and a concern that using the W3C rdfs:range and rdfs:domain definitions as-is would be an engine for generating needless "middleware" types in the (rather large) schema.org type hierarchy. Instead of making a "this is Schema.org alternative version of rdfs:range and rdfs:domain" weird situation we went for documenting a couple of alternative RDFS-like (RDFS-extending) properties. Which was how RDFS was supposed to be extensible since 1998 or so.

I think part of the source of confusion is that there are two views you can take on what the Schema.org datamodel doc says about rangeIncludes and domainIncludes.

You can take the static "schemas as they are right now" view, which is how most RDF/S works when considered formally. Or you can take a more social "we know this stuff is going to be changed over time" view, which is how most actual schemas work, especially big ones like schema.org and modest-sized ones like FOAF and DC.

The datamodel page https://schema.org/docs/datamodel.html says

  1. each property may have one or more types as its domains. The property may be used for instances of any of these types.
  2. each property may have one or more types as its ranges. The value(s) of the property should be instances of at least one of these types.

This gives the impression you have the "license to infer", when in fact a more temporal perspective would caution "so long as the schemas don't get changed...".

Imagine it is Tuesday and an imagined mediaFoo property has BarMedia and BlahMedia as its expected types (i.e. mediaFoo is a Property, and we would say both "schema:mediaFoo schema:rangeIncludes BarMedia" and also "schema:mediaFoo schema:rangeIncludes BlahMedia").

And so if we find some data on Tuesday, Tuesday's schemas would allow you to conclude "well, it ought to be a BarMedia or a BlahMedia or both" whenever you see some value for the mediaFoo property.

And if someone writes software on Wednesday assuming that schemas never change, they might bake assumptions into code along those lines.

But what happens on Wednesday when XYZMedia is added to the imaginary schema.org terminology in this area, and also added to the definition for schema:mediaFoo?

Thursday's real-world data could use that, and if on Friday we see data like item_123414 schema:mediaFoo item_61712271, what can we conclude? That it's either going to be a BarMedia or a BlahMedia or an XYZMedia (or there's a bug, or the schemas have evolved even further, or someone is using terminology in an unusual way). Doing so, btw, is not a crime! Tom has a great paper on this - https://www.dl.slis.tsukuba.ac.jp/ISDL97/proceedings/thomas/thomas.html - and it fits with our experience at Schema.org. People try out new idioms and report back, and sometimes they fizzle out, and sometimes they get added to the documentation: "paving the cowpath".
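The Tuesday/Thursday scenario can be sketched in a few lines of Python. The two dictionaries are stand-ins for snapshots of the imaginary schema, and is_expected is a made-up helper, not anything Schema.org provides.

```python
# Expected types for the imaginary schema:mediaFoo property, per schema snapshot.
SCHEMA_TUESDAY = {"schema:mediaFoo": {"BarMedia", "BlahMedia"}}
SCHEMA_THURSDAY = {"schema:mediaFoo": {"BarMedia", "BlahMedia", "XYZMedia"}}

def is_expected(schema, prop, value_type):
    """Return True if value_type is among the rangeIncludes classes listed for prop."""
    return value_type in schema.get(prop, set())

# Software that baked in Tuesday's schema flags Thursday's data as unexpected...
print(is_expected(SCHEMA_TUESDAY, "schema:mediaFoo", "XYZMedia"))   # False
# ...while the same data is perfectly expected under the updated schema.
print(is_expected(SCHEMA_THURSDAY, "schema:mediaFoo", "XYZMedia"))  # True
```

A False here is better read as "not among the expected types in this snapshot" than as "invalid", which is the caution the temporal view adds.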

Framed in these terms the two flavours of rangeIncludes and domainIncludes may be closer than it seems.

The schema.org documentation emphasises the conclusions you could draw if you knew you weren't going to be messed with by schema changes. The DC version seems a lot scruffier, but by using the exact same property names (and RDFS-style naming) it probably isn't as far from RDF/S's underlying types-and-instances datamodel as we might think. People will recognize their distinctive names and probably assume they basically do the same thing, even if they don't.

p.s. this is not a criticism, these things happen - https://schema.org/title vs dc:title etc. - and there are many other name clashes between popular namespaces. FWIW 13 years ago I started mapping out these clashes, see https://github.com/danbri/Zoo/blob/master/zoo.foaf.tv/zoo/master.txt and https://github.com/danbri/Zoo/blob/master/zoo.foaf.tv/zoo/zoo_manifest.txt

hoijui commented 6 months ago

Uff... I am a software dev, and in my mind RDF ontologies are like DB schemas for distributed data. I know that is not true: it is not what they are meant for, nor what they are formally, and so on. It is just what I want to have a solution for at the end of the day, when walking away from this. ... The world is much simpler in my mind because of this.

Is it even an option to change the names of the two DC properties (as I understand it, some of you think that might be a good way to go)?

tombaker commented 5 months ago

@hoijui As a point of procedure: to follow our namespace policy, we would not change the names of the two DC properties. However, we could in principle create new properties under a different name, use the new properties instead of the existing properties, and mark the existing properties as deprecated.