DOREMUS-ANR / doremus-ontology

Describing music catalogs
http://data.doremus.org/ontology/
Creative Commons Attribution 4.0 International

Shouldn't some Object properties be Data properties? #21

Closed pierrechoffe closed 6 years ago

pierrechoffe commented 7 years ago

Whenever their range can only be a literal or a number :

U10 has order number

U24 has award (no controlled vocabulary attached)

U41 has catalogue number

U42 has opus number

U43 has opus subnumber

Caveat: U41, U42 and U43 can have alphanumeric values, e.g. KV 93a, opus 27c, opus 27/c

rtroncy commented 7 years ago

Yes, all properties whose values are literals (in the RDF sense) should be of type owl:DatatypeProperty (and not owl:ObjectProperty). This seems to be the case for all the properties you cite above.

pierrechoffe commented 7 years ago

What about the meaning behind the value? E.g. regarding the order number, I guess it is OK to say that track nr. 1 in a recording doesn't have any particular meaning: M24_Track u10_has_order_number "1". In this case, "1" just means a position in an ordered list.

But when it comes to a musical piece, e.g. Beethoven's Symphony nr. 3, the number is an identifier, just as a name: the symphony is known as "Eroica" or as "Symphony nr. 3" (among other appellations), therefore "3" is not just a number, it is an appellation. To make things even more complicated, the same musical piece can have various numbers, depending on the musicologist, e.g. the famous "Unfinished Symphony" by Schubert can be either nr. 8 (most common) or nr. 7. This is the reason why mus:u10_has_order_number is presently a subproperty of ecrm:p1_is_identified_by.

So, should we have distinct "order number" properties for these distinct cases ?

rtroncy commented 7 years ago

You love to re-open issues that have been closed :-)

> What about the meaning behind the value? e.g. regarding the order number, I guess it is ok to say that track nr. 1 in a recording doesn't have any particular meaning : M24_Track u10_has_order_number "1". In this case, "1" just means a position in an ordered list.

A property always has a meaning. In this particular case, we will not create such a property, since we said we would use the built-in RDF list construct to number the tracks on a CD, see issue #16.

> But when it comes to a musical piece, e.g. Beethoven's Symphony nr.3, the number is an identifier, just as a name : the symphony is known as "Eroica" or as "Symphony nr.3" (among other appellations), therefore "3" is not just a number, it is an appellation.

That's why we say "Literal" in the RDF world. A literal is interpreted as a string. This is the meaning you seem to want to give.

> So, should we have distinct "order number" properties for these distinct cases ?

The examples you give do not talk about the same thing, so yes, we need different properties for stating different things. Order of tracks in a CD, or order of performance in a concert, or order of lines in a program, or ... can be handled with rdf:List. If you want to bring a different notion of order, then, please, define what you mean, show how it is represented in current MARC records, and let's work out which properties are needed to be created or re-used.

pierrechoffe commented 7 years ago

> You love to re-open issues that have been closed :-)

Hey, I mentioned the ordered list precisely to say that in this case we should use the built-in RDF list, so the idea was not to re-open the issue :smile:

What I wanted to point out was the fact that, contrary to what I thought when I opened this issue (thx to Jean's owl2csv tool btw), the situation is more nuanced.

So far, I would say :

U10 has order number has a meaning of identifier, we should only use it for F2_Expression (F22, F24, M43, ...)

U24 has award is definitely a literal (but see issue #22 for more on this)

U41, U42, U43 are components of a catalogue or opus statement, they are literals. The identifier is the combination of the catalogue name and the number.

pierrechoffe commented 7 years ago

> You love to re-open issues that have been closed :-)

To be more explicit on this point, my question is also: how do I model this rdf:List implementation in our schemas? We should tell our people how to map their data to the model. Or should we say that whenever there is a track number, this will be dealt with at the implementation level?

delahousse commented 7 years ago

Hello,

Adding my experience from ELI, the ontology for legal notices based on FRBR: after a lot of discussion, we decided to duplicate some properties as both data properties and object properties, with a specific suffix to differentiate the two.

We did this for properties where some institutions use a controlled vocabulary and other organisations use none for the same property. It makes the standard easier to adopt for institutions that have a lot of text values in their notices and still want to publish their data. Schema.org follows the same idea, but unlike Schema.org we decided to duplicate the properties in order to keep a clean OWL ontology.

Jean

rtroncy commented 7 years ago

@pierrechoffe I'm glad you're asking for examples, since this is what I ask for in nearly every message I write :-) So, if you give me a concrete MARC record and optionally the corresponding DOREMUS schema, I can show you how it will be serialized in Turtle following our model.

> Or should we say that whenever there is a track number this will be dealt with at the implementation level ?

No, you can represent this in the model, using nodes and edges labeled rdf:List, rdf:first, rdf:rest, etc. (see the full specification or alternatively this excellent tutorial).
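As a rough illustration of the rdf:List construct mentioned above, here is a minimal Turtle sketch of an ordered track list. The `ex:` namespace, the resource names and the `ex:hasTrackList` property are hypothetical and are not part of the DOREMUS model; only the `rdf:` terms come from the RDF specification.

```turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex:  <http://example.org/> .   # hypothetical namespace, for illustration only

# Compact form: Turtle's ( ... ) syntax builds the rdf:first / rdf:rest chain.
ex:recording1 ex:hasTrackList ( ex:track1 ex:track2 ex:track3 ) .

# An equivalent list for a second recording, written out node by node.
ex:recording2 ex:hasTrackList _:n1 .
_:n1 rdf:first ex:track1 ; rdf:rest _:n2 .
_:n2 rdf:first ex:track2 ; rdf:rest _:n3 .
_:n3 rdf:first ex:track3 ; rdf:rest rdf:nil .
```

In both forms the position of a track is carried by its place in the list, so no separate order-number property is needed for this case.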

To come back to "what I understand" is your issue:

I suggest you clarify your terminology when you say "has a meaning of identifier" ... an identifier is a string. Sometimes you mean a reference to an identifier (in XML, an IDREF), which is an object.

@delahousse Duplicating properties that can have either a literal value OR a URI (e.g. pointing to a skos:Concept) is a sledgehammer to me. Its only benefit is staying within the OWL DL world ... but who cares? Having a single property that has both literals AND URIs as values is possible in RDF, and thus in OWL Full. Actually, this was the decision made by schema.org ... not duplicating properties, because they don't care about OWL DL specifically.

pierrechoffe commented 7 years ago

> I suggest you clarify your terminology when you say "has a meaning of identifier"

Attached below is an Intermarc record of Beethoven's Symphony nr 3 "Eroica". The number can be found in 144$n and the nickname in 444. As it happens, the number is "No 3", so they write the whole string, including the abbreviation "No".

Coming back to your question: a value is a value, so the meaning is in the property, am I right? Number 3 is just number 3, it can be the order number of a symphony or the number of yogurts left in my fridge. The property gives it a meaning. The property mus:u10_has_order_number is a subproperty of p1_is_identified_by, which implies a certain meaning (if I understand it right). This would not be the case with a datatype property, which would not inherit from any other CRM/FRBRoo property.

Put differently, inside the conceptual frame of CRM/FRBRoo, a meaning may be (more or less) clear. Concretely, I create a subproperty of p1_is_identified_by and things are meaningful (of course I have to specify things in the scope note); I know that my data is a "Symbolic Object" in the sense of CRM. But when I create a property as a datatype property, I have no superproperty to cling to (or RDF properties? but how do I connect them together?) and I have a feeling that the coherence is lost. But maybe I am lost :))

My conclusion is that making U10 a DataType Property will make it impossible to understand what "No 3" means in the case of the symphony. It would just be a number in an ordered list, whereas in reality it is the name that the composer (or an editor or a musicologist) gave it.

I hope this is clearer - although I must admit this is still not absolutely clear to me :)

liste de notices complètes 28_09_2016 17_51.pdf

rtroncy commented 7 years ago

> Attached below is an Intermarc record of Beethoven's Symphony nr 3 "Eroica". The number can be found in 144$n and the nickname in 444. As it happens, the number is "No 3", so they write the whole string, including the abbreviation "No".

Is it a recurrent pattern? I.e. will the value of 144$n always be "No" followed by a digit, or not? If you say yes, then we might interpret the value of 144$n as a number, while if the form of the value varies, we would do better to interpret it as a string.

> Coming back to your question : a value is a value, so the meaning is in the property, am I right?

Not quite. A value has a type. 3 interpreted as a string "3", or as an integer 3 or as a double 3.0 is not the same thing although the lexical representation of the value will always be 3. The machine will do very different things with it.
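To make the distinction concrete, here is a small Turtle sketch of the same lexical value carried with different datatypes. The `ex:` namespace, the subjects and the `ex:value` property are hypothetical placeholders; only the `xsd:` datatypes are standard.

```turtle
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/> .   # hypothetical namespace, for illustration only

ex:a ex:value "3" .                  # a string (xsd:string) made of the character '3'
ex:b ex:value "3"^^xsd:integer .     # the integer three
ex:c ex:value "3.0"^^xsd:double .    # the double-precision number three
ex:d ex:value "No 3" .               # a four-character string, as found in Intermarc 144$n
```

A query engine can sort or compare the second and third values numerically, whereas the first and last are handled as character strings.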

> The property mus:u10_has_order_number is a subproperty of p1_is_identified_by, which implies a certain meaning (if I understand it right).

This is an interesting example. What do you mean by a certain meaning? By adding this subProperty axiom, you are just further constraining the domain and range of this property, which means that a reasoner can draw a number of additional inferences from it.
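The kind of inference at stake can be sketched as follows. This assumes that `mus:` and `ecrm:` resolve to the namespaces shown (an assumption, not confirmed in this thread) and uses hypothetical `ex:` resources for the symphony and its order number.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix mus:  <http://data.doremus.org/ontology#> .    # assumed namespace
@prefix ecrm: <http://erlangen-crm.org/current/> .     # assumed namespace
@prefix ex:   <http://example.org/> .                  # hypothetical namespace

# The axiom discussed above.
mus:U10_has_order_number rdfs:subPropertyOf ecrm:P1_is_identified_by .

# One asserted statement using the subproperty.
ex:eroica mus:U10_has_order_number ex:orderNumber3 .

# An RDFS reasoner can then derive the following triple without it being asserted:
# ex:eroica ecrm:P1_is_identified_by ex:orderNumber3 .
```

Any domain or range declared on the superproperty would apply to these resources in the same way.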

> This would not be the case with a DataType Property which would not inherit from any other CRM/FRBRoo property.

Why not? It all depends on how you would define this datatype property, but I'm not sure where you want to go. FRBRoo provides a useful framework to further refine in the context of DOREMUS, in particular, there is a large number of patterns we can directly re-use or specialize. This does not mean that all classes and properties from DOREMUS should be linked to a FRBRoo entity.

> But when I create a property as DataType, I have no superproperty to cling to (or RDF properties? but how do I connect them together?) and I have a feeling that the coherence is lost. But maybe I am lost :))

It seems so. You do not have to cling to any property. This is done by virtue of the RDF metamodel. Your problem is that you have a top-down approach to modeling things. You feel you need to start from some existing top concepts or properties to which new entities can cling. This is a fallacy. Think "bottom up" and just define what you need. All your worries will go away. To clarify: if you need a new DOREMUS mus:Uxx property, just create it, either as mus:Uxx a owl:DatatypeProperty OR as mus:Uxx a owl:ObjectProperty ... add an rdfs:label, notes, descriptions, etc. If you want to add a subPropertyOf axiom to an existing DOREMUS or FRBRoo property, you can ... but you don't have to. Is it clear?
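A minimal sketch of the two options in Turtle. The property names `mus:Uxx_example_literal_valued` and `mus:Uyy_example_resource_valued`, their labels and comments are hypothetical, and the `mus:` namespace is assumed. Note that OWL names the literal-valued kind `owl:DatatypeProperty`.

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix mus:  <http://data.doremus.org/ontology#> .   # assumed namespace

# Option 1: the value is a literal (string, number, date, ...).
mus:Uxx_example_literal_valued a owl:DatatypeProperty ;
    rdfs:label "example literal-valued property"@en ;
    rdfs:comment "Hypothetical property; its value is a literal such as \"No 3\"."@en .

# Option 2: the value is a resource identified by a URI.
mus:Uyy_example_resource_valued a owl:ObjectProperty ;
    rdfs:label "example resource-valued property"@en ;
    rdfs:comment "Hypothetical property; its value is a URI, e.g. a skos:Concept."@en .
```

Neither declaration needs an rdfs:subPropertyOf axiom; one can be added later if a link to an existing DOREMUS or FRBRoo property turns out to be useful.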

> My conclusion is that making U10 a DataType Property will make it impossible to understand what "No 3" means in the case of the symphony. It would just be a number in an ordered list, whereas in reality it is the name that the composer (or an editor or a musicologist) gave it.

No, no, no. Why are you talking about a number? "No 3" is a string composed of the 4 characters 'N', 'o', ' ' and '3' that has to be interpreted as such. You can write in the description, usage notes, etc. attached to the property definition how this string should be interpreted. The fact that you type this property as an owl:DatatypeProperty just means that the value is a literal. If you had typed U10 as an owl:ObjectProperty, then the machine would expect the value to be a URI pointing to a resource (an object if you prefer).

delahousse commented 7 years ago

Regarding data properties and object properties: if we can do this cleanly, that's fine. In the context of ELI and for the OWL version, we did not reach a satisfactory solution for declaring the ranges of both aspects of the property, but I would be glad to see the solution. Perhaps we did not want to end up with OWL Full.

Jean

pierrechoffe commented 7 years ago

> Think "bottom up" and just define what you need. All your worries will go away.

OK thanks! Now, with great power comes great responsibility, so watch out :smile:

pierrechoffe commented 7 years ago

> Is it a recurrent pattern? I.e. will the value of 144$n always be "No digit" or not? If you say yes, then we might interpret the value of 144$n as a number, while if the value varies in the form, we would be better at interpreting it as a string.

Yes it is a recurrent pattern at BnF. The rule is "type in 'No', then space, then number". So this is a string but we can easily extract the number and make it an integer.

rtroncy commented 7 years ago

> So this is a string but we can easily extract the number and make it an integer.

Yes, we can. However, the decision criterion is not what we can do, but the meaning we want to give to this value. In this case, you're arguing that this value is not really a number but a name that the composer (or an editor or a musicologist) wants to give.

pierrechoffe commented 7 years ago

@rtroncy Can we discuss this point in Montpellier? So far M10_Order_Number is a subclass of E41_Appellation, based on a discussion with Patrick. But now, I would argue the appellation is "Symphonie No 3", not "3". The number is just a number (I'm not talking about formats), the appellation is a combination of concepts (in this case, the Genre + Order Number). In this case, we have a list of symphonies, in which ours happens to be the third. So the appellation "symphony No 3" will be one of the titles and "3" is just a piece of information for whoever would like to run queries based on it. It may also be a handy way to list Beethoven's symphonies. But let us talk further.

pierrechoffe commented 7 years ago

> Having a single property that have both a literal AND a URI as value is possible in RDF

@rtroncy Our MOPs are to be found in controlled vocabularies, therefore mus:u1_has_medium_of_performance is an Object Property, but what happens if we have a MOP so rare that it is not in our controlled vocabulary? You said it is possible to have both a literal and a URI value, how do we do this? Practically speaking, how do I treat such properties (that can have both types) in Protégé: as data properties or object properties?

I have edited the Googlesheet with the list of properties and their associated property type, this will give you a good idea of where we are now and the questions still pending.

rtroncy commented 7 years ago

> @rtroncy Can we discuss this point in Montpellier?

Sure! If I understand your reasoning, you want to further structure the value of the E41_Appellation into a genre and a number, and you're confident that this will be a systematic pattern worth implementing? I have yet to be convinced of the added value.

> You said this is possible to have both a literal and a URI value, how do we do this ?

I will try not to give a tautological answer, but you just do it. On a concrete example, this means:

<http://data.doremus.org/performance/1234>
    mus:U1_has_medium_of_performance <http://iflastandards.info/ns/unimarc/terms/mop/kfp> .
<http://data.doremus.org/performance/5678>
    mus:U1_has_medium_of_performance "An unknown instrument"@en .

The practical consequence is that mus:U1_has_medium_of_performance should be declared not as an owl:ObjectProperty but as an rdf:Property. The relationship with the property ecrm:P125 might be unnecessary.

> Practically speaking, how do I treat such properties (that can have both types) in Protégé : as data properties or object properties ?

None. As rdf:Property.
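For illustration, such a declaration might look like the following sketch. The `mus:` namespace and the label text are assumptions, and the property name follows the example given above.

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix mus:  <http://data.doremus.org/ontology#> .   # assumed namespace

# Typed as a plain RDF property: neither owl:ObjectProperty nor owl:DatatypeProperty,
# so both URIs and literals are acceptable values (this is outside OWL DL).
mus:U1_has_medium_of_performance a rdf:Property ;
    rdfs:label "has medium of performance"@en .
```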

Your Google spreadsheet needs to be reworked. Avoid allowing both a string and a date as possible values of the same property. Make a choice or create two distinct properties. Dates refer to a mathematical construction in which you have a TRS (temporal reference system) and mathematical operators, even if dates can be fuzzy (e.g. circa). On the contrary, named periods (e.g. Renaissance) are human constructions that are culture-dependent. In this particular case, it makes sense to have two properties to differentiate the two.
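A sketch of what two distinct properties could look like. The `ex:` namespace, both property names and the sample works are hypothetical, chosen only to show the split between a date value and a named period.

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/> .   # hypothetical namespace, for illustration only

# One property for values expressed in a temporal reference system.
ex:has_date a owl:DatatypeProperty .
# A separate property for culture-dependent named periods.
ex:has_named_period a owl:DatatypeProperty .

ex:work1 ex:has_date "1804"^^xsd:gYear .
ex:work2 ex:has_named_period "Renaissance"@en .
```

With this split, date arithmetic and range queries apply only to `ex:has_date`, and named periods are never mistaken for points on a timeline.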

Regarding the general case where we expect a SKOS concept as the value of a property: yes, we cannot guarantee that the scheme will be exhaustive, but allowing literals as values is a false solution. An alternative is simply to have a placeholder 'other' in the scheme for those values.
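The placeholder approach could be sketched like this. The `ex:` scheme and concept URIs, the labels and the scope note are hypothetical, and the `mus:` namespace is assumed; the performance URI reuses the one from the earlier example.

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix mus:  <http://data.doremus.org/ontology#> .   # assumed namespace
@prefix ex:   <http://example.org/> .                 # hypothetical namespace

ex:mopScheme a skos:ConceptScheme .

# Placeholder concept for values that the scheme does not cover.
ex:mop_other a skos:Concept ;
    skos:inScheme ex:mopScheme ;
    skos:prefLabel "other"@en ;
    skos:scopeNote "Use when the medium of performance is not listed in the scheme."@en .

# The property keeps a URI as its value, so it can remain an object property.
<http://data.doremus.org/performance/5678>
    mus:U1_has_medium_of_performance ex:mop_other .
```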

delahousse commented 7 years ago

To respond to Pierre's feedback: "I believe we said that about M10_Catalogue_Name, since it does refer to the reference list of catalogues. However, I do not believe we gave that answer about M11_Catalogue_Number, because the catalogue "number" is, and can only be, an alphanumeric string."


I need a list of the classes we should keep as ranges for object properties, and a list of the classes of the conceptual model that we should not declare in the OWL model, because the properties that use them as range in the conceptual model will be declared as data properties in the OWL.

I would report this in the owl ontology.

@pierrechoffe : could you do this list ? or could we have a conf call to do it together ?

rtroncy commented 7 years ago

After reading this thread again, I made numerous changes to fix this issue, see the commits https://github.com/DOREMUS-ANR/doremus-ontology/commit/ab75272477986275f26004d39330bf32105ac3bf, https://github.com/DOREMUS-ANR/doremus-ontology/commit/f69d2b257f0144571d4acd6f8f268b873dcceefb, https://github.com/DOREMUS-ANR/doremus-ontology/commit/f471b48eabb355aa4a03125f8effd9a61d948847, https://github.com/DOREMUS-ANR/doremus-ontology/commit/e052ed2d7cf7c25e970cad45626d3f8a8797b607, https://github.com/DOREMUS-ANR/doremus-ontology/commit/ddc9b0ba1eb37d9eab533df9cd8e13ebfc75791f, https://github.com/DOREMUS-ANR/doremus-ontology/commit/3c9eff5cb1969b9b0b79a38604d96e1118b7f323 and https://github.com/DOREMUS-ANR/doremus-ontology/commit/6bd02e30c9367e6d062947f11877123e0f04d60c

pasqLisena commented 6 years ago

This issue is fixed now (probably after @rtroncy's commits)