chin-rcip / collections-model

Linked Open Data Development at the Canadian Heritage Information Network
Creative Commons Zero v1.0 Universal

Property Chains #30

Open stephenhart8 opened 4 years ago

stephenhart8 commented 4 years ago

The second version of OWL (OWL 2) introduced a new feature: property chains.

A property chain declares that a chain of properties is a subproperty of another property, as in the following example:

P14.a o P14.b SubPropertyOf P14

Property_Chain_Example-2

In other words, this property chain implies that if we have the property P14a followed by P14b (InstanceA->P14a->InstanceB->P14b->InstanceC), then the property P14 also holds directly: InstanceA->P14->InstanceC
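
As a rough illustration of what a reasoner does with such an axiom, here is a minimal Python sketch. It is not how any particular triple store implements it: triples are plain tuples, and the function name `infer_chain` and the instance names are made up for the example.

```python
# Triples are plain (subject, predicate, object) tuples; all names are illustrative.
def infer_chain(triples, first, second, target):
    """Return the triples implied by `first o second SubPropertyOf target`."""
    inferred = set()
    for s, p, mid in triples:
        if p != first:
            continue
        # Look for a second hop that starts where the first hop ended.
        for s2, p2, o in triples:
            if p2 == second and s2 == mid:
                inferred.add((s, target, o))
    return inferred


data = {
    ("InstanceA", "P14a", "InstanceB"),
    ("InstanceB", "P14b", "InstanceC"),
}

new_triples = infer_chain(data, "P14a", "P14b", "P14")
print(new_triples)
```

The dataset itself never states the P14 triple; it only appears as a consequence of the two stated hops.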

Property chains have several advantages:

  1. First, even if the dataset only contains the triples InstanceA->P14a->InstanceB->P14b->InstanceC, someone requesting the triple InstanceA->P14->InstanceC will still get an answer, without needing to know the more complex path. For example, suppose we documented the production of Soleil de Minuit by Jean Paul Riopelle with the more complex PC14 Carried out by and the P14.1 in the role of:

     CiC_Issue30-example

     Someone searching for the triple <mic.ca/uri/production/01> P14_carried_out_by ?x will still get the answer <mic.ca/uri/actor/1234>.
  2. Second, this kind of property chain allows us to create new data. For example, the property chain :has_father o :has_brother SubPropertyOf :hasUncle would imply that if Person A has Person B as father, and Person B has Person C as brother, then Person A has Person C as uncle, as in the following example:

CiC_issue30-example2
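
The two advantages can be sketched from the query side: the person asking only names the short property, and the chain is followed on their behalf. The Python below is an assumption-laden toy, not GraphDB behaviour. The `CHAINS` table, the `objects` helper, the P14.a/P14.b naming for the Riopelle data, and the URI of the intermediate PC14 node are all hypothetical.

```python
# Hypothetical chain rules: target property -> (first hop, second hop).
CHAINS = {
    "P14_carried_out_by": ("P14.a", "P14.b"),
    ":hasUncle": (":has_father", ":has_brother"),
}


def objects(triples, subject, predicate):
    """Answer `subject predicate ?x`, following a chain rule when one exists."""
    found = {o for s, p, o in triples if s == subject and p == predicate}
    if predicate in CHAINS:
        first, second = CHAINS[predicate]
        # Assumes the rules are not cyclic, otherwise this would recurse forever.
        for mid in objects(triples, subject, first):
            found |= objects(triples, mid, second)
    return found


data = {
    # The PC14 node URI below is made up for the example.
    ("<mic.ca/uri/production/01>", "P14.a", "<mic.ca/uri/pc14/01>"),
    ("<mic.ca/uri/pc14/01>", "P14.b", "<mic.ca/uri/actor/1234>"),
    (":PersonA", ":has_father", ":PersonB"),
    (":PersonB", ":has_brother", ":PersonC"),
}

actors = objects(data, "<mic.ca/uri/production/01>", "P14_carried_out_by")
uncles = objects(data, ":PersonA", ":hasUncle")
print(actors, uncles)
```

Neither the P14_carried_out_by triple nor the :hasUncle triple is stored in `data`; both answers come from the chain rules.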

Nonetheless, property chains have a few disadvantages:

  1. You need a triple store and a SPARQL endpoint that can handle this kind of reasoning. GraphDB can, though.
  2. Using property chains requires creating many new home-made properties (the Px.a and Px.b, since the P01 and P02 of the CIDOC extension are not appropriate for property chains). Until now we have avoided creating new properties in order to be more interoperable.
  3. Finally, for technical reasons, it is difficult to put quantification on properties involved in property chains (that is, to say that the property can be linked to 0 to 1 instances of a specific class).
Habennin commented 4 years ago

Hi all,

Just some notes on this interesting proposition.

1) It sounds perfect for solving the PC issue. I would explore further.
2) Why are P01 and P02 not appropriate? Can't you tell the property chain to use the same predicates but in a different combination? Because there will always be a different PC class in the middle. I am just asking to understand better.
3) There is NO reason to be against making new properties or classes for the sake of interoperability. If you make a new class or property that is declared following the logic of CIDOC CRM (i.e. it does not break the class hierarchy's intention; you don't put fruits in meat baskets, as it were) and you link it as a subclass or sub-property of a CRM class or property, then you are 100% compatible with CIDOC CRM, and the manual even encourages you to do this.

KarineLeonardBrouillet commented 4 years ago

Notes on verbal meeting 2020-02-17

Why were P01 & P02 not right?

Flutifioc. The interest of having specific properties is to be able to automate reasoning: you only have to check the properties themselves to know what they are a chain of. Both approaches work, but the problem with using P01/P02 is the same as with reification, which is problematic for several reasons, mostly because you are saying something about a triple that may or may not exist. With a/b, the appearance of the properties is itself the triple, which ensures that the triple exists and is therefore safer.

Habennin. It would be interesting to know if anyone else is considering using this.

Flutifioc. The two are not incompatible: P14.b is a sub-property of P02, and P14.a is a sub-property of the inverse of P01.

Habennin. It would be nice to generate these in the PC extension so that CHIN would not have to support them. It would be best if the SIG created them and we reused them. That would allow us to use a distinct path.

Flutifioc. The problem, of course, is ending up with a whole lot of sub-properties.

Habennin. Good to ask the SIG the question.

Stephen. On whether or not to create new properties and classes: we originally decided to try not to create any, so that the model is more easily understood by anyone who has mastered CIDOC CRM.

Illip. Not against creating new properties and classes, but they have to be relevant.

Flutifioc. We will need to document what to do anyway. It will likely not require less documentation.

Illip. New classes or properties should be created only in those instances where you need patterns that are not covered in CIDOC.

Habennin. To ensure compatibility, you only have to declare classes and properties and use them in a logically consistent way.

Flutifioc. In his opinion, the fact that a category links a set is enough to justify a class.

Habennin. Rob Sanderson would say we need to minimize and use types, because we do not want to have a bunch of models. It is a question of where to put the modeling effort and of consistently going in one direction. The German national museum, with its WissKI system, never uses CRM classes directly and always makes its own in order to be specific about what it is talking about. The problem is that they are harmonized with CIDOC CRM, but their ability to cross-harmonize is reduced.

Flutifioc. This would be most useful for classes such as gender or occupation. In CIDOC there are three subclasses: language, material, and product. What justified the creation of these and not others?

Habennin. CIDOC was developed from the bottom up, and these were the ones that were advocated for successfully. They exist because they were there in the beginning, and removing things makes things complicated, although they do look arbitrary. There is no type for the format of digital objects, for example. Creating a type answers that need. They are both valid solutions. The one thing not to do is to adopt both options. Metatypes: a subtype of type in which we put our type. Subproperty of "has type", which can be queried specifically.

For Flutifioc, options 2 & 3 come together (which would be option 4). Looking at how other people would use the model is also important, because otherwise translation also has to occur.

VladimirAlexiev commented 4 years ago

Doesn't the objection about "stated vs reified triple" hold equally about P01,P02 and P14a,P14b?

It's easy to add rules to GraphDB, so the issue with the "wrong" direction of P02 can be handled easily. See http://rawgit2.com/VladimirAlexiev/my/master/pubs/extending-owl2/index.html where I argue that 2-place chains are all you need.

BTW the next GraphDB release will support RDF*, which is a better way to do reification. So you can say things like

   << :author :performed :production >>
      :inTheRoleOf :masterCraftsman.
VladimirAlexiev commented 4 years ago

In addition to the PC reification classes, CRM has a number of "shortcut vs long path" situations that can also be "automated" with property chains.

Eg from Object - Measurement - Dimension infer Object - Dimension

Flutifioc commented 4 years ago

To clarify a bit the current state of our reflections, here is why I think property chains are interesting. The way CIDOC CRM handles the addition of metadata about statements is, as I understand it, slightly camouflaged reification. Indeed, when looking at the following diagrams:

Issue 30 property chain-1

both are clearly structurally identical, except that standard reification uses a direct link to the property (P14 in this example), whereas CIDOC CRM uses one class per property to encode the nature of the statement.

Therefore, the reason why reification is dangerous in general still applies in the case of CIDOC CRM. Indeed, it is possible to have an entity representing a statement that does not exist.

Issue 30 property chain-2

I don't work at NASA, but I can still create an object representing the statement "Ludovic Font works at NASA" and say stuff about that statement, even though it does not exist.

However, we can create a new subproperty of P02, that we will call P14.b (domain: PC14, range: E39), and a new subproperty of the inverse of P01, that we will call P14.a (domain: E7, range: PC14). Then, we indicate that P14 is actually a property chain of "P14.a o P14.b". This way, the simple fact of creating the reified statement makes it true, because there exists a chain of P14.a o P14.b from the activity to the person.

Issue 30 property chain-3

Of course, the drawback is that we need two new properties for each PC class.
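
The difference between the two situations described above can be checked mechanically. The following Python sketch is only an illustration under stated assumptions: it uses the standard RDF reification vocabulary (rdf:subject, rdf:predicate, rdf:object) for the "dangerous" case, and the helper names, the :source property and all instance names are invented.

```python
def dangling_statements(triples):
    """Return reified-statement nodes whose described triple is not asserted."""
    described = {}
    for s, p, o in triples:
        if p in ("rdf:subject", "rdf:predicate", "rdf:object"):
            described.setdefault(s, {})[p] = o
    dangling = set()
    for node, parts in described.items():
        triple = (
            parts.get("rdf:subject"),
            parts.get("rdf:predicate"),
            parts.get("rdf:object"),
        )
        if triple not in triples:
            dangling.add(node)
    return dangling


def chain_p14(triples):
    """Triples implied by `P14.a o P14.b SubPropertyOf P14`."""
    return {
        (s, "P14", o)
        for s, p, mid in triples if p == "P14.a"
        for mid2, p2, o in triples if p2 == "P14.b" and mid2 == mid
    }


# Standard reification: the statement node exists, the triple itself does not.
reified = {
    (":stmt1", "rdf:subject", ":LudovicFont"),
    (":stmt1", "rdf:predicate", ":worksAt"),
    (":stmt1", "rdf:object", ":NASA"),
    (":stmt1", ":source", ":rumour"),
}

# Chain-based modelling: creating the PC14 node already links both ends.
chained = {
    (":activity1", "P14.a", ":pc14_1"),
    (":pc14_1", "P14.b", ":actor1"),
    (":pc14_1", "P14.1", ":masterCraftsman"),
}

orphans = dangling_statements(reified)
implied = chain_p14(chained)
print(orphans, implied)
```

In the first dataset the statement node talks about a triple nobody asserted; in the second, the P14 triple follows from the very triples that build the PC14 node, so it cannot be missing.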

VladimirAlexiev commented 3 years ago

RDF* support has been released in GraphDB. There is a volumetric study describing the savings in storage and query time compared to reification. It uses the example of Wikidata, where reification is pervasive (qualifiers, references).

The savings are substantial.