InTaVia / idm-rdf

Intavia Data Model for RDF data
rdfs:ranges #1

Open CarlaVS opened 2 years ago

CarlaVS commented 2 years ago

(Jouni critique) Some of the rdfs:ranges of Bio CRM properties have been changed to the corresponding CIDOC CRM classes instead of the Bio CRM classes used in the original Bio CRM schema. This is of course a possible approach and semantically correct. One issue to consider, though, is that a new version of the Bio CRM schema can no longer be loaded into the InTaVia triplestore by simply replacing the current InTaVia version, because the InTaVia version contains local changes, made by hand(?). (If the changes are made by an automated script, the situation is of course a bit better.)

In any case, in my opinion, this is not as modular an approach as it could be, and the content of the InTaVia integrated triplestore is now not based on a "global Linked Data graph merge", which makes the process a bit counterintuitive.

biktorrr commented 2 years ago

Ah ok, so now I have (see also this drawing https://docs.google.com/drawings/d/1ZrwL2AbKGG_Y-_pRD8_Vorqmvtld07B56iqecNNILkE/edit):

# the person, with residence information, connected via role AND via direct triple
:person1 
  a crm:E21_Person ; 
  a idm:Person_Proxy ; 
  bioc:bearer_of :person1-actorrole-1;
  bgn:has_residence :res-sGravenhage.

# the residence (as a skos:Concept)
:res-sGravenhage
  a bgn:Residence;
  a skos:Concept;
  skos:prefLabel "'s-Gravenhage".

# the actorrole. The last triple connects this resource and the "value", but what should we use there?
:person1-actorrole-1
  a bioc:Actor_Role;
  bgn:roletype :res-sGravenhage.

# the event, now with the correct bioc triple instead of the crm one
:person1-res-event1
  a crm:E5_Event;
  bioc:had_participant_in_role :person1-actorrole-1;  # object missing in the original post; the actor role is assumed here
  rdfs:label "'s-Gravenhage";
  crm:P4_has_time-span :person1-res-event1-timespan1 .

Four questions:

yoge1 commented 2 years ago
biktorrr commented 2 years ago

Regarding the first point: suppose we model the residency only as a class bioc:Residency (subclass of Actor_Role) and connect it to the person via an idm:has_residency (subpropertyOf bearer_of). Is "sGravenhage" then an instance of that class, or is "Person1sResidencyInsGravenhage"?

The same holds for occupation. In the example ttl file it looks like a person has a direct link to an instance of bioc:Occupation (for instance ex:Carpenter), but how would we then model one person with two occupations (or residencies, claims to fame etc.)? For example: PersonX was a Carpenter from 1910-1920, and he was also a Carpenter from 1950-1960. Would that information only be available through the Events?

yoge1 commented 2 years ago

I think for the instance of the class :Residency, "Person1sResidencyInsGravenhage" would be more appropriate.

For the second question, I think this was discussed in the last meeting: bioc:has_occupation shouldn't point to an instance of bioc:Occupation (e.g. bioc:Carpenter, which actually doesn't even exist in the bioc namespace), but preferably to an instance of a specific subclass of bioc:Occupation. Thus, this should be corrected in the example ttl file. For example:

<https://www.intavia.org/personproxy/10055_t/12347> bioc:has_occupation idm:12347_as_painter .

idm:12347_as_painter a idm:Painter .

idm:Painter rdfs:subClassOf bioc:Occupation .

If temporal information is to be added to occupations, that should be done through the Events.
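A minimal sketch of how the PersonX case from the question above might look under this suggestion: one untemporalized occupation statement plus one event per period. The `ex:` namespace, `idm:Carpenter` and all instance names are hypothetical, the full-year dates are assumed from "1910-1920" and "1950-1960", and the link from event to role is left out because its exact form is still under discussion in this thread.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix crm:  <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix bioc: <http://ldf.fi/schema/bioc/> .
@prefix idm:  <http://www.intavia.eu/idm-core/> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace for this sketch

# hypothetical occupation subclass, following the idm:Painter pattern
idm:Carpenter rdfs:subClassOf bioc:Occupation .

# untemporalized statement: PersonX was a carpenter
ex:personX
  a crm:E21_Person ;
  bioc:has_occupation ex:personX_as_carpenter .

ex:personX_as_carpenter a idm:Carpenter .

# first period as carpenter (day and month are assumed)
ex:personX-occ-event1
  a crm:E5_Event ;
  crm:P11_had_participant ex:personX ;
  crm:P4_has_time-span ex:personX-occ-event1-timespan1 .

ex:personX-occ-event1-timespan1
  a crm:E52_Time-Span ;
  crm:P82a_begin_of_the_begin "1910-01-01"^^xsd:date ;
  crm:P82b_end_of_the_end "1920-12-31"^^xsd:date .

# second period as carpenter (day and month are assumed)
ex:personX-occ-event2
  a crm:E5_Event ;
  crm:P11_had_participant ex:personX ;
  crm:P4_has_time-span ex:personX-occ-event2-timespan1 .

ex:personX-occ-event2-timespan1
  a crm:E52_Time-Span ;
  crm:P82a_begin_of_the_begin "1950-01-01"^^xsd:date ;
  crm:P82b_end_of_the_end "1960-12-31"^^xsd:date .
```

In this reading the two periods are only distinguishable through the two events; the direct bioc:has_occupation triple says nothing about when.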

CarlaVS commented 2 years ago

My understanding is that bioc:has_occupation, bioc:has_gender and bioc:has_nationality are supplements to avoid creating a lot of untemporalized events (because this data is often not temporalized in biographies). We created subproperties and classes in the idm namespace for those (e.g. idm:has_gender; idm:Gender), so if we have a lot of untemporalized residence data, I think the best idea is to add idm:has_residence to the model, and the same for religion (not sure about claim to fame, because this could be temporalized in most cases, but I wouldn't mind adding it).

All temporalized data was originally planned to be modelled as events with bioc:Event_Roles. So a person can have a general occupation (e.g. Painter) and a specific occupation (e.g. professor for painting from 1910-1912). Or a general residence and a specific residence for a specific time-span.

The test data set has not been updated since the last discussion; I will correct the bioc:Occupation modeling soon.

biktorrr commented 2 years ago

@CarlaVS regarding your first part: ironically, in the BiographyNet data, residencies, occupations, religion and education are temporalized in at least some cases. Claim to fame never is :)

Ok, so because in Bionet these are sometimes temporal and sometimes not, I think that for each of these attributes, we model both an Event (either with or without temporal information) and an always-non-temporal instance of a subclass of bioc:Actor_Role.

So let's again say we have the following input data:

<person>
   <state type="occupation" from="1900-1-1" to="1909-12-12">foreman on a large shipyard</state>
   <state type="residency">'s Gravenhage</state>
</person>

Is this then the correct output?

@base             <file:///c%3A/users/vbr240/git/intavia_biographynet/intavia_idm1.ttl> .
@prefix      owl: <http://www.w3.org/2002/07/owl#> .
@prefix      xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix     rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix      rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix      ore: <http://www.openarchives.org/ore/terms/> .
@prefix idm: <http://www.intavia.eu/idm-core/> .
@prefix      edm: <http://www.europeana.eu/schemas/edm/> .
@prefix      crm: <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix       bgn: <http://example.org/bgn/> .
@prefix     bioc: <http://ldf.fi/schema/bioc/> .

# the person, with residence and occupation information, connected via role AND via direct triple
bgn:person1 
  a crm:E21_Person ; 
  a idm:Person_Proxy ; 
  idm:has_residence bgn:person1-res-sGravenhage-1;
  idm:has_occupation bgn:person1-occ-foreman-on-a-large-shipyard-1.

# the Residency (subclassof Actor-Role). Uses as rdfs:label the original value. 
bgn:person1-res-sGravenhage-1
  a idm:Residency;
  rdfs:label "'s Gravenhage".

# the Occupation (subclassof Actor-Role). Uses as rdfs:label the original value. 
bgn:person1-occ-foreman-on-a-large-shipyard-1
 a idm:Occupation; # we don't know its specific subclass here
 rdfs:label "foreman on a large shipyard".

# the residence event, no temporal info
bgn:person1-res-event1
  a crm:E5_Event;
  bioc:had_participant_in_role bgn:person1-res-sGravenhage-1;
  crm:P11_had_participant bgn:person1;
  rdfs:label "'s-Gravenhage".

# the occupation event, including temporal info
bgn:person1-occ-event1
  a crm:E5_Event;
  bioc:had_participant_in_role bgn:person1-occ-foreman-on-a-large-shipyard-1;
  crm:P11_had_participant bgn:person1;
  rdfs:label "foreman on a large shipyard";
  crm:P4_has_time-span bgn:person1-occ-event1-timespan1 .

# the occupation timespan (begin = "from", end = "to" of the input state)
bgn:person1-occ-event1-timespan1
  a crm:E52_Time-Span;
  crm:P82a_begin_of_the_begin "1900-1-1";
  crm:P82b_end_of_the_end "1909-12-12".
biktorrr commented 2 years ago

I updated the picture to match the code above: https://docs.google.com/drawings/d/1ZrwL2AbKGG_Y-_pRD8_Vorqmvtld07B56iqecNNILkE/edit

CarlaVS commented 2 years ago

The initial idea was to model residencies, occupations etc. either as an untemporalized role OR as a temporalized event (if we have dates, places etc. for these occupations). I see that it can make sense to also model e.g. the temporalized occupations as untemporalized roles, so that they are all available in one category. But modeling untemporalized events where we already have roles carrying the same information seems like unnecessary repetition to me. @biktorrr, where do you see the advantages in that?

There are some points in your example that I would handle differently, according to the decisions that we made:

According to these points I would suggest this:

@base             <file:///c%3A/users/vbr240/git/intavia_biographynet/intavia_idm1.ttl> .
@prefix      owl: <http://www.w3.org/2002/07/owl#> .
@prefix      xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix     rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix      rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix      ore: <http://www.openarchives.org/ore/terms/> .
@prefix    idm: <http://www.intavia.eu/idm-core/> .
@prefix    idmo: <http://www.intavia.eu/idm-occupation/> .
@prefix    idmr: <http://www.intavia.eu/idm-roles/> .
@prefix      crm: <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix      bgn: <http://example.org/bgn/> .

# the person, with residence and occupation information, connected via role (with property idm:bearer_of) AND via direct triple
bgn:person1 
  a crm:E21_Person ; 
  a idm:Person_Proxy ; 
  idm:has_residence bgn:res-sGravenhage-1 ;
  idm:has_occupation bgn:person1-occ-foreman-on-a-large-shipyard-1 ;
  idm:bearer_of bgn:person1-role-foreman-on-a-large-shipyard-1 .

# the Residency place as crm E53_Place, here also coordinates, other names etc could be added
bgn:res-sGravenhage-1
  a idm:Residency; #subclass of idm:Unary_Role and crm:E53_Place
  crm:P1_is_identified_by bgn:place-name-sGravenhage-1.

#placename for the residency
bgn:place-name-sGravenhage-1
    a crm:E33_E41_Linguistic_Appellation;
    rdfs:label "'s Gravenhage".

# the Occupation (subclass of Unary-Role). Uses as rdfs:label the original value. 
bgn:person1-occ-foreman-on-a-large-shipyard-1
 a idmo:foreman-on-a-large-shipyard; # subclass of idm:Occupation
 rdfs:label "foreman on a large shipyard".

# the occupation event, including temporal info (here also geographical information, institutions etc. could be appended).
bgn:person1-occ-event1
  a crm:E5_Event;
  idm:had_participant_in_role bgn:person1-role-foreman-on-a-large-shipyard-1 ;
  crm:P4_has_time-span bgn:person1-occ-event1-timespan1 .

# role as a class (as discussed in the last meeting)
bgn:person1-role-foreman-on-a-large-shipyard-1 
  a idmr:foreman-on-a-large-shipyard; #subclass of idm:Event_Role
  rdfs:label "foreman on a large shipyard" .

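# the occupation timespan (begin = "from", end = "to"; times of day are assumed, the source gives dates only)
bgn:person1-occ-event1-timespan1
  a crm:E52_Time-Span;
  crm:P82a_begin_of_the_begin "1900-01-01T00:00:00"^^xsd:dateTime ;
  crm:P82b_end_of_the_end "1909-12-12T23:59:59"^^xsd:dateTime .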

Does this make sense to you? And @yoge1, does this map correctly to your idea of implementing roles and professions as subclasses?

biktorrr commented 2 years ago

Hi Carla, Thanks for the elaborate answer!

Regarding the reason for the repetition: in BiographyNet, the temporality is inconsistent. So we can have personA with temporal occupations and non-temporal residencies and vice versa.

<personA>
   <state type="occupation" from="1900-1-1" to="1909-12-12">foreman on a large shipyard</state>
   <state type="residency">'s Gravenhage</state>
</personA>
<personB>
   <state type="occupation">Painter</state>
   <state type="residency"  from="1910-1-1">Haarlem</state>
</personB>

Let's take residency (the same reasoning holds for occupation). I would argue that for both persons an event is implied; the only difference is that for PersonA the time is unknown and for PersonB it is (partially) known. Therefore, to me it makes sense to have a residency event instance in both cases, rather than only in the one case. The unary roles (has_occupation and has_residence) would then just be (additional) shortcuts. Yes, this leads to a lot more triples, but I think it makes sense from a modelling perspective.

Also for querying, it could be problematic if we don't have events for everyone. Take "give me all persons who were involved in a (residency) event, optionally with the dates of residency": we would find person B but not person A.
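As a rough sketch of the "event for everyone" variant argued for here, the residency data of personA and personB could be written as below. It reuses the property and class names from the earlier example in this thread (idm:has_residence, idm:Residency, bioc:had_participant_in_role); the instance names and the zero-padded, xsd:date-typed begin date are assumptions made for illustration.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix crm:  <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix bioc: <http://ldf.fi/schema/bioc/> .
@prefix idm:  <http://www.intavia.eu/idm-core/> .
@prefix bgn:  <http://example.org/bgn/> .

# personA: residency without dates, so the event carries no time-span
bgn:personA
  a crm:E21_Person ;
  idm:has_residence bgn:personA-res-sGravenhage-1 .

bgn:personA-res-sGravenhage-1
  a idm:Residency ;
  rdfs:label "'s Gravenhage" .

bgn:personA-res-event1
  a crm:E5_Event ;
  bioc:had_participant_in_role bgn:personA-res-sGravenhage-1 ;
  crm:P11_had_participant bgn:personA .

# personB: residency with a known start only, so the time-span has a begin but no end
bgn:personB
  a crm:E21_Person ;
  idm:has_residence bgn:personB-res-Haarlem-1 .

bgn:personB-res-Haarlem-1
  a idm:Residency ;
  rdfs:label "Haarlem" .

bgn:personB-res-event1
  a crm:E5_Event ;
  bioc:had_participant_in_role bgn:personB-res-Haarlem-1 ;
  crm:P11_had_participant bgn:personB ;
  crm:P4_has_time-span bgn:personB-res-event1-timespan1 .

bgn:personB-res-event1-timespan1
  a crm:E52_Time-Span ;
  crm:P82a_begin_of_the_begin "1910-01-01"^^xsd:date .
```

With data shaped like this, a query that matches on the event and treats the time-span as optional would return both persons; under the "event only when temporalized" approach, bgn:personA-res-event1 would not exist and personA would be reachable only through idm:has_residence.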

That all being said, if we still want to stick with the "Event only in case of temporal (or geo?) information, only unary role otherwise" approach, I can probably learn to live with that.

biktorrr commented 2 years ago

Regarding the other points (answering inline)

CarlaVS commented 2 years ago

I updated the picture to match the code above: https://docs.google.com/drawings/d/1ZrwL2AbKGG_Y-_pRD8_Vorqmvtld07B56iqecNNILkE/edit

You can find the schema for my modeling suggestion here.