Support “Multi-Class” Export of Semantic Data from Arches

apeters commented 7 years ago

GRI have identified a data modeling quirk of the CRM related to representing the production and destruction of objects. This implies the need to support multiple CRM classes for a single node in a graph, something that Arches does not support. Because this requirement appears to be unique to describing production/destruction of objects, GRI and Farallon have concluded that the most practical way to resolve the inconsistency of the CRM will be on export of resource instance data from Arches. Based on our discussions with GRI, one likely approach will be to implement a class (perhaps as a custom extension to the CRM) that resolves the conflict. Arches could then implement custom behavior on this class.

Deliverables:

Implement a “plug-in” or function that processes data related to a custom class (as defined and implemented by the GRI as an extension to the CRM) on export from Arches.

Assumptions:

GRI will provide the required information to support import of the custom class as part of the initial import of the CRM into Arches.

apeters commented 7 years ago

As an example of adding multi-class support, you could override the original rdffile.py file and add these lines at line 78

if 'E82_Actor_Appellation' in edge.domainnode.ontologyclass:
    graph.add((domainnode, RDF.type, archesproject['MY_MADE_UP_TYPE']))

This would add an additional type to any E82_Actor_Appellation node. Of course, alternate logic could be applied to get the desired result.
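As a rough illustration of that override, here is a sketch that runs without Arches or rdflib. It models the graph as a plain set of triples and the edge as a small stand-in object. `FakeNode`, `FakeEdge`, `ARCHES_NS`, and `add_types` are hypothetical names used only here; in the real rdffile.py, `graph` is an rdflib graph and `archesproject` is a namespace object.

```python
from dataclasses import dataclass

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ARCHES_NS = "http://www.archesproject.org/"  # assumed namespace, for illustration only


@dataclass
class FakeNode:
    ontologyclass: str  # class URI stored on the node, as a plain string


@dataclass
class FakeEdge:
    domainnode: FakeNode


def add_types(graph, domainnode, edge):
    # Always emit the class that is stored on the node.
    graph.add((domainnode, RDF_TYPE, edge.domainnode.ontologyclass))
    # Additionally emit a second, made-up type for appellation nodes.
    if 'E82_Actor_Appellation' in edge.domainnode.ontologyclass:
        graph.add((domainnode, RDF_TYPE, ARCHES_NS + 'MY_MADE_UP_TYPE'))


graph = set()
edge = FakeEdge(FakeNode("http://www.cidoc-crm.org/cidoc-crm/E82_Actor_Appellation"))
add_types(graph, "urn:example:node1", edge)
```

After this runs, `graph` holds two type triples for the same node: the original class and the extra one.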

azaroth42 commented 7 years ago

For the Destruction/Activity scenario, there are all of the combinations:

Destruction: The volcano eruption that destroyed Pompeii.
Activity: An auction.
Destruction+Activity: Rob ripping up a piece of paper.

I think the way we need to do it, then, is to have a new internal ontology class Destruction_Activity, and then the change to rdffile.py would be to test for that, remove it, and replace it with Destruction and Activity? A more generalized solution would be to create the new classes and give them some Arches-specific flag in the ontology that could be tested for and that then rolls up to all of the parent classes. Is it possible to add properties to node.ontologyClass?

apeters commented 7 years ago

@azaroth42 Unfortunately, it is not possible to add properties to node.ontologyClass. node.ontologyClass is simply a string (URI). To achieve what you want, I think we'll have to do what you initially suggested: have a new internal ontology class Destruction_Activity, and then change rdffile.py to test for that, remove it, and replace it with Destruction and Activity.
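Since the class is only a URI string, the "test, remove, replace" idea can be sketched as a lookup keyed on the class name. This is only a sketch under assumptions: `HYBRID_CLASSES` and `export_types` are hypothetical names, and the example extension URI is made up.

```python
CRM = "http://www.cidoc-crm.org/cidoc-crm/"

# Hypothetical lookup: marker found in the internal hybrid class URI -> real CRM classes.
HYBRID_CLASSES = {
    "Destruction_Activity": [CRM + "E6_Destruction", CRM + "E7_Activity"],
}


def export_types(ontologyclass):
    """Return the class URIs to emit as rdf:type for one node on export."""
    for marker, replacements in HYBRID_CLASSES.items():
        if marker in ontologyclass:
            # Drop the internal hybrid class and emit its parents instead.
            return list(replacements)
    # Ordinary classes pass through unchanged.
    return [ontologyclass]


hybrid = export_types("http://example.org/extension/Destruction_Activity")
plain = export_types(CRM + "E7_Activity")
```

Keeping the mapping in one dictionary means a further hybrid class would only need a new entry rather than another `if` branch.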

azaroth42 commented 7 years ago

Given the current set of use cases (e.g. just Destruction + Activity), this is sufficient to accomplish what we need today. In the future, when we look at classes across ontologies (e.g. an E22 MMO and a bibframe Item and an archival Document), this will turn into a combinatorial problem. But we can revisit when that happens :)
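To put a rough number on that, here is a small sketch that lists every hybrid class that would be needed if each combination of two or more base classes had to be declared separately. That is a worst-case assumption, and `hybrid_classes_needed` is a hypothetical helper.

```python
from itertools import combinations


def hybrid_classes_needed(base_classes):
    """List every combination of two or more base classes."""
    return [
        combo
        for size in range(2, len(base_classes) + 1)
        for combo in combinations(base_classes, size)
    ]


two = hybrid_classes_needed(["E6_Destruction", "E7_Activity"])
three = hybrid_classes_needed(["E22 MMO", "bibframe Item", "archival Document"])
ten = hybrid_classes_needed([f"Class{i}" for i in range(10)])
print(len(two), len(three), len(ten))  # 1 4 1013
```

For n base classes the count is 2^n - n - 1, so the one-off approach stops scaling quickly once classes from several ontologies are mixed.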

Given that the process is a bit involved (edit the ontology, write a little bit of code in a particular place), documentation is important. Once there are some instructions, I think this is complete.

azaroth42 commented 7 years ago

A discussion with the SIG (well, with Martin) revealed another extremely common scenario where they would recommend multiple instantiation -- if you need to associate a language with an appellation, then they recommend multiple instantiation of Appellation and Linguistic Object. Given that Appellation is the name field for every type of resource, this would become a significant challenge.

For reference, the email proposing MI: http://lists.ics.forth.gr/pipermail/crm-sig/2017-September/003089.html

Given that practically every use case I can think of for names should have at least the possibility of internationalization, this (to me) makes the issue more important to solve properly and consistently. Maybe it is a case of needing to create classes and just use them, rather than trying to jump through the CRM-SIG imposed hoops.

dwuthrich commented 7 years ago

@azaroth42 Thanks for including the link to your email with Martin. As you and Martin both suggest, the most convenient approach might be to create the class(es) that we want to support “linguistic appellations”.

apeters commented 7 years ago

I agree, I think the most pertinent part of the email was this statement:

We do not declare subclasses of combinations of classes just for the sake of an 
accidental combination.  It would fill the CRM with some thousand classes without 
particular meaning. You can do that for your own convenience in a local extension.

apeters commented 7 years ago

@azaroth42 for now I'm going to document the procedure to map a custom class to multiple classes here:

To accomplish what you want to achieve, here are the steps you need to take:

1. Add a custom ontology extension to Arches that contains the hybrid "Destruction+Activity" class by running the command
```
python manage.py load_ontology -vn {some version number} -x {path to file containing custom class}
```
2. Copy the rdf format file found at arches/app/utils/data_management/resources/formats/rdffile.py and place it somewhere in your project.
3. Edit the RESOURCE_FORMATERS setting in your settings.py file and repoint one or more of the rdf output formats (the keys in the dictionary) to your custom rdffile.py from step 2.
4. Edit your file from step 2 to find and replace references to your custom "Destruction+Activity" class with 2 separate classes.

To do this, replace the bit of code at line 75 (shown below) in your new rdffile.py file:

def add_edge_to_graph(graph, domainnode, rangenode, edge):
    graph.add((domainnode, RDF.type, URIRef(edge.domainnode.ontologyclass)))
    graph.add((rangenode, RDF.type, URIRef(edge.rangenode.ontologyclass)))
    graph.add((domainnode, URIRef(edge.ontologyproperty), rangenode))

with something like this:

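def add_edge_to_graph(graph, domainnode, rangenode, edge):
    # Emit both CRM classes for the hybrid class; otherwise keep the node's own class
    # (without the else branches, ordinary nodes would lose their rdf:type).
    if 'Destruction+Activity' in edge.domainnode.ontologyclass:
        graph.add((domainnode, RDF.type, URIRef("http://www.cidoc-crm.org/cidoc-crm/E7_Activity")))
        graph.add((domainnode, RDF.type, URIRef("http://www.cidoc-crm.org/cidoc-crm/E6_Destruction")))
    else:
        graph.add((domainnode, RDF.type, URIRef(edge.domainnode.ontologyclass)))
    if 'Destruction+Activity' in edge.rangenode.ontologyclass:
        graph.add((rangenode, RDF.type, URIRef("http://www.cidoc-crm.org/cidoc-crm/E7_Activity")))
        graph.add((rangenode, RDF.type, URIRef("http://www.cidoc-crm.org/cidoc-crm/E6_Destruction")))
    else:
        graph.add((rangenode, RDF.type, URIRef(edge.rangenode.ontologyclass)))

    graph.add((domainnode, URIRef(edge.ontologyproperty), rangenode))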

That should be it. Now when you export resources with the format(s) that you keyed to your new rdffile.py module you should see that the node with class "Destruction+Activity" is output as 2 classes, E7_Activity and E6_Destruction.
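For the settings step, a sketch of what the repointing might look like. The shape of the dictionary (format key mapped to a dotted class path), the `xml` and `csv` keys, the `RdfWriter` and `CsvWriter` class names, and the `my_project.formats.rdffile` location are all assumptions for illustration; check your own settings.py for the actual keys and values.

```python
# Stand-in for the defaults a project would inherit from Arches (assumed shape and keys).
RESOURCE_FORMATERS = {
    "csv": "arches.app.utils.data_management.resources.formats.csvfile.CsvWriter",
    "xml": "arches.app.utils.data_management.resources.formats.rdffile.RdfWriter",
}

# Hypothetical dotted path to the copied and edited rdffile.py inside your project.
CUSTOM_RDF_WRITER = "my_project.formats.rdffile.RdfWriter"

# Repoint only the rdf output format(s) you want to change; leave the others alone.
RESOURCE_FORMATERS["xml"] = CUSTOM_RDF_WRITER
```

Overriding individual keys, rather than redefining the whole dictionary, keeps the non-rdf formats working as before.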

azaroth42 commented 7 years ago

:+1: Looks sufficient to cover the use case, even if it does require changing the code.

archesproject / arches

Support “Multi-Class” Export of Semantic Data from Arches #2307