ge-high-assurance / RACK

DARPA's Automated Rapid Certification of Software (ARCOS) project called Rapid Assurance Curation Kit (RACK)
BSD 3-Clause "New" or "Revised" License

performedBy SemTK ontology bug #321

cuddihyge closed this issue 3 years ago.

cuddihyge commented 3 years ago

I am looking at the V5.0 RACK ontology and am getting a bit confused about why things look the way they do.

[screenshot one]

Specifically, these “performedBy” properties: the top one is what I would expect, since it is defined in the ANALYSIS sadl file:

[screenshot two]

However, I am trying to figure out why the second one is being included:

[screenshot three]

It was defined in the SOFTWARE sadl file and should not be tied to ANALYSIS at all. If I look at SOFTWARE, the same “double” properties are not defined; it only has the one I was expecting, which was defined in the SOFTWARE sadl file:

[screenshot four]

I looked at the imports in the SADL for ANALYSIS: it only imports PROV-S.sadl and DOCUMENTS.sadl, and DOCUMENTS.sadl only imports PROV-S.sadl. So I don't see how this is being added as a property of ANALYSIS. The autogenerated nodegroups also look a bit odd, since they appear to have duplicated nodes for this property:

[screenshot: ng]
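The import argument above can be checked mechanically. This is a minimal sketch that assumes the import graph is exactly as described (module names only, no real file loading); `import_closure` is a hypothetical helper, and SOFTWARE's own imports are left out because the issue doesn't state them.

```python
# Hypothetical transcription of the imports described in the issue
# (module names only; SOFTWARE's own imports are not stated, so it is omitted).
IMPORTS: dict[str, list[str]] = {
    "ANALYSIS": ["PROV-S", "DOCUMENTS"],
    "DOCUMENTS": ["PROV-S"],
    "PROV-S": [],
}


def import_closure(root: str, imports: dict[str, list[str]]) -> set[str]:
    """Return every module reachable from `root` via imports, including `root`."""
    seen = {root}
    stack = [root]
    while stack:
        module = stack.pop()
        for dep in imports.get(module, []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


closure = import_closure("ANALYSIS", IMPORTS)
print(sorted(closure))        # ['ANALYSIS', 'DOCUMENTS', 'PROV-S']
print("SOFTWARE" in closure)  # False
```

This only confirms the import reasoning; it says nothing about how the models are actually loaded into the triplestore.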

I am guessing that under the hood these edges use different properties, but I am not sure how they could have duplicate node names, or how, on ingestion, I could specify which of the properties I want to use.

cuddihyge commented 3 years ago

This is an email from Dan. I've triaged it enough to believe it is a real bug that I should straighten out. It looks like SemTK is somewhere hashing properties by key name (instead of full URI) and getting confused.
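To illustrate the suspected failure mode, here is a minimal sketch (not SemTK's actual code, which is Java) of what happens when properties are keyed by local name rather than by full URI. The namespace URIs are hypothetical placeholders, not RACK's real ones.

```python
# Hypothetical namespace URIs; RACK's real namespaces may differ.
PROPERTIES = [
    "http://example.org/SOFTWARE#performedBy",
    "http://example.org/ANALYSIS#performedBy",
    "http://example.org/CONFIDENCE#performedBy",
]


def local_name(uri: str) -> str:
    """Return the part after the last '#' or '/' (the 'key name')."""
    return uri.rsplit("#", 1)[-1].rsplit("/", 1)[-1]


# Keyed by local name: later entries silently overwrite earlier ones.
by_local: dict[str, str] = {}
for uri in PROPERTIES:
    by_local[local_name(uri)] = uri

# Keyed by full URI: all three properties stay distinct.
by_uri = {uri: local_name(uri) for uri in PROPERTIES}

print(len(by_local), by_local["performedBy"])  # 1 http://example.org/CONFIDENCE#performedBy
print(len(by_uri))                             # 3
```

Any lookup that goes through the local-name map can return the wrong one of the three performedBy properties, which would be consistent with the symptoms described above.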

cuddihyge commented 3 years ago

@russell-d-e I am unwinding what appear to be multiple problems.

This issue requires more review, because if I "fix" SPARQLgraph to display the model accurately, I think there will be far MORE confusing performedBy properties, not fewer. I will fix what appear to be errors in the display, but I think ontology changes might be worth considering; otherwise the display will become more confusing, not less.

I've found:

1) I think SPARQLgraph should actually display THREE performedBy properties in most cases, because

2) Each of the three sadl files has its own performedBy, which is a subProperty of wasAssociatedWith:
   - SOFTWARE#performedBy is a subPropertyOf PROV-S#wasAssociatedWith
   - ANALYSIS#performedBy is a subPropertyOf PROV-S#wasAssociatedWith
   - CONFIDENCE#performedBy is a subPropertyOf PROV-S#wasAssociatedWith

Note that in the sadl, wasAssociatedWith is never prefixed. This is probably OK since there is only one, so no disambiguation is needed. Strangely, the statement "sw:performedBy is a type of wasAssociatedWith." is repeated three times. This probably doesn't cause any issue either.
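A small sketch of the structure described above, modeling the graph as a Python set of (subject, predicate, object) triples. The namespace URIs and the "an"/"conf" prefixes are hypothetical placeholders. Because an RDF graph is a set of triples, asserting the same statement three times yields the same graph as asserting it once, which supports the guess that the repetition is harmless.

```python
# Triples as (subject, predicate, object); namespace URIs are hypothetical placeholders.
RDFS_SUBPROP = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
PROV = "http://example.org/PROV-S#"
NAMESPACES = {
    "sw": "http://example.org/SOFTWARE#",
    "an": "http://example.org/ANALYSIS#",
    "conf": "http://example.org/CONFIDENCE#",
}

graph: set[tuple[str, str, str]] = set()

# Each file declares its own performedBy as a subproperty of wasAssociatedWith.
for ns in NAMESPACES.values():
    graph.add((ns + "performedBy", RDFS_SUBPROP, PROV + "wasAssociatedWith"))

# Repeating the identical sw:performedBy statement adds nothing new,
# because an RDF graph is a set of triples.
for _ in range(3):
    graph.add((NAMESPACES["sw"] + "performedBy", RDFS_SUBPROP, PROV + "wasAssociatedWith"))

subprops = sorted(s for (s, p, o) in graph
                  if p == RDFS_SUBPROP and o == PROV + "wasAssociatedWith")
print(len(graph))  # 3
for s in subprops:
    print(s)
```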

cuddihyge commented 3 years ago

@AbhaMoitra @russell-d-e Could you clarify the meaning of this Sadl to help me unwind Dan's issues with performedBy.

Sample sadl:

Tree is described by hasLeaves.

Oak is a type of Tree.
Oak is described by hasPointyLeaves.
hasPointyLeaves is a type of hasLeaves.

My question is: does this mean that Tree is described by hasPointyLeaves? Is "Oak is described by hasPointyLeaves" acting like a restriction, or is it a superfluous piece of information? I looked at the OWL and it still seems ambiguous.

Said another way:

russell-d-e commented 3 years ago

I think that the domain of hasPointyLeaves would be specific to Oak.

I tried the following sadl:

uri "http://TREES".

Plant is a class.

hasLeaves describes Plant with values of type boolean.

Tree is a type of Plant.

Oak is a type of Tree.

hasPointedLeaves describes Oak with values of type boolean.

hasPointedLeaves is a type of hasLeaves.

which generated the following owl:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://TREES#"
    xmlns:builtinfunctions="http://sadl.org/builtinfunctions#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:sadlimplicitmodel="http://sadl.org/sadlimplicitmodel#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:sadlbasemodel="http://sadl.org/sadlbasemodel#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
  xml:base="http://TREES">
  <owl:Ontology rdf:about="">
    <owl:imports rdf:resource="http://sadl.org/builtinfunctions"/>
    <owl:imports rdf:resource="http://sadl.org/sadlimplicitmodel"/>
    <owl:imports rdf:resource="http://sadl.org/sadlbasemodel"/>
    <rdfs:comment xml:lang="en">This ontology was created from a SADL file 'TREES.sadl' and should not be directly edited.</rdfs:comment>
  </owl:Ontology>
  <owl:Class rdf:ID="Tree">
    <rdfs:subClassOf>
      <owl:Class rdf:ID="Plant"/>
    </rdfs:subClassOf>
  </owl:Class>
  <owl:Class rdf:ID="Oak">
    <rdfs:subClassOf rdf:resource="#Tree"/>
  </owl:Class>
  <owl:DatatypeProperty rdf:ID="hasPointedLeaves">
    <rdfs:subPropertyOf>
      <owl:DatatypeProperty rdf:ID="hasLeaves"/>
    </rdfs:subPropertyOf>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#boolean"/>
    <rdfs:domain rdf:resource="#Oak"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:about="#hasLeaves">
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#boolean"/>
    <rdfs:domain rdf:resource="#Plant"/>
  </owl:DatatypeProperty>
</rdf:RDF>

Based on this, the domain of hasPointedLeaves is Oak in the OWL model. I also looked at the OWL reference for subproperties:
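The domains can also be read straight out of the generated OWL. This is a minimal sketch using Python's standard `xml.etree.ElementTree` on an abbreviated copy of the output above (ontology header and unused namespaces omitted); it only handles the `rdf:ID` / `rdf:about` patterns that appear in this file, not RDF/XML in general. Note that the nested `hasLeaves` declaration inside `rdfs:subPropertyOf` carries no domain of its own.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the generated TREES owl (ontology header omitted).
OWL = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xml:base="http://TREES">
  <owl:DatatypeProperty rdf:ID="hasPointedLeaves">
    <rdfs:subPropertyOf>
      <owl:DatatypeProperty rdf:ID="hasLeaves"/>
    </rdfs:subPropertyOf>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#boolean"/>
    <rdfs:domain rdf:resource="#Oak"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:about="#hasLeaves">
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#boolean"/>
    <rdfs:domain rdf:resource="#Plant"/>
  </owl:DatatypeProperty>
</rdf:RDF>"""

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
RDFS = "{http://www.w3.org/2000/01/rdf-schema#}"
OWL_NS = "{http://www.w3.org/2002/07/owl#}"


def resource_name(elem: ET.Element) -> str:
    """rdf:ID="x" and rdf:about="#x" both denote <xml:base>#x in this file."""
    rid = elem.get(RDF + "ID")
    if rid is not None:
        return rid
    return elem.get(RDF + "about", "").lstrip("#")


root = ET.fromstring(OWL)
domains: dict[str, list[str]] = {}
# findall on the root only sees top-level declarations, not the nested one.
for prop in root.findall(OWL_NS + "DatatypeProperty"):
    name = resource_name(prop)
    for d in prop.findall(RDFS + "domain"):
        domains.setdefault(name, []).append(d.get(RDF + "resource"))

print(domains)  # {'hasPointedLeaves': ['#Oak'], 'hasLeaves': ['#Plant']}
```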

4.1.1 rdfs:subPropertyOf
A rdfs:subPropertyOf axiom defines that the property is a subproperty of some other property. Formally this means that if P1 is a subproperty of P2, then the property extension of P1 (a set of pairs) should be a subset of the property extension of P2 (also a set of pairs).

An example:

<owl:ObjectProperty rdf:ID="hasMother">
  <rdfs:subPropertyOf rdf:resource="#hasParent"/>
</owl:ObjectProperty>
This states that all instances (pairs) contained in the property extension of the property "hasMother" are also members of the property extension of the property "hasParent".

Subproperty axioms can be applied to both datatype properties and object properties.

NOTE: In OWL DL the subject and object of a subproperty statement must be either both datatype properties or both object properties.

My reading of this is that nothing suggests the subproperty inherits the domain of its superproperty. I would say, though, that the range of the subproperty has to be a subtype of the range of the parent property.
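To make the quoted subproperty semantics concrete, here is a naive forward-chaining sketch of three RDFS entailment rules (rdfs2 for domain, rdfs7 for subPropertyOf, rdfs9 for subClassOf), applied to the TREES model with two hypothetical individuals, `oak1` and `pine1`. It is a toy for illustration, not a full RDFS or OWL reasoner, and the short predicate names stand in for the real URIs.

```python
# Naive forward chaining over three RDFS entailment rules (not a full reasoner):
#   rdfs2: (p domain c) and (x p y)        => (x type c)
#   rdfs7: (p subPropertyOf q) and (x p y) => (x q y)
#   rdfs9: (c subClassOf d) and (x type c) => (x type d)
Triple = tuple[str, str, str]
TYPE, DOMAIN, SUBPROP, SUBCLASS = "type", "domain", "subPropertyOf", "subClassOf"


def rdfs_closure(triples: set[Triple]) -> set[Triple]:
    graph = set(triples)
    while True:
        new: set[Triple] = set()
        for (s, p, o) in graph:
            for (s2, p2, o2) in graph:
                if p2 == DOMAIN and s2 == p:                  # rdfs2
                    new.add((s, TYPE, o2))
                if p2 == SUBPROP and s2 == p:                 # rdfs7
                    new.add((s, o2, o))
                if p == TYPE and p2 == SUBCLASS and s2 == o:  # rdfs9
                    new.add((s, TYPE, o2))
        if new <= graph:  # fixpoint reached
            return graph
        graph |= new


# The TREES model from the OWL above, plus two hypothetical individuals.
SCHEMA = {
    ("Tree", SUBCLASS, "Plant"),
    ("Oak", SUBCLASS, "Tree"),
    ("hasLeaves", DOMAIN, "Plant"),
    ("hasPointedLeaves", DOMAIN, "Oak"),
    ("hasPointedLeaves", SUBPROP, "hasLeaves"),
}
DATA = {("oak1", "hasPointedLeaves", "true"), ("pine1", "hasLeaves", "true")}

g = rdfs_closure(SCHEMA | DATA)
print(("oak1", "hasLeaves", "true") in g)  # True: the pair flows up to the superproperty
print(("oak1", TYPE, "Plant") in g)        # True
print(("pine1", TYPE, "Plant") in g)       # True
print(("pine1", TYPE, "Oak") in g)         # False
```

Pairs flow from hasPointedLeaves up to hasLeaves, as the quoted definition says, but an individual that only uses hasLeaves is typed as Plant and never as Oak.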

cuddihyge commented 3 years ago

Abha and I came to the same conclusion. If it matters, remember that SemTK does presume that subclasses share the properties of their superclasses. That is not explicitly in the OWL either. I am just stating this, not disagreeing with how to move forward.

I will tweak SemTK to make sure that subProperties don't "inherit" the Domains of their superProperties, and also make sure the SPARQLgraph tree display is working properly. This may take a couple days.
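The display rule being discussed can be sketched as follows: a class shows a property when the property's own declared domain is that class or one of its superclasses (the subclass presumption mentioned above), and a subproperty does not pick up its superproperty's domain. The model is the TREES example; the `inherit_super_domains` flag is a hypothetical switch that reproduces the unwanted behavior for contrast. None of this is SemTK's actual code.

```python
from typing import Optional

# Hypothetical model mirroring the TREES example (not SemTK's data structures).
SUPERCLASS: dict[str, Optional[str]] = {"Oak": "Tree", "Tree": "Plant", "Plant": None}
DOMAIN_OF = {"hasLeaves": "Plant", "hasPointedLeaves": "Oak"}
SUPERPROPERTY = {"hasPointedLeaves": "hasLeaves"}


def lineage(cls: Optional[str]) -> set[str]:
    """The class itself plus all of its superclasses (subclasses share superclass properties)."""
    out: set[str] = set()
    while cls is not None:
        out.add(cls)
        cls = SUPERCLASS.get(cls)
    return out


def displayed_properties(cls: str, inherit_super_domains: bool = False) -> list[str]:
    """Properties whose declared domain is `cls` or one of its superclasses.

    With inherit_super_domains=True (the unwanted behavior), a subproperty is
    also treated as having every superproperty's domain.
    """
    classes = lineage(cls)
    shown = []
    for prop, domain in DOMAIN_OF.items():
        domains = {domain}
        parent = SUPERPROPERTY.get(prop)
        while inherit_super_domains and parent is not None:
            if parent in DOMAIN_OF:
                domains.add(DOMAIN_OF[parent])
            parent = SUPERPROPERTY.get(parent)
        if domains & classes:
            shown.append(prop)
    return sorted(shown)


print(displayed_properties("Oak"))                               # ['hasLeaves', 'hasPointedLeaves']
print(displayed_properties("Tree"))                              # ['hasLeaves']
print(displayed_properties("Tree", inherit_super_domains=True))  # ['hasLeaves', 'hasPointedLeaves']
```

With the flag on, hasPointedLeaves leaks onto Tree through the domain of hasLeaves, plausibly analogous to the extra performedBy edges described earlier in the thread.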

In the meantime, @AbhaMoitra @russell-d-e, you may also want to review whether the ontology's three different versions of performedBy are an intentional feature or an oversight. Perhaps someone forgot the "sw:" prefix in a couple of sadl files?

cuddihyge commented 3 years ago

@russell-d-e Am I correct that in your ACTIVITY nodegroup above, neither of the performedBy edges should be there? ACTIVITY is a THING, and none of the performedBy properties has a domain of THING or ACTIVITY.

I think I have a fix, but it is proving difficult to test.

russell-d-e commented 3 years ago

@cuddihyge Are you talking about the ANALYSIS nodegroup? I agree that there should not be any for ACTIVITY, but ANALYSIS should have one performedBy property.

cuddihyge commented 3 years ago

Got it, sorry, I typed the wrong word. I think I have a fix and will try to push it to -dev today.
I'd welcome any further testing, or simple test cases I could put in JUnit.