Closed AbhaMoitra closed 3 years ago
SUMMARY: This looks to me like an ontology "error" or misunderstanding of some kind. It looks like SemTK is honoring the SADL. We may need more functionality for sub-property queries.
1) Note in the picture on the nodegroup canvas (right), that we ingest the "wasDerivedFrom" (about 9 o'clock in the picture) and "satisfies" (about 3 o'clock).
2) Note on the left that "satisfies" is a sub-property of "wasImpactedBy" which is a sub-property of "wasDerivedFrom"
What is happening is:
What we need to discuss:
Just for fun: run ingest_REQUIREMENT as a construct query and you'll see the underlying data is correct, and the CONSTRUCT results display it properly.
This is a fascinating case. Any suggestions on where/how to document it? SemTK needs its own StackOverflow?
@cuddihyge : thanks for the detailed explanation and SemTK is doing what we are asking it to do. Let me think about it. @kityansiu : FYI.
@cuddihyge @kityansiu : It was Kit's idea to try the SPARQL query that we had in V4.1. When we ran it, those 2 columns (wasDerivedFrom, wasImpactedBy) were empty and we got a reasonable number of rows returned. So it would seem that the SPARQL query rewriting that was done for performance improvement has resulted in this change in output.
I believe it is more likely that V4.1 didn't support sub-properties yet. It isn't an improvement in performance, it is an improvement in handling sub-properties :-) I think the current version is honoring the SADL.
If we need to be able to form queries in SPARQLgraph that query ONLY a property and class (not its sub-properties and sub-classes) then that could be added. This is a moderate-sized task, so I'd recommend we consider whether that's the best solution before embarking on the improvement.
@kityansiu : To check that the results in SemTK match running a SPARQL query in SADL, I did the following. I have a query in SADL that asks for "wasImpactedBy" and another query for "satisfies" as shown below. Note that we have instances where the "satisfies" query has results while the first query does not. (I spelled out 'satisfies' in the query as there are 2 different 'satisfies'.) So, to me the SemTK behavior does not match the SADL behavior - let me know if this does not illustrate that. Could it be that the results depend on which reasoner is employed?
Yes. In SADL you are asking for a specific relationship: wasImpactedBy.
When you draw a nodeGroup in SemTK, it presumes that all subClasses and subProperties are also matches. So it writes more complex queries.
Where your query is: select * where {?x <wasImpactedBy> ?z}
SemTK interprets the nodegroup: select * where { ?x ?prop ?z. ?prop rdfs:subPropertyOf* <wasImpactedBy> }
( I just hand-typed that so the syntax might be imperfect.)
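To make the difference between the two queries concrete, here is a minimal sketch in plain Python. It is not SemTK code: the in-memory triple list, the `REQ-0`/`REQ-1` identifiers, and the function names are all hypothetical. Only the property hierarchy (satisfies under wasImpactedBy under wasDerivedFrom) comes from this thread. The first function mimics the exact-property query; the second mimics the `rdfs:subPropertyOf*` expansion.

```python
SUB_PROPERTY_OF = "rdfs:subPropertyOf"

# Hypothetical toy data. The hierarchy mirrors the one described above:
# satisfies -> wasImpactedBy -> wasDerivedFrom.
TRIPLES = [
    ("satisfies", SUB_PROPERTY_OF, "wasImpactedBy"),
    ("wasImpactedBy", SUB_PROPERTY_OF, "wasDerivedFrom"),
    ("REQ-1", "satisfies", "REQ-0"),
]


def sub_properties(prop, triples):
    """Return prop plus every property that reaches it via
    zero or more rdfs:subPropertyOf hops (the '*' in the SPARQL path)."""
    found = {prop}
    frontier = [prop]
    while frontier:
        parent = frontier.pop()
        for s, p, o in triples:
            if p == SUB_PROPERTY_OF and o == parent and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def match_exact(prop, triples):
    """Like: select * where { ?x <prop> ?z }"""
    return [(s, o) for s, p, o in triples if p == prop]


def match_with_sub_properties(prop, triples):
    """Like: select * where { ?x ?p ?z . ?p rdfs:subPropertyOf* <prop> }"""
    props = sub_properties(prop, triples)
    return [(s, o) for s, p, o in triples if p in props]


if __name__ == "__main__":
    print(match_exact("wasImpactedBy", TRIPLES))
    print(match_with_sub_properties("wasImpactedBy", TRIPLES))
```

In this toy data the only instance triple uses "satisfies", so the exact match on "wasImpactedBy" finds nothing while the expanded match finds the pair. That is the same shape as the mismatch reported between the SADL query and the nodegroup query.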
If we've gotten to a point where it is needed, I could design and implement an override such that a SemTK class could be interpreted as "exactly this class" and a property as "exactly this property" instead of the current default behavior. Perhaps an extra check box.
This would be reasonable and consistent. It is a moderate-sized task. I don't think we're convinced this is the best solution yet, but if we become convinced, we can put it on the board and start work.
@kityansiu: are we ok with the results as is in SemTK?
As an exercise: the SADL query results above were obtained with OWL_MEM as the reasoner. I then re-ran the exact same 2 queries with the OWL_MEM_RDFS reasoner and got results for both queries. So what we saw was not because of querying for a specific relationship, but because of which reasoner was used. So, IF WE WANTED to alter the behavior in SemTK, is it possible to simply change the reasoner employed?
We may be ok with the results in SemTK as is - just want to know if we can change reasoner engine in SemTK?
SemTK doesn't currently use a reasoner. It generates SPARQL from the nodegroup.
If we want two options (1) normal subclass/subproperty (2) exact class / exact property
then I would have to change the SPARQL generator.
I would also suspect that we wouldn't want this choice to be global, but I'm not sure. I had envisioned a checkbox on a specific edge or node signifying that it should only look for exact matches.
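As a rough illustration of what a per-edge "exact match" flag might mean for query generation, here is a small Python sketch. It is an assumption-laden toy, not SemTK's actual generator: the function name `edge_pattern`, the `exact` parameter, and the `_prop` variable naming are invented for this example.

```python
def edge_pattern(subj, prop_uri, obj, exact=False):
    """Build the SPARQL triple pattern for one nodegroup edge.

    exact=False: match the property or any of its sub-properties
                 (the default behavior described in this thread).
    exact=True:  match only the named property (the proposed checkbox).
    """
    if exact:
        return f"?{subj} <{prop_uri}> ?{obj} ."
    # Hypothetical naming scheme for the helper variable that binds the property.
    prop_var = f"{obj}_prop"
    return (
        f"?{subj} ?{prop_var} ?{obj} . "
        f"?{prop_var} rdfs:subPropertyOf* <{prop_uri}> ."
    )


if __name__ == "__main__":
    print(edge_pattern("x", "wasImpactedBy", "z"))
    print(edge_pattern("x", "wasImpactedBy", "z", exact=True))
```

Keeping the flag on the individual edge, rather than global, would let one nodegroup mix exact and expanded matches, which fits the checkbox idea above.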
Here's my take on this. From a TA3 point of view, I think we are okay with the current SemTK SPARQL; I don't see a need to specify exact class or exact property at the moment. My explanation below concurs with Paul's bullet that "ingest_REQUIREMENT query doesn't need to work correctly as SELECT".
When I query, I might at first ask for the parent class or parent property, and then as I learn more about the data, I will alter the query to ask for the exact class or exact property. In our REQUIREMENT example, I might first ask for wasImpactedBy, get a handle on what all is returned, and then modify my query to ask for mitigates, satisfies, or governs. Further, with our union query, a user can ask for combinations of these subproperties.
Let's not rush into implementing a solution at the moment, since we haven't seen an example where this is causing problems or a major inconvenience.
Done
I am using the released RACK V5 version and ingest_REQUIREMENT.json from GIT on the Boeing requirements data. The 2 Boeing requirements data files can be found on the RedShare in the folder TeamWorkSpace/Abha. I loaded REQUIREMENT1.csv (283 records imported) and then REQUIREMENT2.csv (724 records imported) successfully. When I then query for requirements, I get 4425 results. The results have also been uploaded to RedShare in the same folder. 2 observations: