Add ability to extract shape objects from a graph by their URI

TShapinsky commented 8 months ago

TODO: If this is greenlit, I will add the requisite tests for this feature.

TShapinsky commented 8 months ago

@gtfierro I put together this implementation of a way to extract relevant triples from a graph by the shape's URI. Let me know what you think.

One thing currently missing is that non-shape objects are not yet added to the extracted graph.

Another possibility is including the relevant owl:imports statements where an ontology declaration or pre-existing import exists in the graph.

gtfierro commented 8 months ago

Looks like a good start!

We should be specific about what kind of information we want to include inside this graph, and what we hope to do with the resulting graph. Is our intent just to summarize what the shape does, or are we trying to just save the triples necessary to properly conduct validation against the shape? The latter is potentially fairly complicated. I imagine this includes:

- all triples in the shape's CBD
- definition of all classes and shapes contained within the CBD, recursively
- definition of all shapes that refer to the original shape; this includes explicit references (in the triples of the CBD) as well as references inside SPARQL queries (which 223P makes heavy use of)
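
As a rough illustration of the first and third items, here is a minimal, dependency-free sketch. It is not BuildingMOTIF's implementation: triples are modeled as plain `(subject, predicate, object)` string tuples, blank nodes are assumed to start with `_:`, and the helper names (`cbd`, `named_owner`, `referring_shapes`) are hypothetical. The CBD here omits reification, and the SPARQL check is a crude substring match that can produce false positives.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Predicates whose objects are SPARQL strings (assumed sufficient for this sketch).
SPARQL_PREDICATES = ("sh:select", "sh:ask", "sh:construct")


def is_blank(node: str) -> bool:
    # Assumption of this toy model: blank node labels start with "_:".
    return node.startswith("_:")


def cbd(graph: Set[Triple], node: str) -> Set[Triple]:
    """Triples with `node` as subject, plus those of nested blank nodes."""
    result: Set[Triple] = set()
    seen: Set[str] = set()
    stack = [node]
    while stack:
        subject = stack.pop()
        if subject in seen:
            continue
        seen.add(subject)
        for triple in graph:
            if triple[0] == subject:
                result.add(triple)
                if is_blank(triple[2]):
                    stack.append(triple[2])
    return result


def named_owner(graph: Set[Triple], node: str) -> str:
    """Walk up from a blank node to the named node that contains it."""
    seen: Set[str] = set()
    while is_blank(node) and node not in seen:
        seen.add(node)
        parents = sorted(s for s, _, o in graph if o == node)
        if not parents:
            break
        node = parents[0]
    return node


def referring_shapes(graph: Set[Triple], shape: str) -> Set[str]:
    """Named nodes that mention `shape` as an object or inside a SPARQL string."""
    hits: Set[str] = set()
    for s, p, o in graph:
        explicit = o == shape
        in_query = p in SPARQL_PREDICATES and shape in o  # substring match only
        if explicit or in_query:
            hits.add(named_owner(graph, s))
    hits.discard(shape)
    return hits
```

A more robust version would parse each query and compare resolved IRIs instead of matching substrings.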

It doesn't look like the notebook ran properly. Would you be able to commit a run of the notebook so I can see what the output looks like?

TShapinsky commented 8 months ago

> Looks like a good start!
>
> We should be specific about what kind of information we want to include inside this graph, and what we hope to do with the resulting graph. Is our intent just to summarize what the shape does, or are we trying to just save the triples necessary to properly conduct validation against the shape? The latter is potentially fairly complicated. I imagine this includes:
>
> - all triples in the shape's CBD
> - definition of all classes and shapes contained within the CBD, recursively
> - definition of all shapes that refer to the original shape; this includes explicit references (in the triples of the CBD) as well as references inside SPARQL queries (which 223P makes heavy use of)
>
> It doesn't look like the notebook ran properly. Would you be able to commit a run of the notebook so I can see what the output looks like?

Hey Gabe,

This is a good question. On one hand, I think we just want to be able to reason about a shape past its URIRef in the abstract. On the other hand, there are definitely circumstances where portability should be considered. And while portability is nice, trying to accomplish it in its entirety will almost certainly hit some nasty edge cases.

Here is my possibly unifying proposal: when you extract a target shape from a target graph, you are essentially creating a subgraph view containing the triples relevant to that shape. Any shapes or classes in the target graph that the target shape references would be extracted, as would any shapes or classes that they in turn reference. However, if they reference, say, a Brick class and Brick is not in the target graph, that class would not accompany the target shape. Where an owl:imports statement in the target graph imports a namespace referenced by the target shape, that import should be included.
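
To make the proposal concrete, here is a minimal sketch of the traversal it describes. It is an illustration under stated assumptions, not the code in this PR: triples are plain string tuples holding full IRIs, `extract_shape` and `relevant_imports` are hypothetical names, and an import is treated as "referenced" when the imported ontology IRI is a string prefix of some extracted term, which will not hold for every ontology.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

OWL_IMPORTS = "http://www.w3.org/2002/07/owl#imports"


def extract_shape(graph: Set[Triple], shape: str) -> Set[Triple]:
    """Collect the triples of `shape` and of everything it references,
    but only for nodes that are actually described in `graph`."""
    described = {s for s, _, _ in graph}
    extracted: Set[Triple] = set()
    visited: Set[str] = set()
    frontier = [shape]
    while frontier:
        node = frontier.pop()
        if node in visited:
            continue
        visited.add(node)
        for s, p, o in graph:
            if s != node:
                continue
            extracted.add((s, p, o))
            # A reference is followed only when its target lives in this graph;
            # e.g. a Brick class is skipped unless Brick was loaded into `graph`.
            if o in described:
                frontier.append(o)
    return extracted


def relevant_imports(graph: Set[Triple], extracted: Set[Triple]) -> Set[Triple]:
    """owl:imports triples whose imported IRI prefixes an extracted term
    (a simplifying assumption of this sketch)."""
    terms = {term for triple in extracted for term in triple}
    return {
        (s, p, o)
        for s, p, o in graph
        if p == OWL_IMPORTS and any(t.startswith(o) for t in terms)
    }
```

Because both functions only ever select triples that already exist in `graph`, the result is always a subset of the graph it came from.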

If you want the full shape and everything needed to run it, you can pull all of those into one graph before extracting the shape. In general, I believe the extracted shape should be as portable and executable as the graph it came from, no more, no less. This should help reduce unwanted behavior.

Additionally, I'm not sure I agree that the extracted shape should include all shapes which reference it if they are not required to execute the target shape.

Extracted shapes should be:

- As executable as the graph they came from (no loss of functionality)
- Free of new triples: set union with the original graph yields no changes (no triple generation)
- Complete: supporting nodes are included in full (not only the class declaration triple)
- Concise (no extraneous nodes)
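
Three of the four properties above can be checked mechanically, which might be a starting point for tests. The sketch below is a hedged illustration with hypothetical function names, again using plain string tuples for triples. "As executable as the source" is not covered, since it cannot be decided from the triples alone, and the conciseness check would need an exception for any owl:imports triples deliberately carried along.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def adds_no_triples(extracted: Set[Triple], source: Set[Triple]) -> bool:
    """Union with the source changes nothing, i.e. extracted is a subset."""
    return (source | extracted) == source


def supporting_nodes_complete(extracted: Set[Triple], source: Set[Triple]) -> bool:
    """Every subject present in the extraction keeps all of its source triples."""
    subjects = {s for s, _, _ in extracted}
    return all(t in extracted for t in source if t[0] in subjects)


def is_concise(extracted: Set[Triple], shape: str) -> bool:
    """Every subject in the extraction is reachable from the target shape."""
    reachable: Set[str] = set()
    stack = [shape]
    while stack:
        node = stack.pop()
        if node in reachable:
            continue
        reachable.add(node)
        stack.extend(o for s, _, o in extracted if s == node)
    return all(s in reachable for s, _, _ in extracted)
```
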

Thoughts?

gtfierro commented 2 months ago

> In general, I believe the extracted shape should be as portable and executable as the graph it came from, no more, no less. This should help reduce unwanted behavior.

I think this is a great principle. If you want to include the definitions of terms defined in other ontologies, then you need to make sure those ontologies are imported. I could see myself implementing this in https://github.com/gtfierro/ontoenv-rs

> Additionally, I'm not sure I agree that the extracted shape should include all shapes which reference it if they are not required to execute the target shape.

The question of "what is required" to execute the target shape will be tricky to determine. It will require following shapes and their triggers through subclass hierarchies, inferred properties, etc. If this is a design goal (which I think it should be, eventually) then we should carefully document the algorithm/traversals we use to find the components necessary for executing a given shape.
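
As one small piece of such a traversal, the sketch below finds the shapes whose class-based targets would apply to instances of a given class, following rdfs:subClassOf upward. It is a hedged illustration only: triples are plain tuples of prefixed-name strings, the function names are hypothetical, and inferred properties, sh:targetNode, sh:targetSubjectsOf, sh:targetObjectsOf, and SPARQL-based targets are all ignored.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "rdf:type"
RDFS_CLASS = "rdfs:Class"
SUBCLASS_OF = "rdfs:subClassOf"
NODE_SHAPE = "sh:NodeShape"
TARGET_CLASS = "sh:targetClass"


def superclasses(graph: Set[Triple], cls: str) -> Set[str]:
    """`cls` plus all of its transitive rdfs:subClassOf ancestors."""
    found: Set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current in found:  # also guards against subclass cycles
            continue
        found.add(current)
        stack.extend(o for s, p, o in graph if s == current and p == SUBCLASS_OF)
    return found


def shapes_triggered_by(graph: Set[Triple], cls: str) -> Set[str]:
    """Shapes that target `cls` or one of its ancestors."""
    ancestors = superclasses(graph, cls)
    # Explicit targets: sh:targetClass pointing at the class or an ancestor.
    explicit = {s for s, p, o in graph if p == TARGET_CLASS and o in ancestors}
    # Implicit class targets: a node typed as both a class and a node shape.
    implicit = {
        c
        for c in ancestors
        if (c, RDF_TYPE, RDFS_CLASS) in graph and (c, RDF_TYPE, NODE_SHAPE) in graph
    }
    return explicit | implicit
```

Documenting each such step separately (class targets, property-based targets, SPARQL references) would make it easier to state exactly what "required to execute" covers.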

NREL / BuildingMOTIF

Add ability to extract shape objects from a graph by their URI #295