Decide on named graph approach

balhoff commented 8 years ago

Need to decide how to group triples into named graphs. The current approach is simple and easy to use: one graph for asserted triples and one graph for inferred triples. This way inferences can be included or excluded easily, even for specific parts of a query.
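To make the current layout concrete, here is a minimal sketch that uses plain Python tuples in place of a triplestore. The graph IRIs, node names, and classes are all hypothetical and exist only for illustration. In SPARQL the same choice would be made by wrapping each pattern in a GRAPH clause naming the asserted or the inferred graph.

```python
# Quads are (subject, predicate, object, graph). All IRIs here are made up.
ASSERTED = "http://example.org/graph/asserted"
INFERRED = "http://example.org/graph/inferred"

quads = [
    ("ex:node1", "rdf:type", "ex:KinaseActivity", ASSERTED),
    ("ex:node1", "rdf:type", "ex:CatalyticActivity", INFERRED),
    ("ex:node1", "ex:enabled_by", "ex:gene1", ASSERTED),
]


def match(quads, s=None, p=None, o=None, graphs=None):
    """Return triples matching the pattern, restricted to the given graphs.

    A value of None acts as a wildcard; graphs=None means "any graph".
    """
    return [
        (qs, qp, qo)
        for qs, qp, qo, qg in quads
        if (s is None or qs == s)
        and (p is None or qp == p)
        and (o is None or qo == o)
        and (graphs is None or qg in graphs)
    ]


# The same pattern with inferences included, and with inferences excluded.
types_with_inference = match(quads, s="ex:node1", p="rdf:type")
types_asserted_only = match(quads, s="ex:node1", p="rdf:type", graphs={ASSERTED})
```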

One problem with this is that it is difficult or impossible to access metadata associated with particular LEGO models, since this metadata is attached to the LEGO model ontology IRI. There is no link at the RDF level between a LEGO model IRI and any of the nodes within that model. In the Minerva triplestore each LEGO model is stored in a named graph with the IRI of its ontology. But this is a bit hard to use from SPARQL. The only way to query across all models (without naming hundreds of graphs) is to use the default graph, but that prevents you (I think) from excluding inferences when you want to. Blazegraph has a "virtual graph" extension but that would require some extra bookkeeping to insert all the asserted graphs into the virtual graph.

A workaround could be to add rdfs:isDefinedBy links, in Minerva, between model nodes and the model ontology IRI.
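A rough sketch of how that link would be used, again with plain Python tuples standing in for the store. The model IRI, the node name, and the dc:title value are assumptions made up for the example, not actual Minerva output.

```python
# Triples in one big asserted graph. All IRIs and the title are hypothetical.
MODEL = "http://example.org/model/0001"

triples = [
    (MODEL, "rdf:type", "owl:Ontology"),
    (MODEL, "dc:title", "Example LEGO model"),
    ("ex:node1", "rdf:type", "ex:KinaseActivity"),
    # The proposed link from a model node back to its model ontology IRI.
    ("ex:node1", "rdfs:isDefinedBy", MODEL),
]


def objects(triples, subject, predicate):
    """Return all objects for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def model_titles_for(triples, node):
    """Follow rdfs:isDefinedBy from a node to its model, then read dc:title."""
    return [
        title
        for model in objects(triples, node, "rdfs:isDefinedBy")
        for title in objects(triples, model, "dc:title")
    ]


titles = model_titles_for(triples, "ex:node1")
```

Without the rdfs:isDefinedBy triple, the same lookup returns nothing, which is the missing link described above.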

cmungall commented 8 years ago

I thought it was possible to query with a variable as the value for the graph?

In any case, explicit connection of nodes to graphs seems like a good idea. Would we do this at the Minerva level at the time of editing, or only on export?

We should do this for the singleton legos coming from the GAFs too, which would be a change in the gaf2legoowl command.

On 11 Oct 2016, at 7:03, Jim Balhoff wrote:

Need to decide how to group triples into named graphs. [...]

balhoff commented 8 years ago

Yes, you can query with the graph in a variable. By "hard to use" I mean more the situation where you don't really care about the graphs and just want to query across all the data, choosing whether or not to include inferred data.
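For reference, a small sketch of the graph-as-variable case with one named graph per model, using Python tuples rather than a real store. The graph IRIs and node names are hypothetical. The SPARQL counterpart would be a GRAPH ?g { ... } pattern, with a FILTER on ?g to leave out the inferred graph.

```python
# Quads are (subject, predicate, object, graph). All IRIs here are made up.
INFERRED = "http://example.org/graph/inferred"

quads = [
    ("ex:node1", "rdf:type", "ex:KinaseActivity", "http://example.org/model/0001"),
    ("ex:node2", "rdf:type", "ex:KinaseActivity", "http://example.org/model/0002"),
    ("ex:node1", "rdf:type", "ex:CatalyticActivity", INFERRED),
]


def nodes_by_graph(quads, predicate, obj, exclude=()):
    """Treat the graph as a variable: map each graph IRI to matching subjects."""
    result = {}
    for s, p, o, g in quads:
        if p == predicate and o == obj and g not in exclude:
            result.setdefault(g, []).append(s)
    return result


# Query across every model graph without naming them, skipping inferences.
by_model = nodes_by_graph(
    quads, "rdf:type", "ex:KinaseActivity", exclude={INFERRED}
)
```

This works, but every query has to carry the graph variable and the exclusion even when the caller does not care which model a node came from.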

balhoff commented 8 years ago

And you can't access LEGO model metadata if everything is dropped into one big asserted graph. I guess another option would be to load assertions into lots of named graphs and also into one big asserted graph. In the case of Blazegraph, that would double the size of the database.

cmungall commented 7 years ago

https://docs.google.com/document/d/1sQnNoCmneLjZPsUc6iBgbEkuqhxihFn09u9RrpHcUc8/edit#

geneontology / go-graphstore

Decide on named graph approach #8