geneontology / minerva

BSD 3-Clause "New" or "Revised" License

Ontologies loaded into the Minerva triplestores #226

Closed lpalbou closed 4 years ago

lpalbou commented 5 years ago

Right now, the minerva triplestore seems to load only the gocam models but no ontologies.

This means that any SPARQL query requesting metadata on a GO term or a gene product will fail. Some ART queries are failing because of that (e.g. accessing the taxon of a gene or the subclasses of a GO term).
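To illustrate the kind of query affected, here is a minimal sketch that builds a SPARQL query for the subclasses of a GO term. The helper name `subclass_query` is hypothetical and not part of Minerva or ART; the example term is GO:0008150 (biological_process) written as an OBO PURL. Against a store that holds only models and no ontology axioms, a query of this shape would be expected to return no rows.

```python
def subclass_query(term_iri: str) -> str:
    """Build a SPARQL query listing the strict (transitive) subclasses of a class.

    Hypothetical helper for illustration only.
    """
    return f"""PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?sub ?label WHERE {{
  ?sub rdfs:subClassOf+ <{term_iri}> .
  OPTIONAL {{ ?sub rdfs:label ?label }}
}}"""


# GO:0008150 (biological_process) as an OBO PURL
print(subclass_query("http://purl.obolibrary.org/obo/GO_0008150"))
```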

Tagging @balhoff @kltm

goodb commented 5 years ago

That is correct: Minerva does not host ontology content apart from its transformation into rules in Arachne. Nor should it, I expect. We don't need another endpoint for the same content. Perhaps you could run a distributed query to access the ontology content from rdf.go
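A distributed query of this kind could use the SPARQL 1.1 `SERVICE` keyword. The sketch below is an assumption about how such a query might look, not a tested ART query: the helper name `federated_label_query` and the endpoint URL `http://example.org/sparql` are placeholders, and the pattern simply joins the types used in the local model store with labels fetched from a remote ontology endpoint.

```python
def federated_label_query(ontology_endpoint: str) -> str:
    """Build a federated SPARQL query (hypothetical helper).

    The outer pattern runs against the local model store; the SERVICE
    block is evaluated by the remote endpoint holding the ontology.
    """
    return f"""PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?term ?label WHERE {{
  ?individual rdf:type ?term .
  SERVICE <{ontology_endpoint}> {{
    ?term rdfs:label ?label .
  }}
}}"""


# Placeholder endpoint; substitute the real ontology SPARQL endpoint.
print(federated_label_query("http://example.org/sparql"))
```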

kltm commented 5 years ago

For testing some of the stuff that @tmushayahama is trying to do in the short term, it may be useful to at least test with the ontology added on noctua-dev and see what we get. If federation is slow but still the preferred path, it may be good to revisit exactly what is slowing it down and try to optimize for the core of what we need. @goodb To make sure that we can still get at the reactome entities, how hard would it be to produce another merged ontology for Minerva that contains the additional ontologies? (At some point we should probably make it easier for Minerva to take multiple arguments.)

goodb commented 5 years ago

@kltm I'm not exactly following you here. To answer your question, ontology merge is not hard and is nicely handled by robot. But I think modifying the import list in the go-lego file that Minerva eats for breakfast is a quite reasonable way to add whatever ontologies you like into the system (mainly meaning into Arachne). If modifications to the go-lego file were not allowed, it would also not be hard to modify the startup command line options to accept a list of ontology files as input.
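As a rough sketch of the merge route, the snippet below assembles a `robot merge` command line with one `--input` per ontology file and a single `--output`. The helper name `robot_merge_command` and the file names are hypothetical, chosen only for illustration; the code builds and prints the command but does not run it, so robot does not need to be installed to try it.

```python
import shlex


def robot_merge_command(inputs: list[str], output: str) -> list[str]:
    """Build the argument list for a `robot merge` call (hypothetical helper)."""
    if not inputs:
        raise ValueError("at least one input ontology is required")
    cmd = ["robot", "merge"]
    for path in inputs:
        # robot merge accepts a repeated --input option, one per ontology
        cmd += ["--input", path]
    cmd += ["--output", output]
    return cmd


# File names below are placeholders, not actual pipeline artifacts.
cmd = robot_merge_command(["go-lego.owl", "reactome-entities.owl"], "merged.owl")
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on a machine where robot is available.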

lpalbou commented 5 years ago

@goodb in theory, I would agree: that's the promise of federated queries. And that's a very cool concept that I want/hope to see evolve over time (e.g. our discussions with wikipathways people).

In practice, however, I tend to avoid federated queries as much as I can for performance reasons, especially when we are the sole owner of the data. At the moment, the SPARQL queries for ART already take too much time, and the store contains only ~2200 models whereas we are aiming for hundreds of thousands.

In addition, it's a matter of consistency: both the snapshot and "end-user" production triplestores contain both NEO and GO, so there is no real reason for the "end-curator" production triplestore to be different. It's also not great for sharing optimized queries, which, in SPARQL, can take some time to design.

Tagging @cmungall on this and we can discuss it during the hackathon.

goodb commented 5 years ago

Note: one use case from @tmushayahama is populating type-aheads in the annotation review tool. I suggest this would best be handled by a dedicated, reusable service that lives outside the Minerva context; several candidates already exist.

tmushayahama commented 5 years ago

@goodb So just to clarify, each type-ahead entry should have at least one model associated with it. In other words, if that taxon_id is entered in the search, it should return at least one result. So far in dev there are only 30 such taxa out of many.

goodb commented 4 years ago

Minerva now makes use of two internal triple stores. One holds the models and their metadata, the other holds all information about ontology terms, including genes. This other go-lego-triplestore is being used to handle much of the above.

Note that we no longer intend to provide direct SPARQL access to either of these from the clients. All requests should come in via the API (and internally will be rerouted to SPARQL or whatever else is needed to make it work).