Explore use of RDF store on back-end

cmungall commented 8 years ago

The choice is not so important. Blazegraph has nice features, jena tbd may be easier.

The mapping from what we currently implement to a triplestore should be simple.

Currently we store each model in its own file. Here, each model would be in its own Named Graph (NG), which would have an IRI the same as the model IRI. Note with a triplestore it's easy to flush and replace a whole NG (just as we currently do for models on the filesystem).

There may be an existing owlapi<->triplestore bridge. If not this should be relatively simple, as we are dealing with a subset of owl.

We could maintain the github repo as a primarily write-only backup and additional way in which to access the models. But for most future operations, the flexibility of sparql will be better than per-file operations.

Note this would obviate the need for epione. Any server-side or client-side component would be able to fetch any kind of model metadata using a simple SPARQL-over-REST call.

cc @hdietze @kltm @DoctorBud @balhoff

DoctorBud commented 8 years ago

I'm assuming there are pieces I do not understand here about scaling and size of models, but I'm wondering about the added complexity of adding a triple-store, unless we get to move some of the API services minerva provides into the triple-store service. Is that the idea?

As a developer, I like the use of the filesystem to store models. Under the current architecture, Minerva is presumed to 'own' (not share) the models directory, and it only needs to load an 'active' model that is being edited, as well as maintain an index of the models on disk. It currently is providing enough of an API to let Noctua get and save the models, while isolating clients like Noctua from the storage format.

I think @cmungall is proposing that we isolate the model-serving capability that Minerva provides into a separate server, that this server will provide SPARQL-over-REST instead of the proprietary Minerva model serving API, and that this server will know about NGs instead of Models. Minerva will be a client of this new service, and Noctua may talk directly to the new service for index/metadata purposes, leaving Minerva to do what is left (reasoning and semantically correct model-manipulation?).

Please let me know about flawed assumptions/conclusions above.

nathandunn commented 8 years ago

I thought the owlapi automatically imported off of stored RDF triples (have done it before):

 OWLOntologyManager owlOntologyManager = OWLManager.createOWLOntologyManager()
 OWLOntology inferredOntology = owlOntologyManager.loadOntologyFromOntologyDocument(new ByteArrayInputStream(globalOntology.xmlContent.bytes))

If these are small, storing them in the file-system (with some sort of locking mechanism) or whatever you happen to have lying around is fine.

Is there a reason not to load the RDF into SciGraph? Creating an RDF loader (if it doesn’t exist) might not be worth it, but you could probably create one off of the OWLAPI.

cmungall commented 8 years ago

@DoctorBud

I like filesystems too but at some point we will grow out of it
I too dislike additional complexity, part of the goal here is to simplify. Most triplestores are reasonably good at doing multiple things. If we use a triplestore for model management, we can also use it for keeping the ontology to do ontology operations, for class autocomplete, for a variety of model-level operations (model search, and future operations such as merge); many support reasoning. If we can do this from one place, requires less moving parts and synchronization between the moving parts

@nathandunn

We in fact have an OWLAPI loader from SciGraph. In fact that is how everything gets in there!
https://github.com/SciGraph/SciGraph/wiki/Neo4jMapping
having said that, some parts are underspecified. One particular important feature is that Neo4j does not have an equivalent construct to NamedGraphs (or more or less equivalently, OWLOntologies in the OWLAPI). We have rolled our own for the dipper ttl loads, but it's a bit awkward not having this as a built in feature
many triplestores such as blazegraph support the kinds of features we will need, in particular, reasoning. Again this can be layered on to SciGraph, but writing scalable reasoners over large graphs is not something you take on lightly

All said, SG is still on the cards as a possible solution here

balhoff commented 7 years ago

I have a working implementation of this now for Blazegraph. You can test it using the blazegraph-store branch. At the moment the Blazegraph store has replaced the file store, but I am planning to make it a choice. I need to do some refactoring to clean up duplicate code between the Blazegraph store and the file store.

This first implementation is the most minimal usage of a triplestore backend: it just reads and writes entire models at a time, just replacing the file read and write steps. Things to know:

Pass the location for the Blazegraph journal file (database) in place of the path to the models directory, e.g. -f blazegraph.jnl
Noctua will not yet display the list of models saved in Blazegraph; @kltm how can we make this work?
Need to implement a way to load the existing folder of models into Blazegraph

kltm commented 7 years ago

From our conversation, as you explore blazegraph and company for the backend, there are a few items to keep in mind when thinking about how minerva current operates. Once we have a better understanding of the backend, these may be better off as separate tickets for the experimental branch.

complete metadata get: the current metadata grabs a handful of properties for the known models (title, state, modified, template, deprecated); a replacement will need to get these, as well as: title, contributor(s), state, and date(s); coordinated completion here will (finally) remove the need for epione; long-term, we'll need a more robust solution as we move from hundreds to thousands of models (possibly coordinating with other solr infrastructure)
flush to disk: easy enough apparently; while @cmungall mentioned just having it as a single big file, it would be nice to have individual models saved for easy versioning in github, model URI resolution, etc.; not sure how import/hard that would all be
removal of embedded graph JSON hack: the reason for that is a little complicated (the need to have it available for the "amigo" load using the models), and possibly not immediately related to the new backend
"merge" operations and signals: to save on wire overhead, transfer time, and update speed, certain types of operations do not want the whole graph to be passed around, but rather just the updated set; more information can be found here: https://github.com/berkeleybop/bbop-manager-minerva/wiki/MinervaRequestAPI#response-expectations

cmungall commented 7 years ago

complete metadata get: the current metadata grabs a handful of properties for the known models (title, state, modified, template, deprecated); a replacement will need to get these, as well as: title, contributor(s), state, and date(s); coordinated completion here will (finally) remove the need for epione; long-term, we'll need a more robust solution as we move from hundreds to thousands of models (possibly coordinating with other solr infrastructure)

Users will eventually want more powerful queries for model finding, which ties in with geneontology/noctua#1 -- but the fields mentioned should be good for now

flush to disk: easy enough apparently; while @cmungall mentioned just having it as a single big file, it would be nice to have individual models saved for easy versioning in github, model URI resolution, etc.; not sure how import/hard that would all be

Yes, for model versioning, model per file is useful. For resolution, I'd imagine amigo pages.

Also the issue with a single file is you either have to explicitly model how each axiom belongs to an ontology (using reification, ugh), or use a quad format like trig

removal of embedded graph JSON hack:

Rejoice!

kltm commented 7 years ago

@cmungall There are two use cases for the first item ("complete metadata get"), and while they have a lot of overlap, I think the divergence will eventually (not now, but later on) point to two implementations. I totally agree that for now we want epione gone ASAP and that all metadata should be gotten directly from the graph backend, via minerva. However, I have my doubts that this will scale long-term and/or will start making architecture and communication needlessly complicated.

So looking down the road a bit, I just wanted to clarify what I think the use cases are. (I think we may already have a discussion of this in the tracker somewhere? I couldn't find it...) The first is the required query language for search, reporting, updating, etc. This can, of course, be used to find models in very powerful ways. The second is quickly finding and getting to a model for editing. Obviously the second could be done as a subset of the first, but I think that there are practical considerations that may point to something else. It seems to me that the two natural long-term end points that we would eventually hit would be:

The graph implementation is so fast, and solr-like paging and search is easy enough to implement, that it makes sense to make a new metadata frontend/client code connections there--this would be great in that all models can be queried immediately, live
A lot of the mechanism that would be useful is essentially solr-like anyways, so we use what's there for what we need anyways; for example, if the AmiGO "model loader" was fast enough, it could fill this function, supplying metadata and well as model data for pages; things would not be as "live" (the niche that epione filled), but we'd have load and search all taken care of in an architecturally minimal way (well, depending on one's view of the architecture)

I guess that's all to say that we should keep in mind that the increased number of fields are a (much better/sane) workaround for something that will need to be revisited for the long-term solution.

balhoff commented 7 years ago

@kltm I don't completely follow:

complete metadata get: the current metadata grabs a handful of properties for the known models (title, state, modified, template, deprecated); a replacement will need to get these, as well as: title, contributor(s), state, and date(s);

There is a lot of overlap in your lists. But anyway I don't see that the current metadata service is limiting the annotations returned at all. I did reimplement it so that it uses a SPARQL query to get annotations on all the stored models rather than loading each one from a file: fcffda16a224e5f688a74d5eeb9b04809fe91eac

Can you let me know if this version is sufficient for work to proceed on updating Noctua to rely on Minerva for this? Next I'll work on importing and exporting models to/from the database.

kltm commented 7 years ago

@balhoff Sorry--I just listed everything in both sets; a replacement would be the union of the two. The fields needed for a replacement, and the removal of epione from service, would be: title, state, modified, template, deprecated, contributor(s), and date. For the last one, "date", we currently are interested in last modified, which should be an automatic annotation on the model; in the future, we'll be switching to created and last modified (which should both also get automatically annotated to the model).

cmungall commented 7 years ago

Remind me where the docs for the over-the-wire JSON is again?

kltm commented 7 years ago

@cmungall which direction?

kltm commented 7 years ago

@balhoff I remembered another immediate change that we'll have to deal with when switching to the graph backend. We are now finally at the point where we have users (MGI) consuming the model products. They are currently grabbing from the S3 drop produced by https://build.berkeleybop.org/job/export-lego-to-legacy/ . When switching to the new backend, we'll need a job in place to replace this.

This is actually kinda interesting. My druthers would be that the Noctua production machine is not the same machine that is grinding and producing the GAF (a different Jenkins profile, etc.). That would mean dumping to a repo so somebody else could produce the GAF. Any thoughts on this?

cmungall commented 7 years ago

which direction?

both, but any better than none..

https://build.berkeleybop.org/job/export-lego-to-legacy

good point, we need to make sure this continues to run.

This is actually kinda interesting. My druthers would be that the Noctua production machine is not the same machine that is grinding and producing the GAF (a different Jenkins profile, etc.). That would mean dumping to a repo so somebody else could produce the GAF. Any thoughts on this?

We can leave the job unaltered if we continue to dump model OWL from blazegraph. But that may not be the optimal way to distribute things moving forward. See also: https://github.com/geneontology/go-site/issues/172

This is the command that makes the GAFs, from the build above:

minerva/minerva-cli.sh http://purl.obolibrary.org/obo/go/extensions/go-lego.owl \
--reasoner elk \
--lego-to-gpad \
--group-by-model-organisms \
-i models/models \
--gpad-output legacy/gpad \
--gaf-output legacy/gaf

I don't believe it actually needs things split into model-per-file, a single triple dump would work as well

kltm commented 7 years ago

Not to drag this issue farther away, but documentation to barista/minerva is mostly tests and the request API overview: https://github.com/berkeleybop/minerva-requests/blob/master/tests/core.tests.js https://github.com/berkeleybop/bbop-manager-minerva/wiki From barista/minerva, pulling in actual response files: https://github.com/berkeleybop/bbop-response-barista/tree/master/tests In both cases, the programmatic API is given primacy, but the jsdoc is terrible formatted right now from its migration from naturaldocs.

kltm commented 7 years ago

For a single file dump, you don't think that versioning would be a little weird? I think we might also hit repo limits faster that way...

balhoff commented 7 years ago

FWIW the multi-file dump is implemented now.

balhoff commented 7 years ago

But it uses the Sesame Turtle writer and not OWL API. So we may need to evaluate how consistent the output order is, and configuration of prefixes.

kltm commented 7 years ago

That's great. What is the workflow for this? Has it been implemented as an internal command, maybe added to or taking the place of the current save wire command?

balhoff commented 7 years ago

Right now you can use command --dump-owl-models with minerva-cli. This needs to be done on an offline Blazegraph journal (i.e. not in use by a running minerva instance). Down the road this function could be exposed via web service command, or alternatively I should probably implement a live snapshot feature for backup purposes; a snapshot could be used for dumping OWL files.

balhoff commented 7 years ago

@kltm @cmungall I've put up instructions for Blazegraph mode here: https://github.com/geneontology/minerva/blob/blazegraph-store/INSTRUCTIONS.md#using-the-blazegraph-model-store

Let me know if anything is missing.

kltm commented 7 years ago

All good this far:

git pull
git checkout blazegraph-store
./build.sh

Then:

sjcarbon@moiraine:~/local/src/git/minerva$:) ./minerva-cli/bin/minerva-cli.sh --import-owl-models -j blazegraph.jnl -f ../obo-models/models
[bunch of probably okay stuff]
2016-10-25 12:44:32,140 WARN  (ServiceProviderHook:171) Running.
2016-10-25 12:44:32,606 INFO  (MinervaCommandRunner:151) Loading ../obo-models/models/5798651000000013
2016-10-25 12:44:33,400 INFO  (MinervaCommandRunner:151) Loading ../obo-models/models/579bd2a500000166
2016-10-25 12:44:33,667 INFO  (MinervaCommandRunner:151) Loading ../obo-models/models/579bd2a500000224
2016-10-25 12:44:33,920 INFO  (MinervaCommandRunner:151) Loading ../obo-models/models/5798651000000035
2016-10-25 12:44:34,193 INFO  (MinervaCommandRunner:151) Loading ../obo-models/models/579bd2a500000081
2016-10-25 12:44:34,449 INFO  (MinervaCommandRunner:151) Loading ../obo-models/models/5798651000000000
2016-10-25 12:44:34,697 INFO  (MinervaCommandRunner:151) Loading ../obo-models/models/5798651000000023
2016-10-25 12:44:34,936 INFO  (MinervaCommandRunner:151) Loading ../obo-models/models/579bd2a500000024
2016-10-25 12:44:35,217 INFO  (MinervaCommandRunner:151) Loading ../obo-models/models/579bd2a500000078
2016-10-25 12:44:35,407 INFO  (MinervaCommandRunner:151) Loading ../obo-models/models/579bd2a500000011
2016-10-25 12:44:35,622 INFO  (MinervaCommandRunner:151) Loading ../obo-models/models/.placeholder
2016-10-25 12:44:35,665 ERROR (Banner:160) Uncaught exception in thread
org.semanticweb.owlapi.model.OWLOntologyCreationException: Detected anonymous ontology; must have IRI
    at org.geneontology.minerva.BlazegraphMolecularModelManager.importModelToDatabase(BlazegraphMolecularModelManager.java:567)
    at org.geneontology.minerva.cli.MinervaCommandRunner.importOWLModels(MinervaCommandRunner.java:152)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at owltools.cli.CommandRunner.runSingleIteration(CommandRunner.java:3989)
    at owltools.cli.CommandRunnerBase.run(CommandRunnerBase.java:76)
    at owltools.cli.CommandRunnerBase.run(CommandRunnerBase.java:68)
    at org.geneontology.minerva.cli.CommandLineInterface.main(CommandLineInterface.java:8)

cmungall commented 7 years ago

What's .placeholder?

kltm commented 7 years ago

No idea, blew it away.

kltm commented 7 years ago

What is the correct catalog that we should be using?

sjcarbon@moiraine:~/local/src/git/minerva$:( reset && java -Xmx2G -jar ./minerva-server/bin/minerva-server.jar -c /home/sjcarbon/local/src/svn/geneontology.org/trunk/experimental/lego/owl/catalog-v001.xml -g http://purl.obolibrary.org/obo/go/extensions/go-lego.owl -f blazegraph.jnl --export-folder exported-models --port 9999 --use-request-logging --slme-elk --skip-class-id-validation --set-important-relation-parent http://purl.obolibrary.org/obo/LEGOREL_0000000

2016-10-25 13:48:37,577 INFO  (StartUpTool:288) Start loading ontology: http://purl.obolibrary.org/obo/go/extensions/go-lego.owl
2016-10-25 13:48:38,156 INFO  (StartUpTool:292) Adding catalog xml: /home/sjcarbon/local/src/svn/geneontology.org/trunk/experimental/lego/owl/catalog-v001.xml
2016-10-25 13:48:38,190 INFO  (ParserWrapper:69) Start loading ontology: http://purl.obolibrary.org/obo/go/extensions/go-lego.owl from: http://purl.obolibrary.org/obo/go/extensions/go-lego.owl
2016-10-25 13:48:38,209 INFO  (ParserWrapper:76) Finished loading ontology: null from: http://purl.obolibrary.org/obo/go/extensions/go-lego.owl
Exception in thread "main" org.semanticweb.owlapi.io.UnparsableOntologyException: Problem parsing http://purl.obolibrary.org/obo/go/extensions/go-lego.owl
Could not parse ontology.  Either a suitable parser could not be found, or parsing failed.  See parser logs below for explanation.
The following parsers were tried:

Detailed logs:

    at uk.ac.manchester.cs.owl.owlapi.OWLOntologyFactoryImpl.loadOWLOntology(OWLOntologyFactoryImpl.java:234)
    at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.actualParse(OWLOntologyManagerImpl.java:1061)
    at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntology(OWLOntologyManagerImpl.java:1017)
    at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntology(OWLOntologyManagerImpl.java:935)
    at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntology(OWLOntologyManagerImpl.java:891)
    at owltools.io.ParserWrapper.parseOWL(ParserWrapper.java:157)
    at owltools.io.ParserWrapper.parseOWL(ParserWrapper.java:144)
    at owltools.io.ParserWrapper.parse(ParserWrapper.java:126)
    at owltools.io.ParserWrapper.parseToOWLGraph(ParserWrapper.java:117)
    at org.geneontology.minerva.server.StartUpTool.startUp(StartUpTool.java:295)
    at org.geneontology.minerva.server.StartUpTool.main(StartUpTool.java:227)

kltm commented 7 years ago

I think it would be worthwhile to just check this again the GO, OBO, and Monarch models, if that hasn't been done already.

Also, could you add the proper command(s) to just run minerva in standard server mode? It looks like the current examples are for journal creation and dumps--what would the command be (the one I'm assuming we'd after we create the first journal) to just run Minerva as normal?

cmungall commented 7 years ago

We should never rely on catalogs. An ontology import chain should always resolve in the absence of a catalog. However, catalogs are useful for caching purposes.

That's an odd error message, I'd expect a list of output from each parser. I don't know if this was just an I/O error (which will hopefully be helped after we cloudfront more) or something to do with the fact that you're using what is likely a stale abandoned catalog in the experimental folder of svn.

We'll want jenkins jobs that simulate minerva startup (at least for GO and Monarch instances) to help track down things like this

balhoff commented 7 years ago

@kltm the section here was meant to be the instructions for running in standard server mode. In particular the -f option needs to point to the Blazegraph journal file and the --export-folder option should be added. The rest of the parameter values may not match the way Minerva is deployed on the actual GO servers. Is there another context where I could better describe this?

Did you get past the ontology loading problem?

kltm commented 7 years ago

Ah, I think I see what you're saying. Duh. I read "Start the Minerva Server with configuration for Blazegraph journal and model dump folder" as a temporal sequence--I thought it was going to run and dump. If that's the actual command for running the server now, I can easily map it into what we have (and extend the gulpfiles).

I have not gotten passed the loading issue yet. @cmungall ?

kltm commented 7 years ago

So the thought here with migration then is that we'll have a one-time creation of the journal and TTL files, and then the file dumps are just backups/rollbacks.

Is the rest of the pipeline (GAF gen, etc.) okay with this? As well, IIRC, @balhoff, you said that you were currently passing a superset of the metadata through. With all that, I think I may be ready to flip the switch after we get the loads taken care of.

kltm commented 7 years ago

Looking at the client API docs a little and pondering a bit, maybe to break out into other items later, if needed

store/store-all would be the obvious pair, or store with no model arg would be all models
- I assume that the old store call still functions, that should probably be offed then--we'll need to remove that from the clients
What are your thoughts on the internal sparql query API?
- due to needing to manage merge/rebuild, for clients, I'd think only queries would be safe or now
- again, maybe query/query-all or query with and without model arg

kltm commented 7 years ago

From @cmungall, we are going to give up on the obo-models side of things for now; he also said that he would take care of testing against the monarch models. Has that been done? Moving forward, to make sure we can upgrade all servers at more or less the same time, it will be worthwhile to just check this again the GO, OBO, and Monarch models.

@balhoff Assuming that https://github.com/geneontology/minerva/issues/39#issuecomment-256231504 is addressed, I think we're getting close.

I've done some more testing, but could not make your command work for actually starting the server in several variations. For example, the minimal

java -Xmx2G -jar ./minerva-server/bin/minerva-server.jar -c /home/sjcarbon/local/src/svn/geneontology.org/trunk/experimental/lego/owl/catalog-v001.xml -g http://purl.obolibrary.org/obo/go/extensions/go-lego.owl -f blazegraph.jnl --export-folder exported-models --port 3500 --use-request-logging --slme-elk --skip-class-id-validation --set-important-relation-parent http://purl.obolibrary.org/obo/LEGOREL_0000000

gives me:

2016-10-27 14:59:42,712 INFO  (StartUpTool:288) Start loading ontology: http://purl.obolibrary.org/obo/go/extensions/go-lego.owl
2016-10-27 14:59:47,783 INFO  (StartUpTool:292) Adding catalog xml: /home/sjcarbon/local/src/svn/geneontology.org/trunk/experimental/lego/owl/catalog-v001.xml
2016-10-27 14:59:48,209 INFO  (ParserWrapper:69) Start loading ontology: http://purl.obolibrary.org/obo/go/extensions/go-lego.owl from: http://purl.obolibrary.org/obo/go/extensions/go-lego.owl
2016-10-27 14:59:48,302 INFO  (ParserWrapper:76) Finished loading ontology: null from: http://purl.obolibrary.org/obo/go/extensions/go-lego.owl
Exception in thread "main" org.semanticweb.owlapi.io.UnparsableOntologyException: Problem parsing http://purl.obolibrary.org/obo/go/extensions/go-lego.owl
Could not parse ontology.  Either a suitable parser could not be found, or parsing failed.  See parser logs below for explanation.
The following parsers were tried:

Detailed logs:

    at uk.ac.manchester.cs.owl.owlapi.OWLOntologyFactoryImpl.loadOWLOntology(OWLOntologyFactoryImpl.java:234)
    at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.actualParse(OWLOntologyManagerImpl.java:1061)
    at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntology(OWLOntologyManagerImpl.java:1017)
    at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntology(OWLOntologyManagerImpl.java:935)
    at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntology(OWLOntologyManagerImpl.java:891)
    at owltools.io.ParserWrapper.parseOWL(ParserWrapper.java:157)
    at owltools.io.ParserWrapper.parseOWL(ParserWrapper.java:144)
    at owltools.io.ParserWrapper.parse(ParserWrapper.java:126)
    at owltools.io.ParserWrapper.parseToOWLGraph(ParserWrapper.java:117)
    at org.geneontology.minerva.server.StartUpTool.startUp(StartUpTool.java:295)
    at org.geneontology.minerva.server.StartUpTool.main(StartUpTool.java:227)

I've tried variations removing the catalog, etc.

However, when I model more closely on what we've been running with, I seem fine:

java -Xmx4G -cp minerva-cli/bin/minerva-cli.jar org.geneontology.minerva.server.StartUpTool --use-golr-url-logging --use-request-logging --slme-elk -g http://purl.obolibrary.org/obo/go/extensions/go-lego.owl --set-important-relation-parent http://purl.obolibrary.org/obo/LEGOREL_0000000 -f ~/local/src/git/noctua-models/models/ --port 6800 -c ~/local/src/svn/geneontology.org/trunk/ontology/extensions/catalog-v001.xml --skip-class-id-validation -f blazegraph.jnl --export-folder /tmp/exported-models

I was wondering if the command you give works for you?

balhoff commented 7 years ago

Maybe it's the different catalog in use?

kltm commented 7 years ago

I've tried variations removing the catalog, etc.

cmungall commented 7 years ago

@kltm and i looked at this today. I think it's something peculiar to minerva-server.jar, haven't had a chance to look (we normally use minerva-cli.jar, I'm not sure why we have 2).

kltm commented 7 years ago

Hm. I think that the minerva-server was a more generalized code set that we kept around for some reason? I have nothing in my notes about it, but I remember that minerva-cli was a superset of functionality?

balhoff commented 7 years ago

I thought you were running with minerva-server.jar, based on the contents of start-m3-server.sh and start-go-minerva.sh. If you have been previously running with minerva-cli.jar, then maybe you should stick with that and just update the arguments as needed. Perhaps there is a separate packaging issue to look into.

balhoff commented 7 years ago

I have not actually been running from a jar locally, instead just running the class out of Eclipse. I'm talking a look at the jar outputs now though.

balhoff commented 7 years ago

So minerva-server.jar is built with maven-assembly-plugin, but minerva-cli.jar is built with maven-shade-plugin, which is more resilient to fat jar squashing problems like we ran into here: https://github.com/ontodev/robot/issues/98. And lo:

Inside minerva-cli.jar:

$ ls META-INF
ASL2.0                       NOTICE                       info.xml                     jdom-info.xml
COPYRIGHT.txt                NOTICE.txt                   inhabitants                  mailcap
DEPENDENCIES                 README.txt                   javamail.charset.map         mailcap.default
LICENSE                      axiom.xml                    javamail.default.address.map maven
LICENSE.txt                  default.address.map          javamail.default.providers   mimetypes.default
MANIFEST.MF                  exceptions.properties        jboss-beans.xml              services

Inside minerva-server.jar:

$ ls META-INF
MANIFEST.MF

kltm commented 7 years ago

@balhoff I was wondering something playing with the API (https://github.com/geneontology/noctua/issues/371#issuecomment-257524936) and looking at #70 .

The "export-all" functionality will be the first operation that we have that operates on all models at the same time. Unlike the usual imprint left by uid (and I assume provided-by) on "action" operations, we do not want an imprint left in this case. I'm not sure if the fact that it is a meta/query will prevent that from happening; IIRC, the "store-model" op did leave an imprint as it was an action/rebuild op. I'll try and test for it once I can complete the op (see geneontology/noctua/issues/371), but I was wondering if you might have any insight beforehand?

balhoff commented 7 years ago

I don't think there should be any metadata added. The uid is not passed on to the model dump functions.

kltm commented 7 years ago

@balhoff Okay, I found a missing bit in the protocol. When doing the "export-all" operation: message and message-type must be defined for all non-error packets. In this case "success" would probably be fine.

kltm commented 7 years ago

@balhoff Note that I've since corrected some typos in the above comment.

As another request, would it be possible to add store-all in addition to or as a replacement for export-all? For consistency, the "store" functions have been server facing, while the "export" functions have been client facing.

Things seem to be working pretty well so far, which is great. The one tricky thing that we'll need to iron out at some point (probably not now) is how queries to the backend are signaled. With export-all, it's a meta/query operation that only returns a message. This is different than typical meta/query which have special data stored in the data['meta'] area that helps signal what was asked for, and then drive the display. Without extra structure, there is no way the client client really knowing why it's getting a message (unless the client happens to have a message text lookup). In the future (again, not now, but as we do more stuff with the backend), I would suggest a new category of operation like "backend", so that the message can be sorted out properly.

kltm commented 7 years ago

@balhoff I'm getting close to the end of the testing (I have finally wired-up the noctua-repl to deal with all the new stuff), but have run into a couple of (possibly related) problems.

The first is that the store model operation (not bulk, but a single model) seems to fail on modified models. If I store an unmodified model, all is good. But if I add an individual and try storing, I get an error like:

Error (error): Could not successfully handle batch request. Exception: org.geneontology.minerva.MolecularModelManager$UnknownIdentifierException. No individual found for id: 'gomodel:55ad81df00000001/5818d91900000008' and IRI: http://model.geneontology.org/55ad81df00000001/5818d91900000008 in model: http://model.geneontology.org/55ad81df00000001; your operation was likely not performed.
org.geneontology.minerva.MolecularModelManager$UnknownIdentifierException: No individual found for id: 'gomodel:55ad81df00000001/5818d91900000008' and IRI: http://model.geneontology.org/55ad81df00000001/5818d91900000008 in model: http://model.geneontology.org/55ad81df00000001
    at org.geneontology.minerva.server.handler.OperationsImpl.getIndividual(OperationsImpl.java:118)
    at org.geneontology.minerva.server.handler.OperationsImpl.handleRequestForIndividual(OperationsImpl.java:248)
    at org.geneontology.minerva.server.handler.JsonOrJsonpBatchHandler.m3Batch(JsonOrJsonpBatchHandler.java:155)
    at org.geneontology.minerva.server.handler.JsonOrJsonpBatchHandler.m3Batch(JsonOrJsonpBatchHandler.java:131)
    at org.geneontology.minerva.server.handler.JsonOrJsonpBatchHandler.m3BatchPostPrivileged(JsonOrJsonpBatchHandler.java:90)
    at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:151)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:171)
    at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:195)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:104)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:402)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:349)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:106)
    at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:259)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:318)
    at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:236)
    at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1010)
    at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:373)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:382)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:345)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:220)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:547)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:480)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:225)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:941)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:409)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:186)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:875)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:110)
    at org.eclipse.jetty.server.Server.handle(Server.java:349)
    at org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:441)
    at org.eclipse.jetty.server.HttpConnection$RequestHandler.content(HttpConnection.java:936)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:801)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:224)
    at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:51)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:586)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:44)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:598)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:533)
    at java.lang.Thread.run(Thread.java:745)

The second issue may actually be a feature, but I'm thinking "not" right now. When I do an export-all, all unsaved modifications are apparently erased from the system.

balhoff commented 7 years ago

By "store model", do you mean "Save" from the Model menu? That is working for me. How did you add the individual?

For the second question (export-all), do you mean that unsaved modifications are lost from memory when the database is written to turtle files? I hope not. If you mean that unsaved modifications are not written out to the turtle files, I would say that is what I expect.

kltm commented 7 years ago

@balhoff

1) Correct. The individual is being added like other individuals through the API. Is seems to have nothing strange about it, but saving is blocked. (I'm using https://github.com/geneontology/minerva/commit/198e2edaab3d5571211dee4b4689d36213bfa842 ). The request looks like:

{ token: 123,
  intention: 'action',
  'provided-by': [ 'http://geneontology.org', 'http://genkisugi.net' ],
  requests: 
   [ { entity: 'individual',
       operation: 'add',
       arguments: 
        { expressions: [ { type: 'class', id: 'GO:0022008' } ],
          'model-id': 'gomodel:55ad81df00000001',
          'assign-to-variable': 'e4b5942c-28bf-43f7-b55f-f0ca1f1f01ea' } } ] }

2) I mean that apparently everything is lost from memory. If I modify a model, export-all, and reload the model, all modifications are lost.

balhoff commented 7 years ago

Re: (2), I think everything is working correctly. Think of export-all like copying the folder of owl files from the previous backend. It's just a backup of whatever has been saved. Reloading from there is more an emergency operation. Not sure if you are adding UI for that operation, but I intended it more to be called from a server-side script using curl. Just something executed every 5 or 10 minutes to make a backup.

kltm commented 7 years ago

@balhoff As far as copying all of the saved stuff to disk, that's good and expected. It's the losing everything that's _un_saved that's worrying. It feels like there's a complete reload of the internal state after the flush. From an operations perspective, it means that we would not be able to have automatic periodic flushes to disk; we'd have to go on, see what users were on or ping them, maybe do an automatic bulk save (also not good), etc.

kltm commented 7 years ago

@balhoff I've thrown everything out and rebuilt everything from scratch and I can no longer replicate the store-model and export-all errors. So great there; I'll try it again tomorrow and keep an eye on it to see if I can replicate it, but hopefully that was all something at my end.

That leaves only https://github.com/geneontology/minerva/issues/39#issuecomment-257716386 . I'm not adding any UI for the operation (right now), but it still needs to follow the protocol for the response instantiation and the REPL (where early testing and scripting happens).

balhoff commented 7 years ago

@kltm glad the errors are gone for now. I added messageType = "success" to the response. I am happy to change the name of the operation to whatever you think is best. "Store" makes me think of saving, which is a little different from what it is doing. What about "dump-all"?

kltm commented 7 years ago

Great. I'm doing some final testing now. If it goes well, I will merge it on to master, then use that to start closing the tickets on the Noctua side (@DoctorBud I'll do a merge branch off of noctua master with a summary as I collapse the issue branches back on, then merge that to master after testing), with an eye to get this all out this evening.

geneontology / minerva

Explore use of RDF store on back-end #39