Closed kltm closed 7 years ago
I finally have a replication of this event again with http://noctua.berkeleybop.org/download/gomodel:55ad81df00000001/gaf on remote.
Trying to replicate locally (http://localhost:8910/download/gomodel:55ad81df00000001/gaf), it looks more like resource starvation, so I'm not really sure how to read the same error. I would note that this seems to be killer on my dev machine, mostly pegging the cpu.
@balhoff Well, I have some information, but nothing conclusive. The only thing I have so far is that I /think/ it's not the clients and that I /think/ it's related to processing in minerva.
Here is the error:
java.lang.NullPointerException
at org.geneontology.minerva.legacy.AbstractLegoTranslator.getEntityType(AbstractLegoTranslator.java:288)
at org.geneontology.minerva.legacy.LegoToGeneAnnotationTranslator.initPayload(LegoToGeneAnnotationTranslator.java:50)
at org.geneontology.minerva.legacy.LegoToGeneAnnotationTranslator.initPayload(LegoToGeneAnnotationTranslator.java:19)
at org.geneontology.minerva.legacy.LegoModelWalker.walkModel(LegoModelWalker.java:169)
at org.geneontology.minerva.legacy.LegoToGeneAnnotationTranslator.walkModel(LegoToGeneAnnotationTranslator.java:19)
at org.geneontology.minerva.legacy.LegoToGeneAnnotationTranslator.translate(LegoToGeneAnnotationTranslator.java:34)
at org.geneontology.minerva.legacy.AbstractLegoTranslator.translate(AbstractLegoTranslator.java:311)
at org.geneontology.minerva.legacy.LegoToGeneAnnotationTranslator.translate(LegoToGeneAnnotationTranslator.java:19)
at org.geneontology.minerva.legacy.GafExportTool.exportModelLegacy(GafExportTool.java:57)
at org.geneontology.minerva.server.handler.OperationsImpl.exportLegacy(OperationsImpl.java:649)
at org.geneontology.minerva.server.handler.OperationsImpl.handleRequestForModel(OperationsImpl.java:468)
at org.geneontology.minerva.server.handler.JsonOrJsonpBatchHandler.m3Batch(JsonOrJsonpBatchHandler.java:170)
at org.geneontology.minerva.server.handler.JsonOrJsonpBatchHandler.m3Batch(JsonOrJsonpBatchHandler.java:132)
at org.geneontology.minerva.server.handler.JsonOrJsonpBatchHandler.m3BatchGetPrivileged(JsonOrJsonpBatchHandler.java:79)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:151)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:171)
at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:195)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:104)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:402)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:349)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:106)
at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:259)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:318)
at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:236)
at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1010)
at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:373)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:382)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:345)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:220)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:547)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:480)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:225)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:941)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:409)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:186)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:875)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:110)
at org.eclipse.jetty.server.Server.handle(Server.java:349)
at org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:441)
at org.eclipse.jetty.server.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:919)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:582)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:218)
at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:51)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:586)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:44)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:598)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:533)
at java.lang.Thread.run(Thread.java:745)
Here are the timings to the error:
small local machine:
large remote machine:
Local machine is a mess, as usual; larger machine seems to warm up no matter the client (get stuff off of disk?) and then hit the error. That would seem to initially rule out clients closing the connection somewhere along the path. Given the final (somewhat ambiguous) error, could this be an ID or other model issue?
The minerva.log for this is:
2016-12-14 17:56:26,355 INFO (LoggingApplicationEventListener:44) Request 905 started.
2016-12-14 17:56:26,358 INFO (LoggingApplicationEventListener:65) 905 Resource method GET started for request 905
2016-12-14 17:56:26,358 INFO (LoggingApplicationEventListener:66) 905 Headers: [{Connection,[close]},{X-Forwarded-Proto,[http]},{X-Forwarded-Server,[barista.berkeleybop.org]},{X-Forwarded-Host,[barista.berkeleybop.org]},{x-forwarded-port,[80]},{X-Forwarded-For,[131.243.192.110,::ffff:127.0.0.1]},{Host,[toaster.lbl.gov:6800]}]
2016-12-14 17:56:26,358 INFO (LoggingApplicationEventListener:67) 905 Path: /m3Batch
2016-12-14 17:56:26,359 INFO (LoggingApplicationEventListener:68) 905 PathParameters: []
2016-12-14 17:56:26,359 INFO (LoggingApplicationEventListener:69) 905 QueryParameters: [{uid,[]},{requests,[[{"entity":"model","operation":"export-legacy","arguments":{"format":"gaf","model-id":"gomodel:55ad81df00000001"}}]]},{intention,[action]}]
2016-12-14 17:56:26,359 INFO (LoggingApplicationEventListener:70) 905 Body:
2016-12-14 17:57:12,756 INFO (GolrExternalLookupService:37) Golr bioentity request: http://noctua-golr.berkeleybop.org/select?defType=edismax&qt=standard&wt=json&indent=on&fl=document_category%2Cid%2Cbioentity%2Cbioentity_label%2Cbioentity_name%2Csource%2Ctype%2Ctaxon%2Ctaxon_label%2Csynonym&facet=false&json.nl=arrarr&q=*%3A*&rows=100&start=0&fq=document_category%3A%22bioentity%22&fq=bioentity%3A%22AspGD%3AASPL0000098579%22
2016-12-14 17:57:12,768 INFO (GolrExternalLookupService:46) Golr ontology cls request: http://noctua-golr.berkeleybop.org/select?defType=edismax&qt=standard&wt=json&indent=on&fl=document_category%2Cannotation_class%2Cannotation_class_label%2Cdescription%2Csource%2Cis_obsolete%2Calternate_id%2Creplaced_by%2Csynonym%2Csubset%2Cdefinition_xref%2Cdatabase_xref%2Cisa_partof_closure%2Cregulates_closure%2Conly_in_taxon%2Conly_in_taxon_closure&facet=false&json.nl=arrarr&q=*%3A*&rows=100&start=0&fq=document_category%3A%22ontology_class%22&fq=annotation_class%3A%22AspGD%3AASPL0000098579%22
2016-12-14 17:57:12,774 INFO (LoggingApplicationEventListener:73) Request 905 finished. Processing time 46417 ms.
Previous example from other ticket seems to work fine now: http://noctua.berkeleybop.org/download/gomodel:581e072c00000473/gaf I'm wondering is this is not some kind of ID resolution issue. In both cases there were IDs that were not resolving within minerva.
In the still-failing case of "Scratch", minerva is searching for "AspGD:ASPL0000098579" in both bioentity and ontology and failing. This may be a red herring, but what happens in GAF generation if there is no way to get a label for a bioentity?
Well, in the gomodel:581e072c00000473 (KRC) case, there is no lookup. In the gomodel:55ad81df00000001 (Scratch) case, there is. So, I wonder if lookup, or failed lookup, always causes an error. Either way, it would seem that that code around the lookup should be hardened.
@kltm I can't replicate the error with gomodel:55ad81df00000001 in my local instance. I can at least add a null check in Minerva at the location of your stack trace.
@balhoff Okay, I think non-replication is probably more worrying. On production, the minerva we have is:
swdev@toaster:~/local/src/git/noctua$ java -jar ./java/lib/minerva-cli.jar --version
git-revision-sha1 1002582241d4d85de0aecd53b2f1b89c3a566b61
git-revision-url https://github.com/geneontology/minerva/commit/1002582241d4d85de0aecd53b2f1b89c3a566b61
git-branch master
git-dirty true
Locally, I have the unpushed triple-order-fix
build. Both machines are openjdk8.
For data...it looks like the current head of master right now still gives me an error. I suppose there could be something bad in the journal that a rebuild would fix? Hopefully, not. I did a rebuild on my local machine yesterday and still get the error.
So, how are you different?
Added null check in https://github.com/geneontology/minerva/commit/587577d3db6ede2308fb890f424c42f9fae82f68. This commit is added to https://github.com/geneontology/minerva/pull/83.
Could you get that fix onto triple-order-fix
as well, as that is what I was intending be built into master https://github.com/geneontology/minerva/issues/82#issuecomment-267203087 ?
Any ideas on how our environments differ? What's the method you're using to extract the GAF from your local minerva build--is it noctua, the repl, or a url?
Yep, it's pushed to triple-order-fix
. I'm using Noctua to extract the GAF. My Noctua is a few weeks old; I can try to update to the latest soon. I looked at the Minerva arguments on toaster and tried to make mine (running from Eclipse) as close as possible.
You can also get the error pretty easily with the repl--I should show you that at some point (https://github.com/geneontology/noctua-repl). The Eclipse environment can be a pretty big change. Try just running minerva from the packaged one in the repo--you may not be able to step through, but we can at least start isolating what is going wrong.
I've done a rebuild of minerva with the fix, but it reeeally long to download the ontologies at home, so hopefully I'll have something later today testingwise.
Well, after applying the fix you put onto triple-order-fix
, the error generated became harder to trace--apparently falling into more traversed but less clear channels, finally reporting as "comms issues for this model" to the client. Going all the way back to the minerva URL, it seems to just have pushed the error elsewhere:
java.lang.NullPointerException
at org.geneontology.minerva.legacy.AbstractLegoTranslator.getLabelForBioentity(AbstractLegoTranslator.java:488)
at org.geneontology.minerva.legacy.AbstractLegoTranslator.getBioentityStrings(AbstractLegoTranslator.java:479)
at org.geneontology.minerva.legacy.AbstractLegoTranslator.createBioentity(AbstractLegoTranslator.java:450)
at org.geneontology.minerva.legacy.AbstractLegoTranslator.addAnnotations(AbstractLegoTranslator.java:500)
at org.geneontology.minerva.legacy.LegoToGeneAnnotationTranslator.translate(LegoToGeneAnnotationTranslator.java:39)
at org.geneontology.minerva.legacy.AbstractLegoTranslator.translate(AbstractLegoTranslator.java:311)
at org.geneontology.minerva.legacy.LegoToGeneAnnotationTranslator.translate(LegoToGeneAnnotationTranslator.java:19)
at org.geneontology.minerva.legacy.GafExportTool.exportModelLegacy(GafExportTool.java:57)
at org.geneontology.minerva.server.handler.OperationsImpl.exportLegacy(OperationsImpl.java:649)
at org.geneontology.minerva.server.handler.OperationsImpl.handleRequestForModel(OperationsImpl.java:468)
at org.geneontology.minerva.server.handler.JsonOrJsonpBatchHandler.m3Batch(JsonOrJsonpBatchHandler.java:170)
at org.geneontology.minerva.server.handler.JsonOrJsonpBatchHandler.m3Batch(JsonOrJsonpBatchHandler.java:132)
at org.geneontology.minerva.server.handler.JsonOrJsonpBatchHandler.m3BatchGetPrivileged(JsonOrJsonpBatchHandler.java:79)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:151)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:171)
at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:195)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:104)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:402)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:349)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:106)
at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:259)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:318)
at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:236)
at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1010)
at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:373)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:382)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:345)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:220)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:547)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:480)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:225)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:941)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:409)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:186)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:875)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:110)
at org.eclipse.jetty.server.Server.handle(Server.java:349)
at org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:441)
at org.eclipse.jetty.server.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:919)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:582)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:218)
at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:51)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:586)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:44)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:598)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:533)
at java.lang.Thread.run(Thread.java:745)
I think in this case, the "easy" fix should be dropped/reverted as it is obfuscating a behavior that we agree should not occur in the first place, and is making debugging harder. And, of course, it didn't seem to work. I think we should spend the time on tracking down the root cause and figuring out why you cannot reproduce this in your local environment.
I am able to replicate the problem now (when I switched branches to work on this, my application arguments didn't quite make sense anymore and the lookup service wasn't being configured). One of the causes is here: https://github.com/geneontology/minerva/blob/master/minerva-lookup/src/main/java/org/geneontology/minerva/lookup/CachingExternalLookupService.java#L32
I don't know much about Guava, but it seems like this cache is returning null when no entries are found (the anonymous class is throwing an exception which is caught in Guava). It's not clear to me that it makes sense to throw the exception. If there are no results than an empty list should be returned. But I don't think I know everything about how the cache works. I will do some testing on the result when the empty list is returned instead of throwing the exception.
Other parts of the application should be more resilient to the null
result coming from the cache, but I guess somehow that has never been a problem before. Is it very unusual that there is not lookup result for AspGD:ASPL0000098579? Does that indicate any outside problem?
Returning an empty list (and caching it in guava I suppose) doesn't cause any failing tests, and also allows the GAF to be exported successfully (i.e. fixes this problem). Wondering if there are any unintended consequences of returning the empty list here.
With the last set of fixes, I now have output for "scratch", which seems mostly right to my eye:
!gaf-version: 2.0
FB FBgn0263395 hppy Dmel GO:0043670 PMID:9999999 NAS C gene taxon:7227 20151210 GO_Noctua
Thinking more about what's going on here though, and walking a bunch through the code, I think I'm maybe uncomfortable with the switching from the null to the empty list. Although I concur that that seems to be the way it's written, I think I may see/remember the logic of it.
My reconstruction is that when we initially created the lookup service interface, it was considered to be a good thing to throw the error: we wanted to communicate to the client immediately that they were attempting to do something where the ID was unknown and to stop the process. Originally, in the case of normal client interaction, when a client attempts to use a "bad" ID, the process gets stopped when the model is being rendered to JSON, the uncaught exception gets picked up at a much higher level, and a somewhat understandable error gets back to the client. The client needs to then understand the error and report it back to the user. All good for that task.
However, it looks like this mechanism fails when we are trying to do a bulk render; instead of plowing through, it throws the exception. In this light, it may be that "apparently could not resolve model" error may actually be the "right" thing to do, if it could be recoded to do something telling the user which ID was bum.
What I worry about now is that if the null is switched to the empty list, the original purpose of the errors may be subverted: no longer will bad IDs in a batch get reported as errors and the process stopped, but rather silently pass through the system.
I think part of the problem here is that the external lookup "hack" to get labels and possibly unknown entities has been evolved rather than completely cleanly specified. As well, we have never really decided how "bad" a bad ID is for the system, and have left a lot of parameters floating around to control and resolve this in various ways. Now that we have real users, it may be time to revisit this with @cmungall and clean up that section of the code.
TL;DR: we should talk about this more, but I think that if an error could be thrown that specifies the bum ID, that might be a more "correct" thing to do with the current setup. That said, the current setup may be sub-optimal for what we're trying to accomplish.
Here is a set of checks we can do.
Note I didn't include semantic checks, as this is not at the level of the ID but at the containing axiom.
1-3 are non-negotiable. For cases where we lack either declarations, label assertions, or logical axioms, this is undesirable, as we prefer the Minerva ontology to know the universe of objects we instantiate and use. However, we acknowledge that we may have scenarios where we don't have everything in-memory.
Okay, given the above list from @cmungall, would it be best to just have 8 completed, essentially try and still have a more informative error, then have a new ticket for the rest?
Given all this, it may be best to decouple the triple fixes (triple-order-fix
) from geneontology/minerva#82 so we can deploy that sooner as we work through this.
@cmungall Following from our conversation today, as I'm removing all access to GAF, I believe this is done with?
In some cases, trying to download/generate the GAF for a model returns a 400 response with
My initial theory https://github.com/geneontology/noctua/issues/383#issuecomment-266161244 was wrong. It looks to be an error triggered within the noctua export manager.