dCache / dcache

dCache - a system for storing and retrieving huge amounts of data, distributed among a large number of heterogenous server nodes, under a single virtual filesystem tree with a variety of standard access methods
https://dcache.org
285 stars 136 forks source link

NPE from SRR (frontend) endpoint #6629

Closed sfayer closed 2 years ago

sfayer commented 2 years ago

Hi,

We've been asked to enable the new SRR module by WLCG, but it seems to fail to generate the JSON file on our dCache instance (frontend version 7.2.15, pools 7.2.11). Could you please have a look? (Please let me know if you need any further information).

Output:

$ curl http://localhost:3880/api/v1/srr
{"errors":[{"message":"Internal error: java.lang.NullPointerException","status":"500"}]}

Frontend log:

29 Apr 2022 17:03:32 (Frontend-gfe02) [] Uncaught exception in thread jetty-96
java.lang.NullPointerException: null
        at org.dcache.restful.srr.SrrBuilder.lambda$collectShares$3(SrrBuilder.java:213)
        at java.base/java.util.stream.ReferencePipeline$5$1.accept(ReferencePipeline.java:229)
        at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
        at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
        at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
        at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.LongPipeline.reduce(LongPipeline.java:474)
        at java.base/java.util.stream.LongPipeline.sum(LongPipeline.java:432)
        at org.dcache.restful.srr.SrrBuilder.collectShares(SrrBuilder.java:214)
        at org.dcache.restful.srr.SrrBuilder.generate(SrrBuilder.java:122)
        at org.dcache.restful.resources.srr.SrrResource.getSrr(SrrResource.java:146)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52)
        at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:124)
        at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:167)
        at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:176)
        at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:79)
        at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:469)
        at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:391)
        at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:80)
        at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:253)
        at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
        at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
        at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
        at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
        at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
        at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265)
        at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:232)
        at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:679)
        at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:392)
        at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346)
        at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:365)
        at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:318)
        at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205)
        at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:550)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1434)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1349)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
        at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:59)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
        at org.dcache.http.AuthenticationHandler.access$001(AuthenticationHandler.java:55)
        at org.dcache.http.AuthenticationHandler.lambda$handle$0(AuthenticationHandler.java:156)
        at java.base/java.security.AccessController.doPrivileged(Native Method)
        at java.base/javax.security.auth.Subject.doAs(Subject.java:361)
        at org.dcache.http.AuthenticationHandler.handle(AuthenticationHandler.java:153)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
        at org.dcache.http.AbstractLoggingHandler.handle(AbstractLoggingHandler.java:79)
        at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:59)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
        at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
        at org.eclipse.jetty.server.Server.handle(Server.java:516)
        at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)
        at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
        at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
        at java.base/java.lang.Thread.run(Thread.java:829)

Config:

[frontendDomain]
[frontendDomain/frontend]
frontend.authn.basic=true
frontend.authn.protocol=http
frontend.authz.anonymous-operations=READONLY
frontend.srr.shares=solid:/solidexperiment.org,cms:/cms

Regards, Simon

kofemann commented 2 years ago

Hi Simon,

is this situation persist or happens only after fresh restart?

sfayer commented 2 years ago

Hi,

Persistent: Testing now after everything has been running at least a few days and it still returns the error.

Regards, Simon

sfayer commented 2 years ago

Hi,

I did a bit more debugging of this and found that it does work with a few of our other pgroups: It looks like the NPE is triggered when the SRR generator is trying to process a pgroup that contains a pool that is offline. One of our pool nodes is faulty at the moment and has been offline for a while (longer than the last PoolManager restart, so I doubt it has any record of what size those pools could be, if that could be part of it?). If I temporarily remove those pools from the group, the error disappears and I get a complete JSON file back.

Regards, Simon

kofemann commented 2 years ago

Hi Simon,

Thanks a lot for the valuable information!