dCache / dcache

dCache - a system for storing and retrieving huge amounts of data, distributed among a large number of heterogenous server nodes, under a single virtual filesystem tree with a variety of standard access methods
https://dcache.org
285 stars 136 forks source link

7.2 (and trunk) - NPE on removal via WebDAV and token #6783

Closed DmitryLitvintsev closed 1 year ago

DmitryLitvintsev commented 2 years ago

when running this command:

curl -f -k -L -H "Authorization: Bearer ${TOKEN}" -X DELETE  https://fndcaitb4.fnal.gov:2880${spath}/junk.data

I get this error

curl: (22) NSS: client certificate not found (nickname not specified)

The file actually gets removed and I see a NPE in the door pinboard:

16 Sep 2022 15:27:42 [jetty-59] [door:WebDAV-fndcaitb4-1@webdavDomain:AAXo0TGqBGA] DELETE :: fndcaitb4.fnal.gov:2880///mu2e/scratch/users/litvinse/junk.data start
16 Sep 2022 15:27:42 [jetty-59] [door:WebDAV-fndcaitb4-1@webdavDomain:AAXo0TGqBGA] Internal server errorjava.lang.NullPointerException: null
    at org.dcache.notification.DoorRequestMessageSerializer.serialize(DoorRequestMessageSerializer.java:77)
    at org.dcache.notification.DoorRequestMessageSerializer.serialize(DoorRequestMessageSerializer.java:18)
    at org.apache.kafka.common.serialization.Serializer.serialize(Serializer.java:62)
    at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:945)
    at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:905)
    at org.springframework.kafka.core.DefaultKafkaProducerFactory$CloseSafeProducer.send(DefaultKafkaProducerFactory.java:993)
    at org.springframework.kafka.core.KafkaTemplate.doSend(KafkaTemplate.java:649)
    at org.springframework.kafka.core.KafkaTemplate.send(KafkaTemplate.java:403)
    at org.springframework.kafka.core.KafkaTemplate.sendDefault(KafkaTemplate.java:382)
    at org.dcache.webdav.DcacheResourceFactory.sendRemoveInfoToBilling(DcacheResourceFactory.java:1098)
    at org.dcache.webdav.DcacheResourceFactory.deleteFile(DcacheResourceFactory.java:1083)
    at org.dcache.webdav.DcacheFileResource.delete(DcacheFileResource.java:190)
    at io.milton.http.DeleteHelperImpl.delete(DeleteHelperImpl.java:126)
    at io.milton.http.http11.DeleteHandler.processExistingResource(DeleteHandler.java:98)
    at io.milton.http.ResourceHandlerHelper.processResource(ResourceHandlerHelper.java:196)
    at io.milton.http.ResourceHandlerHelper.processResource(ResourceHandlerHelper.java:131)
    at io.milton.http.http11.DeleteHandler.processResource(DeleteHandler.java:83)
    at io.milton.http.ResourceHandlerHelper.process(ResourceHandlerHelper.java:127)
    at org.dcache.webdav.DcacheResourceHandlerHelper.process(DcacheResourceHandlerHelper.java:42)
    at io.milton.http.http11.DeleteHandler.process(DeleteHandler.java:72)
    at org.dcache.webdav.DcacheStandardFilter.process(DcacheStandardFilter.java:50)
    at io.milton.http.FilterChain.process(FilterChain.java:46)
    at org.dcache.webdav.transfer.CopyFilter.process(CopyFilter.java:276)
    at io.milton.http.FilterChain.process(FilterChain.java:46)
    at io.milton.http.HttpManager.process(HttpManager.java:158)
    at org.dcache.webdav.MiltonHandler.handle(MiltonHandler.java:77)
    at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:59)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
    at org.dcache.http.AuthenticationHandler.access$001(AuthenticationHandler.java:55)
    at org.dcache.http.AuthenticationHandler.lambda$handle$0(AuthenticationHandler.java:156)
    at java.base/java.security.AccessController.doPrivileged(Native Method)
    at java.base/javax.security.auth.Subject.doAs(Subject.java:361)
    at org.dcache.http.AuthenticationHandler.handle(AuthenticationHandler.java:153)
    at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:59)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
    at org.dcache.http.AbstractLoggingHandler.handle(AbstractLoggingHandler.java:110)
    at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:59)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
    at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
    at org.eclipse.jetty.server.Server.handle(Server.java:516)
    at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:400)
    at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:645)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:392)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
    at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:555)
    at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:410)
    at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:164)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
    at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
    at java.base/java.lang.Thread.run(Thread.java:829)

16 Sep 2022 15:27:42 [jetty-59] [door:WebDAV-fndcaitb4-1@webdavDomain:AAXo0TGqBGA] DELETE :: fndcaitb4.fnal.gov:2880///mu2e/scratch/users/litvinse/junk.data finished 36ms, Status:HTTP/1.1 500, Length:23

More details:

  1. get token:
[litvinse@fnisd1 scitokens]$ htgettoken -a fermicloud543.fnal.gov -i mu2e
Attempting to get token from https://fermicloud543.fnal.gov:8200 ... failed
Attempting kerberos auth with https://fermicloud543.fnal.gov:8200 ... succeeded
Storing vault token in /tmp/vt_u8637
Attempting to get token from https://fermicloud543.fnal.gov:8200 ... succeeded
Storing bearer token in /run/user/8637/bt_u8637

[litvinse@fnisd1 scitokens]$ spath=`httokendecode  | jq ".scope" | sed -e "s/\"//g"  | awk '{ print $NF}' | cut -d ":" -f2`
[litvinse@fnisd1 scitokens]$ echo $spath
/mu2e/scratch/users/litvinse

[litvinse@fnisd1 scitokens]$ httokendecode
{
  "wlcg.ver": "1.0",
  "aud": "https://wlcg.cern.ch/jwt/v1/any",
  "sub": "408c3486-b1ce-4d3d-ab3f-74717f80a534",
  "nbf": 1663358665,
  "scope": "storage.read:/mu2e compute.create compute.read storage.create:/mu2e/scratch/users/litvinse compute.cancel compute.modify storage.modify:/mu2e/scratch/users/litvinse",
  "iss": "https://cilogon.org/mu2e",
  "exp": 1663369470,
  "iat": 1663358670,
  "wlcg.groups": [
    "/mu2e"
  ],
  "jti": "https://cilogon.org/oauth2/3cbad440f5235fc44c38e465d3f86c6e?type=accessToken&ts=1663358669990&version=v2.0&lifetime=10800000"
}
  1. Upload a file:
[litvinse@fnisd1 scitokens]$ curl -f -k -L -H "Authorization: Bearer ${TOKEN}" -Tjunk https://fndcaitb4.fnal.gov:2880${spath}/junk.data
curl: (22) NSS: client certificate not found (nickname not specified)
[litvinse@fnisd1 scitokens]$  TOKEN=`cat /run/user/8637/bt_u8637`
[litvinse@fnisd1 scitokens]$ curl -f -k -L -H "Authorization: Bearer ${TOKEN}" -Tjunk https://fndcaitb4.fnal.gov:2880${spath}/junk.data
837787900 bytes uploaded
  1. check that it now exists:
[root@fndcaitb2 litvinse]# ls -altr
total 3272611
drwxrwxr-x 100    17840 9914       512 Aug 16 15:09 ..
drwxrwxr-x   6 litvinse 9914       512 Sep 16 15:26 .
-rw-rw-r--   1 litvinse 9914 837787900 Sep 16 15:26 junk.data
  1. try to remove
[litvinse@fnisd1 scitokens]$  curl -f -k -L -H "Authorization: Bearer ${TOKEN}" -X DELETE  https://fndcaitb4.fnal.gov:2880${spath}/junk.data
curl: (22) NSS: client certificate not found (nickname not specified)
  1. observe that file does not exist:
[root@fndcaitb2 litvinse]# ls -altr
total 2454459
drwxrwxr-x 100    17840 9914       512 Aug 16 15:09 ..
drwxrwxr-x   5 litvinse 9914       512 Sep 16 15:27 .
  1. using gfal client fails as well:
[litvinse@fnisd1 scitokens]$ export BEARER_TOKEN=`cat /run/user/8637/bt_u8637`
[litvinse@fnisd1 scitokens]$  gfal-copy -K adler32 junk  https://fndcaitb4.fnal.gov:2880${spath}/junk.data
Copying file:///home/litvinse/scitokens/junk   [DONE]  after 33s 

[litvinse@fnisd1 scitokens]$  gfal-rm  https://fndcaitb4.fnal.gov:2880${spath}/junk.data
https://fndcaitb4.fnal.gov:2880/mu2e/scratch/users/litvinse/junk.data   FAILED
gfal-rm error: 5 (Input/output error) - DavPosix::unlink  HTTP 500 : Unexpected server error: 500  with url https://fndcaitb4.fnal.gov:2880/mu2e/scratch/users/litvinse/junk.data
[litvinse@fnisd1 scitokens]$ 

There is NPE on sever side in both cases. File is ultimately removed but as you can see client sees a fail.

DmitryLitvintsev commented 2 years ago

As stack trace indicates this seems to have to do with Kafka. So I did:

webdav.enable.kafka=false

and I see no NPE and delete succeeds.

@mksahakyan may be you can take a look?

DmitryLitvintsev commented 2 years ago

@paulmillar and @mksahakyan I have looked at the code in

DoorRequestMessageSerializer.java

And I modified it like so:

        JSONArray subject = new JSONArray();
        // data.getSubject().getPrincipals().forEach(s -> subject.put(s));
        data.getSubject().getPrincipals()
            .stream()
            .filter(s -> !(s instanceof JwtJtiPrincipal))
            .forEach(subject::put);

so that I am filtering out JwtJtiPrincipal.... And I get no NPE. All works as expected.

So, something in that Principal leads to NPE.

    Principal: JwtJtiPrincipal[https://cilogon.org/oauth2/3cbad440f5235fc44c38e465d3f86c6e?type=accessToken&ts=1663358669990&version=v2.0&lifetime=10800000@mu2e]
paulmillar commented 1 year ago

As I mentioned in #6722, the line numbers in the stack-trace simply don't match what dCache code is doing: not in the 7.2 branch and not in master branch. The line numbers match (more or less) how the method worked in a much older (and unsupported) version of dCache.

The only explanation that I can think of (with the evidence so far) is that you're not running the version of dCache you think you are.

There are (at least) two possible explanations:

  1. you have a rogue dcache-core.jar file installed in the /usr/share/dcache/classes, /usr/share/dcache/plugins or /usr/local/share/dcache/plugins directory, which dCache is using instead of the dcache-core.jar from the RPM.
  2. you have installed a plugin that (inadvertently) contains classes (such as DoorRequestMessageSerializer) from some ancient version of dCache.
DmitryLitvintsev commented 1 year ago

Paul,

I build RPM from master

mvn clean package -am -pl packages/fhs -P rpm

and I deployed it using

rpm -Uvh --force ...

I have been building and deploying dcache this way for many years - I do not remember anything along you suggested fault lines happening.

Please note, that the stack trace looks different from #6772

DmitryLitvintsev commented 1 year ago

I just re-installed having removed everything in classes directory:

2405  2022-09-23 08:54:26 cd /usr/share/dcache/classes/
 2406  2022-09-23 08:54:27 ls -al 
 2407  2022-09-23 08:54:35 dcache stop 
 2408  2022-09-23 08:55:27 rm -rf * 
 2409  2022-09-23 08:55:37 cd /opt/
 2413  2022-09-23 08:56:35 rpm -Uvh -force dcache-8.2.0.7d79a22-1.noarch.rpm

I get the same stacktrace on remove:

23 Sep 2022 08:59:45 (WebDAV-fndcaitb4-1) [door:WebDAV-fndcaitb4-1@webdavDomain:AAXpWJcZTYg] Internal server error
java.lang.NullPointerException: null
        at org.dcache.notification.DoorRequestMessageSerializer.serialize(DoorRequestMessageSerializer.java:77)
        at org.dcache.notification.DoorRequestMessageSerializer.serialize(DoorRequestMessageSerializer.java:18)
        at org.apache.kafka.common.serialization.Serializer.serialize(Serializer.java:62)
        at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:945)
        at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:905)
        at org.springframework.kafka.core.DefaultKafkaProducerFactory$CloseSafeProducer.send(DefaultKafkaProducerFactory.java:993)
        at org.springframework.kafka.core.KafkaTemplate.doSend(KafkaTemplate.java:649)
        at org.springframework.kafka.core.KafkaTemplate.send(KafkaTemplate.java:403)
        at org.springframework.kafka.core.KafkaTemplate.sendDefault(KafkaTemplate.java:382)
        at org.dcache.webdav.DcacheResourceFactory.sendRemoveInfoToBilling(DcacheResourceFactory.java:1102)
        at org.dcache.webdav.DcacheResourceFactory.deleteFile(DcacheResourceFactory.java:1087)
        at org.dcache.webdav.DcacheFileResource.delete(DcacheFileResource.java:190)
        at io.milton.http.DeleteHelperImpl.delete(DeleteHelperImpl.java:126)
        at io.milton.http.http11.DeleteHandler.processExistingResource(DeleteHandler.java:98)
        at io.milton.http.ResourceHandlerHelper.processResource(ResourceHandlerHelper.java:196)
        at io.milton.http.ResourceHandlerHelper.processResource(ResourceHandlerHelper.java:131)
        at io.milton.http.http11.DeleteHandler.processResource(DeleteHandler.java:83)
        at io.milton.http.ResourceHandlerHelper.process(ResourceHandlerHelper.java:127)
        at org.dcache.webdav.DcacheResourceHandlerHelper.process(DcacheResourceHandlerHelper.java:42)
        at io.milton.http.http11.DeleteHandler.process(DeleteHandler.java:72)
        at org.dcache.webdav.DcacheStandardFilter.process(DcacheStandardFilter.java:50)
        at io.milton.http.FilterChain.process(FilterChain.java:46)
        at org.dcache.webdav.transfer.CopyFilter.process(CopyFilter.java:276)
        at io.milton.http.FilterChain.process(FilterChain.java:46)
        at io.milton.http.HttpManager.process(HttpManager.java:158)
        at org.dcache.webdav.MiltonHandler.handle(MiltonHandler.java:77)
        at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:59)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
        at org.dcache.http.AuthenticationHandler.access$001(AuthenticationHandler.java:55)
        at org.dcache.http.AuthenticationHandler.lambda$handle$0(AuthenticationHandler.java:156)
        at java.base/java.security.AccessController.doPrivileged(Native Method)
        at java.base/javax.security.auth.Subject.doAs(Subject.java:361)
        at org.dcache.http.AuthenticationHandler.handle(AuthenticationHandler.java:153)
        at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:59)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
        at org.dcache.http.AbstractLoggingHandler.handle(AbstractLoggingHandler.java:110)
        at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:59)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
        at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
        at org.eclipse.jetty.server.Server.handle(Server.java:516)
        at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)
        at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
        at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:555)
        at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:410)
        at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:164)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
        at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
        at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:386)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
        at java.base/java.lang.Thread.run(Thread.java:829)
paulmillar commented 1 year ago

Hi Dmitry,

Could you try downloading one of the 7.2 RPMs from the dcache.org site, install it, and see if you can reproduce this problem?

Cheers, Paul.

mksahakyan commented 1 year ago

the error is coming from JSONObject class

public String toString() {
    try {
        return this.toString(0);
    } catch (Exception var2) {
        return null;
    }
}

i am trying to debug is now to get the exception

mksahakyan commented 1 year ago

The patch fixing the issue is on the RBT

paulmillar commented 1 year ago

@mksahakyan Where does this JSONObject code come from?

mksahakyan commented 1 year ago

JSONObject (org.json) is comming from line 77 in DoorRequestMessageSerializer

return o.toString().getBytes(UTF_8); then at some point in JSONObject on line 185 happens the following:

static final Writer writeValue(Writer writer, Object value, int indentFactor, int indent) throws JSONException, IOException { if (value != null && !value.equals((Object)null)) {

and then is OpScopedPrincipal 's equal() is called.

It is worth noticing that, "org.json" version is old 2014

paulmillar commented 1 year ago

Sorry, I was a little unclear.

You posted a code snippet; specifically,

public String toString() {
    try {
        return this.toString(0);
    } catch (Exception var2) {
        return null;
    }
}

It isn't dCache code, so from which jar file does this code come?

mksahakyan commented 1 year ago

from the jar json-20141113.jar

 <dependency>
                <groupId>org.json</groupId>
                <artifactId>json</artifactId>
                <version>20141113</version>
   </dependency>
kofemann commented 1 year ago

https://github.com/stleary/JSON-java/blob/master/src/main/java/org/json/JSONObject.java#L2365

paulmillar commented 1 year ago

Ta:

https://github.com/stleary/JSON-java/pull/690

paulmillar commented 1 year ago

What I still don't understand is why the stack-trace contains this entry:

    at org.dcache.notification.DoorRequestMessageSerializer.serialize(DoorRequestMessageSerializer.java:18)

The method serialize isn't defined at line 18, it's the class definition.

paulmillar commented 1 year ago

Here is the link to RB fix from @mksahakyan

Patch: https://rb.dcache.org/r/13690/

DmitryLitvintsev commented 1 year ago

Thank you @mksahakyan for fixing this issue