dCache / dcache

dCache - a system for storing and retrieving huge amounts of data, distributed among a large number of heterogenous server nodes, under a single virtual filesystem tree with a variety of standard access methods
https://dcache.org
292 stars 136 forks source link

Webdav: TimeoutException while writing content #5461

Open onnozweers opened 4 years ago

onnozweers commented 4 years ago

Dear dCache devs,

A user has some issues writing (lots of) files. From the webdav door log:

[root@guppy1 /var/log]# cat webdav2880-guppy1Domain.log
25 Jun 2020 14:40:15 (webdav2880-guppy1) [door:webdav2880-guppy1@webdav2880-guppy1Domain:AAWo571/fVA] Internal server error
java.lang.RuntimeException: java.io.IOException: java.util.concurrent.TimeoutException: Idle timeout expired: 300000/300000 ms
    at io.milton.http.webdav.DefaultWebDavResponseHandler.respondPropFind(DefaultWebDavResponseHandler.java:212)
    at io.milton.http.AbstractWrappingResponseHandler.respondPropFind(AbstractWrappingResponseHandler.java:200)
    at org.dcache.webdav.DcacheResponseHandler.respondPropFind(DcacheResponseHandler.java:296)
    at io.milton.http.AbstractWrappingResponseHandler.respondPropFind(AbstractWrappingResponseHandler.java:200)
    at io.milton.http.webdav.PropFindHandler.processExistingResource(PropFindHandler.java:156)
    at io.milton.http.ResourceHandlerHelper.processResource(ResourceHandlerHelper.java:196)
    at io.milton.http.webdav.PropFindHandler.processResource(PropFindHandler.java:112)
    at io.milton.http.ResourceHandlerHelper.process(ResourceHandlerHelper.java:127)
    at org.dcache.webdav.DcacheResourceHandlerHelper.process(DcacheResourceHandlerHelper.java:70)
    at io.milton.http.webdav.PropFindHandler.process(PropFindHandler.java:106)
    at org.dcache.webdav.DcacheStandardFilter.process(DcacheStandardFilter.java:52)
    at io.milton.http.FilterChain.process(FilterChain.java:46)
    at org.dcache.webdav.transfer.CopyFilter.process(CopyFilter.java:297)
    at io.milton.http.FilterChain.process(FilterChain.java:46)
    at io.milton.http.HttpManager.process(HttpManager.java:158)
    at org.dcache.webdav.MiltonHandler.handle(MiltonHandler.java:80)
    at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:61)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.dcache.http.AuthenticationHandler.access$101(AuthenticationHandler.java:63)
    at org.dcache.http.AuthenticationHandler.lambda$handle$0(AuthenticationHandler.java:162)
    at java.base/java.security.AccessController.doPrivileged(Native Method)
    at java.base/javax.security.auth.Subject.doAs(Subject.java:361)
    at org.dcache.http.AuthenticationHandler.handle(AuthenticationHandler.java:160)
    at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:61)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.dcache.http.AbstractLoggingHandler.handle(AbstractLoggingHandler.java:65)
    at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:61)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.server.Server.handle(Server.java:505)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
    at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:427)
    at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:321)
    at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:159)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
    at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:698)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:804)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.io.IOException: java.util.concurrent.TimeoutException: Idle timeout expired: 300000/300000 ms
    at org.eclipse.jetty.util.SharedBlockingCallback$Blocker.block(SharedBlockingCallback.java:235)
    at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:226)
    at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:545)
    at java.base/java.io.OutputStream.write(OutputStream.java:122)
    at io.milton.http.webdav.DefaultWebDavResponseHandler.respondPropFind(DefaultWebDavResponseHandler.java:210)
    ... 47 common frames omitted
Caused by: java.util.concurrent.TimeoutException: Idle timeout expired: 300000/300000 ms
    at org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:166)
    at org.eclipse.jetty.io.IdleTimeout$1.run(IdleTimeout.java:50)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    ... 1 common frames omitted
25 Jun 2020 14:40:15 (webdav2880-guppy1) [door:webdav2880-guppy1@webdav2880-guppy1Domain:AAWo571/fVA] exception writing content

The webdav door config:

[webdav2880-${host.name}Domain]
dcache.java.memory.heap=20G
[webdav2880-${host.name}Domain/webdav]
webdav.cell.name=webdav2880-${host.name}
webdav.redirect.on-read=false
webdav.redirect.on-write=false
webdav.authn.basic=true
webdav.authn.protocol=https
webdav.net.port=2880
webdav.enable.overwrite=true
webdav.service.pnfsmanager.timeout = 120
webdav.service.pnfsmanager.timeout.unit=MINUTES

dCache 6.0.16.

Any ideas what could cause this?

Kind regards, Onno

paulmillar commented 4 years ago

Hi Onno,

OK, I think what is happening here is dCache is taking too long to reply. By default, there's a 300 second (5 minutes) idle timeout in webdav (see webdav.limits.idle-time ). It seems that dCache is taking longer than 5 minutes to reply to this PROPFIND request. So, by the time the webdav door does finally reply, the connection is closed.

If so, why does it take dCache so long to reply to a PROPFIND request?

PROPFIND could be a directory listing. You also mentioned (in #5460) that you're seeing directory lists requests being queued in PnfsManager. That might explain it.

It would also be interesting to see if these errors correlate with the errors mentioned in #5460.

Cheers, Paul.

paulmillar commented 4 years ago

One further question: you have the pnfsmanager timeout set to two hours. This is quite a large number: the default value is two minutes.

Was this the result of PnfsManager being slow to respond?

If you have the PnfsManager timeout set to a value larger than the idle connection timeout then you should also adjust the idle connection timeout.

In your case, you could simply set the idle connection timeout to the same value as the PnfsManager timeout (or perhaps a tidy bit longer); however, I would try to reign in the PnfsManager timeout value. Two hours seems far too long for a WebDAV door to have to wait for a reply from PnfsManager.