apache / druid

Apache Druid: a high performance real-time analytics database.
https://druid.apache.org/
Apache License 2.0

Broker errors when handshake with other nodes #14242

Closed soullkk closed 1 year ago

soullkk commented 1 year ago

Broker errors when handshake with other nodes

2023-04-02 04:35:24,831 WARN [qtp1319386445-190[groupBy_[xxxx]_7a5c4d33-5a19-41d3-b39c-0d1f4458e68c]][][org.apache.druid.java.util.http.client.NettyHttpClient] Netty faulty channel failed!!! org.jboss.netty.channel.ChannelException: Failed to handshake with host[https://x.x.x.x:26204] at org.apache.druid.java.util.http.client.pool.ChannelResourceFactory$2$1.operationComplete(ChannelResourceFactory.java:157) ~[druid-core-0.21.1-h0.gdd.sop.r49.jar:?] at org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:409) ~[netty-3.10.6.Final.jar:?] at org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:395) ~[netty-3.10.6.Final.jar:?] at org.jboss.netty.channel.DefaultChannelFuture.setFailure(DefaultChannelFuture.java:362) ~[netty-3.10.6.Final.jar:?] at org.jboss.netty.handler.ssl.SslHandler.setHandshakeFailure(SslHandler.java:1461) ~[netty-3.10.6.Final.jar:?] at org.jboss.netty.handler.ssl.SslHandler.access$200(SslHandler.java:180) ~[netty-3.10.6.Final.jar:?] at org.jboss.netty.handler.ssl.SslHandler$1.run(SslHandler.java:372) ~[netty-3.10.6.Final.jar:?] at org.jboss.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:556) ~[netty-3.10.6.Final.jar:?] at org.jboss.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:632) ~[netty-3.10.6.Final.jar:?] at org.jboss.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:369) ~[netty-3.10.6.Final.jar:?] at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) ~[netty-3.10.6.Final.jar:?] at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362] Caused by: javax.net.ssl.SSLException: Handshake did not complete within 10000ms ... 
6 more 2023-04-02 04:35:24,875 WARN [ForkJoinPool-1-worker-14][][org.apache.druid.client.JsonParserIterator] Query [7a5c4d33-5a19-41d3-b39c-0d1f4458e68c] to host [x.x.x.x:26204] interrupted org.jboss.netty.channel.ChannelException: Faulty channel in resource pool at org.apache.druid.java.util.http.client.NettyHttpClient.go(NettyHttpClient.java:132) ~[druid-core-0.21.1-h0.gdd.sop.r49.jar:?] at org.apache.druid.client.DirectDruidClient.run(DirectDruidClient.java:462) ~[druid-server-0.21.1-h0.gdd.sop.r49.jar:?] at org.apache.druid.client.CachingClusteredClient$SpecificQueryRunnable.getSimpleServerResults(CachingClusteredClient.java:723) ~[druid-server-0.21.1-h0.gdd.sop.r49.jar:?] at org.apache.druid.client.CachingClusteredClient$SpecificQueryRunnable.lambda$addSequencesFromServer$9(CachingClusteredClient.java:685) ~[druid-server-0.21.1-h0.gdd.sop.r49.jar:?] at java.util.TreeMap.forEach(TreeMap.java:1005) ~[?:1.8.0_362] at org.apache.druid.client.CachingClusteredClient$SpecificQueryRunnable.addSequencesFromServer(CachingClusteredClient.java:669) ~[druid-server-0.21.1-h0.gdd.sop.r49.jar:?] at org.apache.druid.client.CachingClusteredClient$SpecificQueryRunnable.lambda$run$2(CachingClusteredClient.java:397) ~[druid-server-0.21.1-h0.gdd.sop.r49.jar:?] at org.apache.druid.java.util.common.guava.LazySequence.toYielder(LazySequence.java:46) ~[druid-core-0.21.1-h0.gdd.sop.r49.jar:?] at org.apache.druid.java.util.common.guava.WrappingSequence$2.get(WrappingSequence.java:88) ~[druid-core-0.21.1-h0.gdd.sop.r49.jar:?] at org.apache.druid.java.util.common.guava.WrappingSequence$2.get(WrappingSequence.java:84) ~[druid-core-0.21.1-h0.gdd.sop.r49.jar:?] at org.apache.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55) ~[druid-core-0.21.1-h0.gdd.sop.r49.jar:?] at org.apache.druid.java.util.common.guava.WrappingSequence.toYielder(WrappingSequence.java:83) ~[druid-core-0.21.1-h0.gdd.sop.r49.jar:?] 
at org.apache.druid.java.util.common.guava.MergeSequence.lambda$toYielder$1(MergeSequence.java:65) ~[druid-core-0.21.1-h0.gdd.sop.r49.jar:?] at org.apache.druid.java.util.common.guava.BaseSequence.accumulate(BaseSequence.java:44) ~[druid-core-0.21.1-h0.gdd.sop.r49.jar:?] at org.apache.druid.java.util.common.guava.MergeSequence.toYielder(MergeSequence.java:62) ~[druid-core-0.21.1-h0.gdd.sop.r49.jar:?] at org.apache.druid.query.RetryQueryRunner$1.toYielder(RetryQueryRunner.java:134) ~[druid-server-0.21.1-h0.gdd.sop.r49.jar:?] at org.apache.druid.common.guava.CombiningSequence.toYielder(CombiningSequence.java:78) ~[druid-core-0.21.1-h0.gdd.sop.r49.jar:?] at org.apache.druid.java.util.common.guava.WrappingSequence$2.get(WrappingSequence.java:88) ~[druid-core-0.21.1-h0.gdd.sop.r49.jar:?] at org.apache.druid.java.util.common.guava.WrappingSequence$2.get(WrappingSequence.java:84) ~[druid-core-0.21.1-h0.gdd.sop.r49.jar:?] at org.apache.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55) ~[druid-core-0.21.1-h0.gdd.sop.r49.jar:?] at org.apache.druid.java.util.common.guava.WrappingSequence.toYielder(WrappingSequence.java:83) ~[druid-core-0.21.1-h0.gdd.sop.r49.jar:?] at org.apache.druid.java.util.common.guava.MappedSequence.toYielder(MappedSequence.java:49) ~[druid-core-0.21.1-h0.gdd.sop.r49.jar:?] at org.apache.druid.java.util.common.guava.WrappingSequence$2.get(WrappingSequence.java:88) ~[druid-core-0.21.1-h0.gdd.sop.r49.jar:?] at org.apache.druid.java.util.common.guava.WrappingSequence$2.get(WrappingSequence.java:84) ~[druid-core-0.21.1-h0.gdd.sop.r49.jar:?] at org.apache.druid.query.CPUTimeMetricQueryRunner$1.wrap(CPUTimeMetricQueryRunner.java:78) ~[druid-processing-0.21.1-h0.gdd.sop.r49.jar:?] at org.apache.druid.java.util.common.guava.WrappingSequence.toYielder(WrappingSequence.java:83) ~[druid-core-0.21.1-h0.gdd.sop.r49.jar:?] at org.apache.druid.java.util.common.guava.Yielders.each(Yielders.java:32) ~[druid-core-0.21.1-h0.gdd.sop.r49.jar:?] 
at org.apache.druid.server.QueryResource.doPost(QueryResource.java:230) ~[druid-server-0.21.1-h0.gdd.sop.r49.jar:?] at sun.reflect.GeneratedMethodAccessor115.invoke(Unknown Source) ~[?:?] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_362] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_362] at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) ~[jersey-server-1.19.3.jar:1.19.3] at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205) ~[jersey-server-1.19.3.jar:1.19.3] at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75) ~[jersey-server-1.19.3.jar:1.19.3] at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302) ~[jersey-server-1.19.3.jar:1.19.3] at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) ~[jersey-server-1.19.3.jar:1.19.3] at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) ~[jersey-server-1.19.3.jar:1.19.3] at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84) ~[jersey-server-1.19.3.jar:1.19.3] at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542) ~[jersey-server-1.19.3.jar:1.19.3] at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473) ~[jersey-server-1.19.3.jar:1.19.3] at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419) ~[jersey-server-1.19.3.jar:1.19.3] at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409) ~[jersey-server-1.19.3.jar:1.19.3] at 
com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409) ~[jersey-servlet-1.19.3.jar:1.19.3] at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558) ~[jersey-servlet-1.19.3.jar:1.19.3] at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733) ~[jersey-servlet-1.19.3.jar:1.19.3] at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) ~[javax.servlet-api-3.1.0.jar:3.1.0] at com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:286) ~[guice-servlet-4.1.0.jar:?] at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:276) ~[guice-servlet-4.1.0.jar:?] at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:181) ~[guice-servlet-4.1.0.jar:?] at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) ~[guice-servlet-4.1.0.jar:?] at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85) ~[guice-servlet-4.1.0.jar:?] at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:120) ~[guice-servlet-4.1.0.jar:?] at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:135) ~[guice-servlet-4.1.0.jar:?] at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) ~[jetty-servlet-9.4.48.v20220622.jar:9.4.48.v20220622] at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626) ~[jetty-servlet-9.4.48.v20220622.jar:9.4.48.v20220622] at org.apache.druid.server.security.PreResponseAuthorizationCheckFilter.doFilter(PreResponseAuthorizationCheckFilter.java:82) ~[druid-server-0.21.1-h0.gdd.sop.r49.jar:?] 
at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) ~[jetty-servlet-9.4.48.v20220622.jar:9.4.48.v20220622] at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626) ~[jetty-servlet-9.4.48.v20220622.jar:9.4.48.v20220622] at org.apache.druid.server.security.AllowHttpMethodsResourceFilter.doFilter(AllowHttpMethodsResourceFilter.java:78) ~[druid-server-0.21.1-h0.gdd.sop.r49.jar:?] at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) ~[jetty-servlet-9.4.48.v20220622.jar:9.4.48.v20220622] at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626) ~[jetty-servlet-9.4.48.v20220622.jar:9.4.48.v20220622] at org.apache.druid.server.security.AllowOptionsResourceFilter.doFilter(AllowOptionsResourceFilter.java:75) ~[druid-server-0.21.1-h0.gdd.sop.r49.jar:?] at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) ~[jetty-servlet-9.4.48.v20220622.jar:9.4.48.v20220622] at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626) ~[jetty-servlet-9.4.48.v20220622.jar:9.4.48.v20220622] at org.apache.druid.server.security.AllowAllAuthenticator$1.doFilter(AllowAllAuthenticator.java:84) ~[druid-server-0.21.1-h0.gdd.sop.r49.jar:?] at org.apache.druid.server.security.AuthenticationWrappingFilter.doFilter(AuthenticationWrappingFilter.java:59) ~[druid-server-0.21.1-h0.gdd.sop.r49.jar:?] at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) ~[jetty-servlet-9.4.48.v20220622.jar:9.4.48.v20220622] at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626) ~[jetty-servlet-9.4.48.v20220622.jar:9.4.48.v20220622] at org.apache.druid.server.security.SecuritySanityCheckFilter.doFilter(SecuritySanityCheckFilter.java:77) ~[druid-server-0.21.1-h0.gdd.sop.r49.jar:?] 
at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) ~[jetty-servlet-9.4.48.v20220622.jar:9.4.48.v20220622] at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626) ~[jetty-servlet-9.4.48.v20220622.jar:9.4.48.v20220622] at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:552) ~[jetty-servlet-9.4.48.v20220622.jar:9.4.48.v20220622] at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) ~[jetty-server-9.4.48.v20220622.jar:9.4.48.v20220622] at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624) ~[jetty-server-9.4.48.v20220622.jar:9.4.48.v20220622] at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) ~[jetty-server-9.4.48.v20220622.jar:9.4.48.v20220622] at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440) ~[jetty-server-9.4.48.v20220622.jar:9.4.48.v20220622] at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188) ~[jetty-server-9.4.48.v20220622.jar:9.4.48.v20220622] at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505) ~[jetty-servlet-9.4.48.v20220622.jar:9.4.48.v20220622] at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594) ~[jetty-server-9.4.48.v20220622.jar:9.4.48.v20220622] at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186) ~[jetty-server-9.4.48.v20220622.jar:9.4.48.v20220622] at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355) ~[jetty-server-9.4.48.v20220622.jar:9.4.48.v20220622] at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) ~[jetty-server-9.4.48.v20220622.jar:9.4.48.v20220622] at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:772) ~[jetty-server-9.4.48.v20220622.jar:9.4.48.v20220622] at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:59) 
~[jetty-server-9.4.48.v20220622.jar:9.4.48.v20220622] at org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:181) ~[jetty-server-9.4.48.v20220622.jar:9.4.48.v20220622] at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[jetty-server-9.4.48.v20220622.jar:9.4.48.v20220622] at org.eclipse.jetty.server.Server.handle(Server.java:516) ~[jetty-server-9.4.48.v20220622.jar:9.4.48.v20220622] at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487) ~[jetty-server-9.4.48.v20220622.jar:9.4.48.v20220622] at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732) ~[jetty-server-9.4.48.v20220622.jar:9.4.48.v20220622] at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479) ~[jetty-server-9.4.48.v20220622.jar:9.4.48.v20220622] at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277) ~[jetty-server-9.4.48.v20220622.jar:9.4.48.v20220622] at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) ~[jetty-io-9.4.48.v20220622.jar:9.4.48.v20220622] at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105) ~[jetty-io-9.4.48.v20220622.jar:9.4.48.v20220622] at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:555) ~[jetty-io-9.4.48.v20220622.jar:9.4.48.v20220622] at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:410) ~[jetty-io-9.4.48.v20220622.jar:9.4.48.v20220622] at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:164) ~[jetty-io-9.4.48.v20220622.jar:9.4.48.v20220622] at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105) ~[jetty-io-9.4.48.v20220622.jar:9.4.48.v20220622] at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) ~[jetty-io-9.4.48.v20220622.jar:9.4.48.v20220622] at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338) 
~[jetty-util-9.4.48.v20220622.jar:9.4.48.v20220622] at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315) ~[jetty-util-9.4.48.v20220622.jar:9.4.48.v20220622] at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173) ~[jetty-util-9.4.48.v20220622.jar:9.4.48.v20220622] at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131) ~[jetty-util-9.4.48.v20220622.jar:9.4.48.v20220622] at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409) ~[jetty-util-9.4.48.v20220622.jar:9.4.48.v20220622] at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883) ~[jetty-util-9.4.48.v20220622.jar:9.4.48.v20220622] at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034) ~[jetty-util-9.4.48.v20220622.jar:9.4.48.v20220622] at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362] Caused by: org.jboss.netty.channel.ChannelException: Failed to handshake with host[https://x.x.x.x:26204] at org.apache.druid.java.util.http.client.pool.ChannelResourceFactory$2$1.operationComplete(ChannelResourceFactory.java:157) ~[druid-core-0.21.1-h0.gdd.sop.r49.jar:?] at org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:409) ~[netty-3.10.6.Final.jar:?] at org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:395) ~[netty-3.10.6.Final.jar:?] at org.jboss.netty.channel.DefaultChannelFuture.setFailure(DefaultChannelFuture.java:362) ~[netty-3.10.6.Final.jar:?] at org.jboss.netty.handler.ssl.SslHandler.setHandshakeFailure(SslHandler.java:1461) ~[netty-3.10.6.Final.jar:?] at org.jboss.netty.handler.ssl.SslHandler.access$200(SslHandler.java:180) ~[netty-3.10.6.Final.jar:?] at org.jboss.netty.handler.ssl.SslHandler$1.run(SslHandler.java:372) ~[netty-3.10.6.Final.jar:?] 
at org.jboss.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:556) ~[netty-3.10.6.Final.jar:?] at org.jboss.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:632) ~[netty-3.10.6.Final.jar:?] at org.jboss.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:369) ~[netty-3.10.6.Final.jar:?] at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) ~[netty-3.10.6.Final.jar:?] ... 1 more Caused by: javax.net.ssl.SSLException: Handshake did not complete within 10000ms at org.jboss.netty.handler.ssl.SslHandler$1.run(SslHandler.java:372) ~[netty-3.10.6.Final.jar:?] at org.jboss.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:556) ~[netty-3.10.6.Final.jar:?] at org.jboss.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:632) ~[netty-3.10.6.Final.jar:?] at org.jboss.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:369) ~[netty-3.10.6.Final.jar:?] at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) ~[netty-3.10.6.Final.jar:?]

Affected Version

Druid 0.21.1

Description


This exception occurs frequently. I traced it to Netty's SslHandler.handshake(): the timeout task raises it when the handshake ChannelFuture is still null or not done once the handshake timeout expires. The relevant Netty source is as follows:

        try {
            engine.beginHandshake();
            runDelegatedTasks();
            handshakeFuture = this.handshakeFuture = future(channel);
            if (handshakeTimeoutInMillis > 0) {
                handshakeTimeout = timer.newTimeout(new TimerTask() {
                    public void run(Timeout timeout) throws Exception {
                        ChannelFuture future = SslHandler.this.handshakeFuture;
                        if (future != null && future.isDone()) {
                            return;
                        }
                        setHandshakeFailure(channel, new SSLException("Handshake did not complete within " +
                                        handshakeTimeoutInMillis + "ms"));
                    }
                }, handshakeTimeoutInMillis, TimeUnit.MILLISECONDS);
            }
        } catch (Exception e) {
            handshakeFuture = this.handshakeFuture = failedFuture(channel, e);
            exception = e;
        }
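
The timer task in the Netty snippet above can be illustrated with a minimal, self-contained sketch. It does not use Netty: CompletableFuture and ScheduledExecutorService stand in for ChannelFuture and the HashedWheelTimer, and the class and method names (HandshakeTimeoutSketch, beginHandshake, timedOut) are hypothetical. It only shows why a handshake that the peer never completes surfaces as "Handshake did not complete within N ms".

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import javax.net.ssl.SSLException;

class HandshakeTimeoutSketch {

    /**
     * Returns a future standing in for the handshake future. If nothing
     * completes it within timeoutMillis, the scheduled task fails it,
     * mirroring the TimerTask in the Netty snippet.
     */
    static CompletableFuture<Void> beginHandshake(ScheduledExecutorService timer, long timeoutMillis) {
        CompletableFuture<Void> handshakeFuture = new CompletableFuture<>();
        timer.schedule(() -> {
            if (handshakeFuture.isDone()) {
                return; // handshake already finished, nothing to do
            }
            handshakeFuture.completeExceptionally(
                    new SSLException("Handshake did not complete within " + timeoutMillis + "ms"));
        }, timeoutMillis, TimeUnit.MILLISECONDS);
        return handshakeFuture;
    }

    /** Waits for the future and reports whether it failed with an SSLException. */
    static boolean timedOut(CompletableFuture<Void> future) {
        try {
            future.join();
            return false;
        } catch (CompletionException e) {
            return e.getCause() instanceof SSLException;
        }
    }

    public static void main(String[] args) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        try {
            // Peer answers in time: the timeout task later finds the future done.
            CompletableFuture<Void> fast = beginHandshake(timer, 200);
            fast.complete(null);
            System.out.println("fast peer timed out: " + timedOut(fast));

            // Peer never answers: the timeout task fails the future.
            CompletableFuture<Void> stalled = beginHandshake(timer, 200);
            System.out.println("stalled peer timed out: " + timedOut(stalled));
        } finally {
            timer.shutdownNow();
        }
    }
}
```

In other words, the exception says only that the handshake was still pending when the timer fired. It does not say which side stalled.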

I checked the value of druid.server.http.numThreads on each server type: overlord=50, coordinator=50, indexer=25, historical=25, broker=12.

When an exception occurs while connecting to the other nodes (coordinator, historical, overlord, indexer), the "jetty/numOpenConnections" metric on those nodes is as follows:
{"feed":"metrics","timestamp":"2023-05-05T10:00:26.959Z","service":"druid/coordinator","host":"x.x.x.x:26200","version":"","metric":"jetty/numOpenConnections","value":141}
{"feed":"metrics","timestamp":"2023-05-06T10:12:44.278Z","service":"druid/historical","host":"x.x.x.x:26202","version":"","metric":"jetty/numOpenConnections","value":49}
{"feed":"metrics","timestamp":"2023-05-06T12:26:32.252Z","service":"druid/overlord","host":"x.x.x.x:26203","version":"","metric":"jetty/numOpenConnections","value":158}
{"feed":"metrics","timestamp":"2023-05-06T18:10:34.614Z","service":"druid/indexer","host":"x.x.x.x:26204","version":"","metric":"jetty/numOpenConnections","value":44}
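
To make the comparison explicit, here is a rough sketch that sets the reported jetty/numOpenConnections values against the configured druid.server.http.numThreads values. The class and method names are hypothetical, and the numbers are copied from this comment. The samples were taken at different times, and an open connection does not necessarily occupy a server thread, so a high ratio is only a hint, not proof of thread exhaustion.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class OpenConnectionsVsThreads {

    /** True when more connections are open than there are HTTP server threads. */
    static boolean exceedsThreads(int openConnections, int numThreads) {
        return openConnections > numThreads;
    }

    /** Open connections per configured HTTP server thread. */
    static double ratio(int openConnections, int numThreads) {
        return (double) openConnections / numThreads;
    }

    public static void main(String[] args) {
        // service -> {numThreads, numOpenConnections}, values taken from the comment above
        Map<String, int[]> observed = new LinkedHashMap<>();
        observed.put("coordinator", new int[] {50, 141});
        observed.put("historical", new int[] {25, 49});
        observed.put("overlord", new int[] {50, 158});
        observed.put("indexer", new int[] {25, 44});

        for (Map.Entry<String, int[]> e : observed.entrySet()) {
            int threads = e.getValue()[0];
            int open = e.getValue()[1];
            System.out.println(String.format(Locale.ROOT,
                    "%-12s threads=%d open=%d ratio=%.2f exceeds=%b",
                    e.getKey(), threads, open, ratio(open, threads), exceedsThreads(open, threads)));
        }
    }
}
```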

soullkk commented 1 year ago

@techdocsmith @abhishekagarwal87 Do you have any ideas or suggestions to help me continue analyzing this issue? I would greatly appreciate it.

abhishekagarwal87 commented 1 year ago

How many brokers do you have? This seems like a sizing issue. I suggest going through https://druid.apache.org/docs/latest/operations/basic-cluster-tuning.html#sizing-the-connection-pool-for-queries so you can make sure that the indexers/peons have the right number of threads.

soullkk commented 1 year ago

There are 6 broker nodes with the following IP addresses: xxx.8, xxx.11, xxx.9, xxx.12, xxx.5, xxx.10.
There are 7 historical nodes with the following IP addresses: xxx.9, xxx.7, xxx.5, xxx.11, xxx.8, xxx.10, xxx.12.
druid.server.http.numThreads is 12 on the brokers and 25 on the historicals.

Following the guide at https://druid.apache.org/docs/latest/operations/basic-cluster-tuning.html#sizing-the-connection-pool-for-queries, should druid.server.http.numThreads on the historicals be set to 82?

soullkk commented 1 year ago

For testing purposes, I set druid.server.http.numThreads on the historicals to 1, but the issue did not recur.

soullkk commented 1 year ago

Note that this issue occurs not only when connecting to port 26204 but also when connecting to ports 26200, 26202, and 26203.

abhishekagarwal87 commented 1 year ago

The threads on historicals and peons = (number of outgoing connections per broker) * (number of brokers). So if you have 20 connections per broker and 6 brokers, you should ideally have 120 threads on the data nodes.
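
The rule of thumb above can be worked through as a small calculation. This is only a sketch: the class and method names are hypothetical, and the inputs (20 outgoing connections per broker, 6 brokers) are the figures quoted in this comment.

```java
class DataNodeThreadSizing {

    /**
     * Suggested HTTP server threads on a data node (historical or peon):
     * outgoing connections per broker multiplied by the number of brokers.
     */
    static int suggestedThreads(int connectionsPerBroker, int brokerCount) {
        if (connectionsPerBroker < 0 || brokerCount < 0) {
            throw new IllegalArgumentException("inputs must be non-negative");
        }
        // multiplyExact throws ArithmeticException instead of silently overflowing
        return Math.multiplyExact(connectionsPerBroker, brokerCount);
    }

    public static void main(String[] args) {
        int connectionsPerBroker = 20; // outgoing connections from each broker
        int brokerCount = 6;           // brokers in the cluster described in this thread
        System.out.println("suggested threads per data node: "
                + suggestedThreads(connectionsPerBroker, brokerCount));
    }
}
```

With these inputs the result is 20 * 6 = 120, well above the 25 threads currently configured on the historicals and indexers.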

soullkk commented 1 year ago

Okay, I will follow your suggestions to set the values for historicals and peons, but how should I set them for the overlord and coordinator? This issue also occurs occasionally when the broker connects to them.

abhishekagarwal87 commented 1 year ago

There is a setting, druid.global.http.eagerInitialization=false, that you can set in the runtime properties. It disables eager initialization of connections to master nodes. You don't need eager initialization on paths that don't involve querying. On the query path, we create connections eagerly so that query performance is high from the start and doesn't suffer from connection initialization latency.

soullkk commented 1 year ago

I captured the packets of a handshake that timed out and found that the client closed the connection after the Change Cipher Spec. The client port is 36631 and the server port is 26202. Do you have any suggestions for analyzing why this happens? I am not very familiar with the TLS handshake.

image

soullkk commented 1 year ago

Thanks very much!

soullkk commented 1 year ago

Does Druid have any plans to upgrade its Netty version? I submitted this issue to Netty, but they no longer support 3.10.6. They suggest upgrading to at least 4.1. @abhishekagarwal87