apache / dubbo

The java implementation of Apache Dubbo. An RPC and microservice framework.
https://dubbo.apache.org/
Apache License 2.0

potential netty io blocked in pu model or triple alone #13041

Open iJIAJIA opened 11 months ago

iJIAJIA commented 11 months ago

Environment

Steps to reproduce this issue

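Reproduction steps

  1. Use direct connection and configure an address in a non-existent internal network segment. (Direct connection is the easiest way to reproduce it; the effect is the same if an instance registers an IP from an isolated network segment with the registry.) Local machines usually have many CPU cores, so declare several such references to raise the chance of reproducing the problem.

    @DubboReference(check = false, url = "tri://192.168.1.1:1234")
    private FooService fooService;

    @DubboReference(check = false, url = "tri://192.168.1.1:1233")
    private Foo2Service foo2Service;

    @DubboReference(check = false, url = "tri://${address of a healthy instance}:1233", timeout = 2000)
    private HealthService healthService;

  2. Send a triple request to a healthy service.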

Pls. provide [GitHub address] to reproduce this issue.

### Expected Behavior
The interface responds normally.

### Actual Behavior
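Business developers reported that the test environment was unstable: calls timed out intermittently, and only when the triple protocol was in use.
SkyWalking traces showed that almost all of the timeouts were the client timing out while waiting for a response, and that the requests were only sent some time after the timeout had already elapsed.
The client timeout is configured as 2 s.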

### Cause
TripleProtocol uses NettyConnectionClient. The problematic code is here:
```java
@Override
protected void doConnect() throws RemotingException {
    // ...
    createConnectingPromise();
    final ChannelFuture promise = bootstrap.connect();
    // This adds org.apache.dubbo.remoting.transport.netty4.NettyConnectionClient.ConnectionListener
    promise.addListener(this.connectionListener);
    // Block and wait for up to the configured connect timeout (3 s by default)
    boolean ret = connectingPromise.get().awaitUninterruptibly(getConnectTimeout(), TimeUnit.MILLISECONDS);
    // ...
}

class ConnectionListener implements ChannelFutureListener {

    @Override
    public void operationComplete(ChannelFuture future) {
        // ...
        // The retry after a failure runs on netty's IO thread. If the remote server does not
        // respond, the netty IO thread is blocked for up to connectionTimeout.
        final EventLoop loop = future.channel().eventLoop();
        loop.schedule(() -> {
            try {
                connectionClient.doConnect();
            } catch (RemotingException e) {
                LOGGER.error(TRANSPORT_FAILED_RECONNECT, "", "", "Failed to connect to server: " + getConnectAddress());
            }
        }, 1L, TimeUnit.SECONDS);
    }
}
```

Why doesn't the default dubbo protocol have this problem? It uses NettyClient, whose reconnection is handled by org.apache.dubbo.remoting.exchange.support.header.ReconnectTimerTask.
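
To make the effect concrete, here is a minimal sketch of the blocking pattern described above. It uses only `java.util.concurrent`, not Netty or Dubbo: a single-threaded executor stands in for one netty IO thread, a latch that is never released stands in for a remote address that never answers, and the class and method names are made up for illustration. The point is only that a task queued behind a blocking "reconnect" on the same thread cannot start until the connect timeout has run out.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class EventLoopBlockingDemo {

    /**
     * Returns how long (in ms) a "write request" task waited behind a blocking
     * "reconnect" task on the same single-threaded executor.
     */
    static long queuedDelayMillis(long connectTimeoutMillis) {
        // Stand-in for one netty IO thread: a single thread running queued tasks in order.
        ExecutorService ioThread = Executors.newSingleThreadExecutor();
        // Never counted down: models a remote address that never answers.
        CountDownLatch connected = new CountDownLatch(1);
        try {
            // The "reconnect" task blocks the IO thread for the whole connect timeout.
            ioThread.execute(() -> {
                try {
                    connected.await(connectTimeoutMillis, TimeUnit.MILLISECONDS);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            long submittedAt = System.nanoTime();
            // The "write request" of a healthy channel that shares this IO thread.
            Callable<Long> writeRequest = System::nanoTime;
            long startedAt = ioThread.submit(writeRequest).get();
            return TimeUnit.NANOSECONDS.toMillis(startedAt - submittedAt);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            ioThread.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("request delayed by about " + queuedDelayMillis(300) + " ms");
    }
}
```

With the default 3 s connect timeout and a 2 s client timeout, a delay of this kind would be enough to explain requests being sent only after they had already timed out.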

AlbumenJ commented 11 months ago

@EarthChen @icodening @guohao PTAL

icodening commented 11 months ago

The reconnect timeout handling should be done asynchronously.

iJIAJIA commented 9 months ago

> The reconnect timeout handling should be done asynchronously.

@icodening is it possible to use org.apache.dubbo.common.threadpool.manager.FrameworkExecutorRepository#sharedScheduledExecutor?
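
A rough sketch of what moving the reconnect off the IO thread could look like, assuming a separate scheduled executor takes over the blocking wait. This is not the actual Dubbo fix: it uses plain `java.util.concurrent`, the `reconnectExecutor` here is only a stand-in for something like `FrameworkExecutorRepository#sharedScheduledExecutor`, and all names are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class OffloadedReconnectDemo {

    /**
     * Returns how long (in ms) a "write request" task waited on the IO thread when the
     * blocking "reconnect" is handed off to a separate scheduled executor.
     */
    static long queuedDelayMillis(long connectTimeoutMillis) {
        // Stand-in for one netty IO thread.
        ExecutorService ioThread = Executors.newSingleThreadExecutor();
        // Stand-in for a shared scheduled executor (assumption, not the Dubbo API).
        ScheduledExecutorService reconnectExecutor = Executors.newSingleThreadScheduledExecutor();
        // Never counted down: models a remote address that never answers.
        CountDownLatch connected = new CountDownLatch(1);
        try {
            // The "listener" on the IO thread only schedules the retry and returns at once.
            ioThread.execute(() -> reconnectExecutor.schedule(() -> {
                try {
                    // The blocking wait now happens on the reconnect thread.
                    connected.await(connectTimeoutMillis, TimeUnit.MILLISECONDS);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, 10L, TimeUnit.MILLISECONDS));
            long submittedAt = System.nanoTime();
            // The "write request" of a healthy channel that shares the IO thread.
            Callable<Long> writeRequest = System::nanoTime;
            long startedAt = ioThread.submit(writeRequest).get();
            return TimeUnit.NANOSECONDS.toMillis(startedAt - submittedAt);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            ioThread.shutdownNow();
            reconnectExecutor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("request delayed by about " + queuedDelayMillis(1000) + " ms");
    }
}
```

The trade-off in a design like this is that a shared executor can itself be held up by several unreachable addresses at once, so the pool size or a non-blocking wait would still need thought.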