apache / dubbo

The Java implementation of Apache Dubbo. An RPC and microservice framework.
https://dubbo.apache.org/
Apache License 2.0

[Bug] In the Triple protocol, the retries parameter does not take effect #14139

Closed guipengfei closed 6 months ago

guipengfei commented 6 months ago

Pre-check

Search before asking

Apache Dubbo Component

Java SDK (apache/dubbo)

Dubbo Version

Dubbo Java 3.1.11, OpenJDK 17

Steps to reproduce this issue

1. In the consumer code, retries is set to 4 and the tri protocol is used:

@DubboReference(retries = 4, url = "tri://127.0.0.1:21021")
 private IDemoService demoService;

2. When an interface call times out, the following error is reported and no retry takes place:

org.apache.dubbo.rpc.StatusRpcException: DEADLINE_EXCEEDED : Waiting server-side response timeout by scan timer. start time: 2024-04-28 18:06:52.836, end time: 2024-04-28 18:06:54.908, timeout: 2000 ms, service: com.xx.xx.xx.IDemoService, method: queryUser
    at org.apache.dubbo.rpc.TriRpcStatus.asException(TriRpcStatus.java:214)
    at org.apache.dubbo.rpc.protocol.tri.DeadlineFuture$TimeoutCheckTask.notifyTimeout(DeadlineFuture.java:183)
    at org.apache.dubbo.rpc.protocol.tri.DeadlineFuture$TimeoutCheckTask.lambda$run$0(DeadlineFuture.java:169)
    at org.apache.dubbo.common.threadpool.ThreadlessExecutor$RunnableWrapper.run(ThreadlessExecutor.java:184)
    at org.apache.dubbo.common.threadpool.ThreadlessExecutor.waitAndDrain(ThreadlessExecutor.java:103)
    at org.apache.dubbo.rpc.AsyncRpcResult.get(AsyncRpcResult.java:194)
    at org.apache.dubbo.rpc.protocol.AbstractInvoker.waitForResultIfSync(AbstractInvoker.java:266)
    at org.apache.dubbo.rpc.protocol.AbstractInvoker.invoke(AbstractInvoker.java:186)
    at org.apache.dubbo.rpc.listener.ListenerInvokerWrapper.invoke(ListenerInvokerWrapper.java:71)
    at com.cxmt.cnap.common.dubbo.core.filter.DubboTraceFilter.invoke(DubboTraceFilter.java:42)
    at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:327)
    at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CallbackRegistrationInvoker.invoke(FilterChainBuilder.java:194)
    at org.apache.dubbo.rpc.protocol.ReferenceCountInvokerWrapper.invoke(ReferenceCountInvokerWrapper.java:78)
    at org.apache.dubbo.rpc.cluster.support.AbstractClusterInvoker.invokeWithContext(AbstractClusterInvoker.java:379)
    at org.apache.dubbo.rpc.cluster.support.FailoverClusterInvoker.doInvoke(FailoverClusterInvoker.java:81)
    at org.apache.dubbo.rpc.cluster.support.AbstractClusterInvoker.invoke(AbstractClusterInvoker.java:341)
    at org.apache.dubbo.rpc.cluster.router.RouterSnapshotFilter.invoke(RouterSnapshotFilter.java:46)
    at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:327)
    at org.apache.dubbo.monitor.support.MonitorFilter.invoke(MonitorFilter.java:100)
    at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:327)
    at org.apache.dubbo.rpc.protocol.dubbo.filter.FutureFilter.invoke(FutureFilter.java:52)
    at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:327)
    at org.apache.dubbo.rpc.cluster.filter.support.ConsumerClassLoaderFilter.invoke(ConsumerClassLoaderFilter.java:40)
    at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:327)
    at org.apache.dubbo.rpc.cluster.filter.support.ConsumerContextFilter.invoke(ConsumerContextFilter.java:120)
    at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:327)
    at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CallbackRegistrationInvoker.invoke(FilterChainBuilder.java:194)
    at org.apache.dubbo.rpc.cluster.support.wrapper.AbstractCluster$ClusterFilterInvoker.invoke(AbstractCluster.java:92)
    at org.apache.dubbo.rpc.cluster.support.wrapper.MockClusterInvoker.invoke(MockClusterInvoker.java:103)
    at org.apache.dubbo.rpc.proxy.InvocationUtil.invoke(InvocationUtil.java:57)
    at org.apache.dubbo.rpc.proxy.InvokerInvocationHandler.invoke(InvokerInvocationHandler.java:75)
    at com.cxmt.cnap.kernel.permission.api.IUserServiceDubboProxy8.queryUser(IUserServiceDubboProxy8.java)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:208)
3. The relevant code in org.apache.dubbo.rpc.cluster.support.FailoverClusterInvoker:
public Result doInvoke(Invocation invocation, final List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
        List<Invoker<T>> copyInvokers = invokers;
        checkInvokers(copyInvokers, invocation);
        String methodName = RpcUtils.getMethodName(invocation);
        // The retry count is read successfully: len = 5
        int len = calculateInvokeTimes(methodName);
        // retry loop.
        RpcException le = null; // last exception.
        List<Invoker<T>> invoked = new ArrayList<Invoker<T>>(copyInvokers.size()); // invoked invokers.
        Set<String> providers = new HashSet<String>(len);
        for (int i = 0; i < len; i++) {
            //Reselect before retry to avoid a change of candidate `invokers`.
            //NOTE: if `invokers` changed, then `invoked` also lose accuracy.
            if (i > 0) {
                checkWhetherDestroyed();
                copyInvokers = list(invocation);
                // check again
                checkInvokers(copyInvokers, invocation);
            }
            Invoker<T> invoker = select(loadbalance, invocation, copyInvokers, invoked);
            invoked.add(invoker);
            RpcContext.getServiceContext().setInvokers((List) invoked);
            boolean success = false;
            try {
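                // 1. When an interface call times out, the dubbo protocol throws an exception, which is caught, and the loop moves on to the next iteration;
                //    the tri protocol instead returns a normal AsyncRpcResult object, which is returned directly below, ending the loop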
                Result result = invokeWithContext(invoker, invocation);
                if (le != null && logger.isWarnEnabled()) {
                    logger.warn(CLUSTER_FAILED_MULTIPLE_RETRIES,"failed to retry do invoke","","Although retry the method " + methodName
                        + " in the service " + getInterface().getName()
                        + " was successful by the provider " + invoker.getUrl().getAddress()
                        + ", but there have been failed providers " + providers
                        + " (" + providers.size() + "/" + copyInvokers.size()
                        + ") from the registry " + directory.getUrl().getAddress()
                        + " on the consumer " + NetUtils.getLocalHost()
                        + " using the dubbo version " + Version.getVersion() + ". Last error is: "
                        + le.getMessage(),le);
                }
                success = true;
                // With the tri protocol, the method returns here directly, so the retry count does not take effect
                return result;
            } catch (RpcException e) {
                if (e.isBiz()) { // biz exception.
                    throw e;
                }
                le = e;
            } catch (Throwable e) {
                le = new RpcException(e.getMessage(), e);
            } finally {
                if (!success) {
                    providers.add(invoker.getUrl().getAddress());
                }
            }
        }
        throw new RpcException(le.getCode(), "Failed to invoke the method "
                + methodName + " in the service " + getInterface().getName()
                + ". Tried " + len + " times of the providers " + providers
                + " (" + providers.size() + "/" + copyInvokers.size()
                + ") from the registry " + directory.getUrl().getAddress()
                + " on the consumer " + NetUtils.getLocalHost() + " using the dubbo version "
                + Version.getVersion() + ". Last error is: "
                + le.getMessage(), le.getCause() != null ? le.getCause() : le);
    }
4. As shown above, when an interface call times out under the tri protocol, invokeWithContext(invoker, invocation) in FailoverClusterInvoker returns a normal AsyncRpcResult object, so the method returns immediately. Is this why the retry count does not take effect?

5. Question: is it a bug or a feature that the retry count does not take effect under the tri protocol? If it is a feature, the official documentation does not describe it.
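The behaviour described in steps 3 to 5 can be reproduced without Dubbo. The following is a minimal sketch, not Dubbo code: `SimpleResult`, `SimpleInvoker`, and `invokeWithRetries` are hypothetical stand-ins that only mirror the shape of the loop in FailoverClusterInvoker#doInvoke, where a retry happens only if the call throws.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class FailoverLoopSketch {
    /** Hypothetical stand-in for a result object that may carry an error instead of a value. */
    static final class SimpleResult {
        final Object value;
        final Throwable error;

        SimpleResult(Object value, Throwable error) {
            this.value = value;
            this.error = error;
        }
    }

    /** Hypothetical stand-in for an invoker call. */
    interface SimpleInvoker {
        SimpleResult invoke();
    }

    /** Same shape as the failover loop: only a thrown exception leads to another attempt. */
    static SimpleResult invokeWithRetries(SimpleInvoker invoker, int retries) {
        int len = retries + 1; // total attempts
        RuntimeException last = null;
        for (int i = 0; i < len; i++) {
            try {
                // Any returned object ends the loop, even one that wraps an error.
                return invoker.invoke();
            } catch (RuntimeException e) {
                last = e;
            }
        }
        throw new RuntimeException("Tried " + len + " times", last);
    }

    public static void main(String[] args) {
        // Case 1: the call throws on timeout, so every attempt is used.
        AtomicInteger throwingCalls = new AtomicInteger();
        try {
            invokeWithRetries(() -> {
                throwingCalls.incrementAndGet();
                throw new RuntimeException("timeout");
            }, 4);
        } catch (RuntimeException allAttemptsFailed) {
            // expected: all attempts threw
        }

        // Case 2: the call returns a result that carries the timeout error.
        AtomicInteger returningCalls = new AtomicInteger();
        invokeWithRetries(() -> {
            returningCalls.incrementAndGet();
            return new SimpleResult(null, new RuntimeException("DEADLINE_EXCEEDED"));
        }, 4);

        System.out.println("throwing invoker attempts: " + throwingCalls.get());   // prints 5
        System.out.println("returning invoker attempts: " + returningCalls.get()); // prints 1
    }
}
```

With retries = 4, the throwing invoker is called five times, while the invoker that returns an error-carrying result is called once, which matches the symptom reported above.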

What you expected to happen

As with the dubbo protocol, the retry count set on the consumer side should take effect.

Anything else

No response

Are you willing to submit a pull request to fix on your own?

Code of Conduct

walklown commented 6 months ago
  1. Both DubboInvoker and TripleInvoker return an AsyncRpcResult. Whether to wait synchronously is decided in org.apache.dubbo.rpc.protocol.AbstractInvoker#waitForResultIfSync, so the return type is not the cause of the problem.
  2. The real cause:
     2.1. DubboInvoker uses a blocking model. 'asyncResult.get(timeout, TimeUnit.MILLISECONDS)' in waitForResultIfSync throws an exception when it times out, so the retry works as expected.
     2.2. TripleInvoker actively stops the request when it times out (see DeadlineFuture). 'asyncResult.get(timeout, TimeUnit.MILLISECONDS)' in waitForResultIfSync therefore never times out, because the request has already been stopped (with a failed status) before the timeout is reached. No exception is thrown, so no retry happens.

This issue has been fixed in 3.2.x; see commit 0f7a62a8ff2a2f3f66932cea4609ff06f90bb098. Upgrading to 3.2.x is recommended to resolve the problem. Hope this helps.
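The difference between the two timeout models in point 2 can be illustrated with plain `java.util.concurrent` types. This is a simplified sketch under stated assumptions, not Dubbo's implementation: `Response`, `blockingGetThrows`, and `deadlineGet` are hypothetical names, and the timer stands in for the deadline check that stops the request early.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutModelSketch {
    /** Hypothetical response type: a failed status travels inside a normally completed future. */
    static final class Response {
        final String value;
        final Exception error;

        Response(String value, Exception error) {
            this.value = value;
            this.error = error;
        }
    }

    /** Blocking model: nothing completes the future, so get(timeout) itself throws. */
    static boolean blockingGetThrows(long timeoutMs) throws Exception {
        CompletableFuture<Response> future = new CompletableFuture<>();
        try {
            future.get(timeoutMs, TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException e) {
            return true; // the caller sees an exception and can retry
        }
    }

    /** Deadline model: a timer completes the future with a failed Response before get(timeout) expires. */
    static Response deadlineGet(long deadlineMs, long getTimeoutMs) throws Exception {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        try {
            CompletableFuture<Response> future = new CompletableFuture<>();
            timer.schedule(() -> {
                // Normal completion: the error is data inside the response, not a thrown exception.
                future.complete(new Response(null, new IllegalStateException("DEADLINE_EXCEEDED")));
            }, deadlineMs, TimeUnit.MILLISECONDS);
            return future.get(getTimeoutMs, TimeUnit.MILLISECONDS);
        } finally {
            timer.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("blocking get threw: " + blockingGetThrows(50));
        Response response = deadlineGet(50, 2000);
        System.out.println("deadline get returned a response with error: " + (response.error != null));
    }
}
```

In the second case `get` returns normally, so a caller that retries only on thrown exceptions has to inspect the returned response to notice the failure.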
