apache / dubbo

The java implementation of Apache Dubbo. An RPC and microservice framework.
https://dubbo.apache.org/
Apache License 2.0
40.34k stars 26.4k forks source link

org.apache.dubbo.remoting.TimeoutException: Waiting server-side response timeout by scan timer. start time: 2021-08-15 23:04:35.615, end time: 2021-08-15 23:05:06.234, client elapsed: 20491 ms, server elapsed: 10128 ms, timeout: 100 ms #8513

Open LiWenGu opened 3 years ago

LiWenGu commented 3 years ago

Environment

Steps to reproduce this issue

It happened online. I tried it locally and couldn't replicate it.

online log:

org.apache.dubbo.remoting.TimeoutException: Waiting server-side response timeout by scan timer. start time: 2021-08-15 23:04:35.615, end time: 2021-08-15 23:05:06.234, client elapsed: 20491 ms, server elapsed: 10128 ms, timeout: 100 ms, request: Request

A timeline based on online consumer and provider logs:

client server
start 1629039875615(2021-08-15 23:04:35.615)
send 1629039896106(2021-08-15 23:04:56.106)
server biz process start(2021-08-15 23:04:56.106)
server biz process end(2021-08-15 23:04:56.109)
received 1629039906234(2021-08-15 23:05:06.234)

Expected Result

Why does the consumer timeout period not take effect

relation issues: #4732 #103

dkisser commented 3 years ago

First, check network status, the log shows that it took more than 10s for client got response.Second, check application status, it also shows that application had took more than 20s for sending request

LiWenGu commented 3 years ago

当时我们有很多接口在请求,就这一个接口出现这种情况,第一次碰到这种 client start -> send 需要 20s 的情况,极其夸张,这种。通过 zabbix 查看当前机器的性能还有网络,指标也是正常的。该应用的其它接口请求是正常的。我本地测试也没法复现当时的情况。很奇怪。另外我看很久之前就有人问过这个类似的问题,但是没有解决方案

xiaoheng1 commented 3 years ago

查过线程池的状态了没?

LiWenGu commented 3 years ago

查过线程池的状态了没?

我去查看了,发现 005 机器线程数量飙升,这个日志就是 005 机器爆的异常。至于线程池状态没法定位到当时的快照信息。我再看看当时的业务日志有什么异常,这个线程数量突然变多是导致这个的原因?

image

xiaoheng1 commented 3 years ago

我去查看了,发现 005 机器线程数量飙升,这个日志就是 005 机器爆的异常。至于线程池状态没法定位到当时的快照信息。我再看看当时的业务日志有什么异常,这个线程数量突然变多是导致这个的原因?

线程数量飙升可能会引起这个问题。具体确认的话,还是要结合当时的日志,机器状态来确认。

方便贴下你们线程池这块的配置不?

LiWenGu commented 3 years ago

dubbo线程池用的 cached image

我看了下 AbortPolicyWithReport.java 类,也打印出当前的线程栈信息了

image

可能需要先分析下这个线程栈信息才能解决当时的问题

LiWenGu commented 3 years ago

这是那个时间点的 dubbo_Stack.log 文件,ip做了处理。我使用各种工具分析都说识别不出这个文件为标准的 线程 dump 文件。

Dubbo_JStack.log.2021-08-15_23:03:40.TXT

xiaoheng1 commented 3 years ago

你把 dispatcher 换成 all 试试,看还会不会出现问题

tianshuang commented 3 years ago

2.7.5 昨天也遇到该问题,持续一分钟左右自动恢复,排除了 FullGC, 暂时没找到原因

Caused by: org.apache.dubbo.remoting.TimeoutException: Waiting server-side response timeout by scan timer. start time: 2021-09-12 17:07:02.423, end time: 2021-09-12 17:07:22.432, client elapsed: 19217 ms, server elapsed: 791 ms, timeout: 20000 ms, request: Request [id=9841521, version=2.0.2, twoway=true, event=false, broken=false, data=null], channel: /172.31.168.27:43984 -> /172.31.135.36:20891
    at org.apache.dubbo.remoting.exchange.support.DefaultFuture.doReceived(DefaultFuture.java:210) ~[dubbo-2.7.5.jar:2.7.5]
    at org.apache.dubbo.remoting.exchange.support.DefaultFuture.received(DefaultFuture.java:175) ~[dubbo-2.7.5.jar:2.7.5]
    at org.apache.dubbo.remoting.exchange.support.DefaultFuture$TimeoutCheckTask.notifyTimeout(DefaultFuture.java:297) ~[dubbo-2.7.5.jar:2.7.5]
    at org.apache.dubbo.remoting.exchange.support.DefaultFuture$TimeoutCheckTask.lambda$run$0(DefaultFuture.java:284) ~[dubbo-2.7.5.jar:2.7.5]
    at org.apache.dubbo.common.threadpool.ThreadlessExecutor.waitAndDrain(ThreadlessExecutor.java:77) ~[dubbo-2.7.5.jar:2.7.5]
    at org.apache.dubbo.rpc.AsyncRpcResult.get(AsyncRpcResult.java:175) ~[dubbo-2.7.5.jar:2.7.5]
    at org.apache.dubbo.rpc.protocol.AsyncToSyncInvoker.invoke(AsyncToSyncInvoker.java:61) ~[dubbo-2.7.5.jar:2.7.5]
    at org.apache.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:81) ~[dubbo-2.7.5.jar:2.7.5]
    at org.apache.dubbo.monitor.support.MonitorFilter.invoke(MonitorFilter.java:89) ~[dubbo-2.7.5.jar:2.7.5]
    at org.apache.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:81) ~[dubbo-2.7.5.jar:2.7.5]
    at com.alibaba.csp.sentinel.adapter.dubbo.DubboAppContextFilter.invoke(DubboAppContextFilter.java:43) ~[sentinel-apache-dubbo-adapter-1.8.0.jar:?]
    at org.apache.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:81) ~[dubbo-2.7.5.jar:2.7.5]
    at org.apache.dubbo.rpc.protocol.dubbo.filter.FutureFilter.invoke(FutureFilter.java:49) ~[dubbo-2.7.5.jar:2.7.5]
    at org.apache.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:81) ~[dubbo-2.7.5.jar:2.7.5]
    at org.apache.dubbo.rpc.filter.ConsumerContextFilter.invoke(ConsumerContextFilter.java:55) ~[dubbo-2.7.5.jar:2.7.5]
    at org.apache.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:81) ~[dubbo-2.7.5.jar:2.7.5]
    at org.apache.dubbo.rpc.listener.ListenerInvokerWrapper.invoke(ListenerInvokerWrapper.java:78) ~[dubbo-2.7.5.jar:2.7.5]
    at org.apache.dubbo.rpc.protocol.InvokerWrapper.invoke(InvokerWrapper.java:56) ~[dubbo-2.7.5.jar:2.7.5]
    at org.apache.dubbo.rpc.cluster.support.FailoverClusterInvoker.doInvoke(FailoverClusterInvoker.java:82) ~[dubbo-2.7.5.jar:2.7.5]
    ... 51 more
tianshuang commented 3 years ago

服务提供方没有 dubbo 异常日志

xiaoheng1 commented 3 years ago

配置信息方便贴下不 provider 端的

tianshuang commented 3 years ago
<dubbo:protocol threadpool="eager" threads="200" queues="1024" name="dubbo" port="20891"/>
<dubbo:provider timeout="60000" retries="0" filter="-exception,traceProviderFilter,methodFilter,exceptionFilter" />
xiaoheng1 commented 3 years ago

dispatcher 选择的是哪一个

tianshuang commented 3 years ago

默认的,没调过

xiaoheng1 commented 3 years ago

那你这种情况看不出来是啥原因导致的,我记得 2.7.5 中有一个是关于 dispatcher 的问题,会导致 server 没有发包给 client 导致超时的问题

CrazyHZM commented 2 years ago

@tianshuang Has this problem been resolved?

tianshuang commented 2 years ago

@tianshuang Has this problem been resolved? no

CrazyHZM commented 2 years ago

@tianshuang 2.7.14 Does the problem exist?

tianshuang commented 2 years ago

@tianshuang 2.7.14 Does the problem exist?

We haven’t upgraded the version, so we don’t know. 😄

CrazyHZM commented 2 years ago

@tianshuang It is recommended to upgrade to 2.7.14 and observe whether the problem can be reproduced. 2.7.5 is not recommended for production and large-scale use

tianshuang commented 2 years ago

@tianshuang It is recommended to upgrade to 2.7.14 and observe whether the problem can be reproduced. 2.7.5 is not recommended for production and large-scale use

We are mainly worried about compatibility issues, because production services need to remain stable.

helei810304 commented 2 years ago
  • [x] I have searched the issues of this repository and believe that this is not a duplicate.
  • [x] I have checked the FAQ of this repository and believe that this is not a duplicate.

Environment

  • Dubbo version: 2.7.3
  • Operating System version: Red Hat 4.8.5
  • Java version: openjdk8

Steps to reproduce this issue

It happened online. I tried it locally and couldn't replicate it.

online log:

org.apache.dubbo.remoting.TimeoutException: Waiting server-side response timeout by scan timer. start time: 2021-08-15 23:04:35.615, end time: 2021-08-15 23:05:06.234, client elapsed: 20491 ms, server elapsed: 10128 ms, timeout: 100 ms, request: Request

A timeline based on online consumer and provider logs:

client server start 1629039875615(2021-08-15 23:04:35.615)
send 1629039896106(2021-08-15 23:04:56.106)
server biz process start(2021-08-15 23:04:56.106) server biz process end(2021-08-15 23:04:56.109) received 1629039906234(2021-08-15 23:05:06.234)

Expected Result

Why does the consumer timeout period not take effect

relation issues: #4732 #103

问题解决了么,我是dubbo2.7.3,也会出这样的问题

totalo commented 2 years ago

+1昨天我们也出现了,服务提供方没有异常,然后版本是2.7.6

90706827 commented 1 year ago

dubbo 3.1.4 nacos 测试用例报错 找不到任何原因 哪位那位大哥帮帮忙

image
AlbumenJ commented 1 year ago

dubbo 3.1.4 nacos 测试用例报错 找不到任何原因 哪位那位大哥帮帮忙 image

单机的检查是否有 socks5 的代理

worker24h commented 1 year ago

Caused by: org.apache.dubbo.remoting.TimeoutException: Waiting server-side response timeout by scan timer. start time: 2023-05-22 20:54:49.553, end time: 2023-05-22 20:54:50.572, client elapsed: 1 ms, server elapsed: 1018 ms, timeout: 1000 ms, at org.apache.dubbo.remoting.exchange.support.DefaultFuture.doReceived(DefaultFuture.java:205) at org.apache.dubbo.remoting.exchange.support.DefaultFuture.received(DefaultFuture.java:170) at org.apache.dubbo.remoting.exchange.support.DefaultFuture$TimeoutCheckTask.notifyTimeout(DefaultFuture.java:288) at org.apache.dubbo.remoting.exchange.support.DefaultFuture$TimeoutCheckTask.lambda$run$0(DefaultFuture.java:275) at org.apache.dubbo.common.threadpool.ThreadlessExecutor$RunnableWrapper.run(ThreadlessExecutor.java:196) at org.apache.dubbo.common.threadpool.ThreadlessExecutor.waitAndDrain(ThreadlessExecutor.java:99) at org.apache.dubbo.rpc.AsyncRpcResult.get(AsyncRpcResult.java:179) ... 49 common frames omitted

Dubbo Version 2.7.15 exist too

dongjiancom commented 3 months ago

解决了吗?2024-05-22 09:14:58;dubbo版本2.7.15也遇到了这个问题

dongjiancom commented 3 months ago

12个pod只有1个或2个出现这个问题,dubbo 线程池设置为3000,有问题的pod都满了,其他机器正常。