apache / dubbo

The java implementation of Apache Dubbo. An RPC and microservice framework.
https://dubbo.apache.org/
Apache License 2.0
40.49k stars 26.43k forks source link

建议github构建workflow增加unit-test失败上传log4j日志的配置 #8923

Open zrlw opened 3 years ago

zrlw commented 3 years ago

有些单元测试github构建失败时只在log4j日志输出了错误信息,console上没有任何提示,无法定位问题,比如HeartbeatHandlerTest的testServerHeartbeat方法,这些问题本地单元测试多次也没能重现:

  1. 3.0的github构建日志
    2021-09-25T10:17:40.0085194Z [INFO] -------------------------------------------------------
    2021-09-25T10:17:40.0086161Z [INFO]  T E S T S
    2021-09-25T10:17:40.0086846Z [INFO] -------------------------------------------------------
    2021-09-25T10:17:41.5136723Z [INFO] Running org.apache.dubbo.remoting.exchange.support.header.HeartbeatHandlerTest
    2021-09-25T10:17:42.8678103Z Server bind successfully
    2021-09-25T10:17:42.9604489Z connect count 1
    2021-09-25T10:17:44.9882458Z Server bind successfully
    2021-09-25T10:17:44.9965565Z ++++++++++++++ disconnect count 0
    2021-09-25T10:17:44.9966140Z ++++++++++++++ connect count 1
    2021-09-25T10:17:47.0152440Z Server bind successfully  <== 从此处开始等待了87分钟,但是相关情况只在log4j日志有,github看不到
    2021-09-25T11:44:45.0661784Z Terminate batch job (Y/N)?   
    2021-09-25T11:44:50.4730720Z ##[error]The action has timed out.
  2. master的github构建日志
    2021-09-25T14:00:34.4513058Z [INFO] -------------------------------------------------------
    2021-09-25T14:00:34.4513649Z [INFO]  T E S T S
    2021-09-25T14:00:34.4514082Z [INFO] -------------------------------------------------------
    2021-09-25T14:00:35.3461423Z [INFO] Running org.apache.dubbo.remoting.exchange.support.header.HeartbeatHandlerTest
    2021-09-25T14:00:36.2617537Z Server bind successfully
    2021-09-25T14:00:36.3014177Z connect count 1
    2021-09-25T14:00:38.3370599Z Server bind successfully
    2021-09-25T14:00:38.3521421Z ++++++++++++++ disconnect count 0
    2021-09-25T14:00:38.3523615Z ++++++++++++++ connect count 1
    2021-09-25T14:00:40.3670466Z Server bind successfully <== 从此处开始等待了47分钟,但是相关情况只在log4j日志有,github看不到
    2021-09-25T14:47:51.2594748Z Terminate batch job (Y/N)? 
zrlw commented 3 years ago

testServerHeartbeat方法挂起不动弹这个现象可以手工重现: 修改HeartbeatHandlerTest测试类,把testServerHeartbeat方法复制一下搞两个测试方法,代码完全相同(改不改端口号,结果都是如此),将其中一个方法名改掉,比如改成testServerHeartbeatClone,然后把这个测试类的其他测试方法注释掉,然后执行HeartbeatHandlerTest测试,这两个方法,谁在后面运行谁就会hang住,跟踪了一下发现是处理connected事件时,并没有去调用HeartbeatHandler的connected方法,channel对象的ReadTimestamp和WriteTimestamp属性一直都是null,CloseTimerTask每次执行doTask任务时lastRead和lastWrite方法都是返回null,导致无法判断空闲时间是否超时。

更新: 如果执行两遍testServerHeartbeat, 需要在@AfterEach调用FakeChannelHandlers.resetChannelHandlers()复位ChannelHandlers类的INSTANCE。testServerHeartbeat方法中间调用了FakeChannelHandlers.setTestingChannelHandlers()把ChannelHandlers类的INSTANCE给改掉之后没有复原,之后构造NettyServer对象时都会受影响。