apache / helix

Mirror of Apache Helix
Apache License 2.0
461 stars 224 forks source link

Fix TestRebalancePipeline.testDuplicateMsg #1381

Closed kaisun2000 closed 3 years ago

kaisun2000 commented 3 years ago

LOG 1059, 09/18 touch 8

2020-09-18T10:47:58.0357559Z [ERROR] testDuplicateMsg(org.apache.helix.controller.stages.TestRebalancePipeline) Time elapsed: 2.065 s <<< FAILURE! 2020-09-18T10:47:58.0359755Z java.lang.AssertionError: expected:< true > but was:< false > 2020-09-18T10:47:58.0362100Z at org.apache.helix.controller.stages.TestRebalancePipeline.testDuplicateMsg(TestRebalancePipeline.java:158) 2020-09-18T10:47:58.0364198Z 2020-09-18T10:47:58.4153495Z [ERROR] Failures: 2020-09-18T10:47:58.4157485Z [ERROR] TestRebalancePipeline.testDuplicateMsg:158 expected: but was: 2020-09-18T10:47:58.4160219Z [ERROR] Tests run: 1196, Failures: 1, Errors: 0, Skipped: 1

kaisun2000 commented 3 years ago

LOG 1263

2020-09-23T06:09:41.3248613Z [ERROR] testDuplicateMsg(org.apache.helix.controller.stages.TestRebalancePipeline) Time elapsed: 60.082 s <<< FAILURE! 2020-09-23T06:09:41.3250269Z java.lang.AssertionError: expected:< true > but was:< false > 2020-09-23T06:09:41.3252484Z at org.apache.helix.controller.stages.TestRebalancePipeline.testDuplicateMsg(TestRebalancePipeline.java:158) 2020-09-23T06:09:41.3254310Z

kaisun2000 commented 3 years ago

2020-09-23T05:59:39.1813240Z START CLUSTER_TestRebalancePipeline_dup at Wed Sep 23 05:59:39 UTC 2020 2020-09-23T06:00:39.2530066Z 4307507 [TestNGInvoker-testDuplicateMsg()] ERROR org.apache.helix.TestHelper - verifier time out, consider try longer timeout, stack trace[java.lang.Thread.getStackTrace(Thread.java:1559), org.apache.helix.TestHelper.verify(TestHelper.java:804), org.apache.helix.controller.stages.TestRebalancePipeline.testDuplicateMsg(TestRebalancePipeline.java:158), sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method), sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62), sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43), java.lang.reflect.Method.invoke(Method.java:498), org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:76), org.testng.internal.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:46), org.testng.internal.InvokeMethodRunnable.run(InvokeMethodRunnable.java:37), java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511), java.util.concurrent.FutureTask.run(FutureTask.java:266), java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149), java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624), java.lang.Thread.run(Thread.java:748)] 2020-09-23T06:00:39.2604875Z END TestRebalancePipeline testDuplicateMsg at Wed Sep 23 06:00:39 UTC 2020, took: 60083ms.

kaisun2000 commented 3 years ago

Root cause seems to be:

1/ message is removed from MessageGenerationPhase

    // Asynchronously GC pending messages if necessary
    if (!messagesToCleanUp.isEmpty()) {
      schedulePendingMessageCleanUp(messagesToCleanUp, cache.getAsyncTasksThreadPool(),
          manager.getHelixDataAccessor());
    }

2/ cache is updated

refreshStaleMessageCache()
---> call by
runPipeline(event, dataRefresh);

so what we need is to call runPipeline in TestHelper.verify() loop.

jiajunwang commented 3 years ago

Close test unstable tickets since we have an automatic tracking mechanism https://github.com/apache/helix/pull/1757 now for tracking the most recent test issues.