apache / helix

Mirror of Apache Helix
Apache License 2.0
465 stars 226 forks source link

Fix flaky test TestTaskThrottling.testTaskThrottle #1304

Closed kaisun2000 closed 3 years ago

kaisun2000 commented 4 years ago

LOG: (392)

2020-08-22T07:54:50.2770215Z [ERROR] testTaskThrottle(org.apache.helix.integration.task.TestTaskThrottling) Time elapsed: 60.364 s <<< FAILURE! 2020-08-22T07:54:50.2770617Z java.lang.AssertionError: expected: but was: 2020-08-22T07:54:50.2770814Z at org.apache.helix.integration.task.TestTaskThrottling.testTaskThrottle(TestTaskThrottling.java:96) 2020-08-22T07:54:50.2770956Z

2020-08-22T07:07:53.5994944Z START TestTaskThrottling testTaskThrottle at Sat Aug 22 07:07:53 UTC 2020 2020-08-22T07:07:53.9388482Z 3067363 [ZkClient-EventThread-52826-localhost:2183] ERROR org.apache.helix.messaging.handling.HelixTaskExecutor - Failed to create message handler for 3af597da-3c37-47ca-acba-6fe8817f3f6f 2020-08-22T07:07:53.9389274Z org.apache.helix.HelixException: Another state transition for Job2_Job2:Job2_Job2_0 is in progress with msg: 30153a11-ab4e-4022-a388-bf64343dff67, p2p: false, read: 1598080073911, current:1598080073935. Discarding INIT->RUNNING message 2020-08-22T07:07:53.9389495Z at org.apache.helix.messaging.handling.HelixTaskExecutor.onMessage(HelixTaskExecutor.java:927) 2020-08-22T07:07:53.9389636Z at org.apache.helix.manager.zk.CallbackHandler.invoke(CallbackHandler.java:484) 2020-08-22T07:07:53.9389750Z at org.apache.helix.manager.zk.CallbackHandler.enqueueTask(CallbackHandler.java:373) 2020-08-22T07:07:53.9389885Z at org.apache.helix.manager.zk.CallbackHandler.handleChildChange(CallbackHandler.java:789) 2020-08-22T07:07:53.9390014Z at org.apache.helix.zookeeper.zkclient.ZkClient$9.run(ZkClient.java:1543) 2020-08-22T07:07:53.9390142Z at org.apache.helix.zookeeper.zkclient.ZkEventThread.run(ZkEventThread.java:99) 2020-08-22T07:07:53.9507128Z 3067376 [ZkClient-EventThread-52826-localhost:2183] ERROR org.apache.helix.messaging.handling.HelixTaskExecutor - Message cannot be processed: 3af597da-3c37-47ca-acba-6fe8817f3f6f, {CREATE_TIMESTAMP=1598080073926, ClusterEventName=IdealStateChange, FROM_STATE=INIT, MSG_ID=3af597da-3c37-47ca-acba-6fe8817f3f6f, MSG_STATE=unprocessable, MSG_TYPE=STATE_TRANSITION, PARTITION_NAME=Job2_Job2_0, RESOURCE_NAME=Job2_Job2, SRC_NAME=controller_0, SRC_SESSION_ID=10000034ab20d8a, STATE_MODEL_DEF=Task, STATE_MODEL_FACTORY_NAME=DEFAULT, TGT_NAME=localhost_12918, TGT_SESSION_ID=10000034ab20d85, TO_STATE=RUNNING}{}{} 2020-08-22T07:07:53.9517834Z org.apache.helix.HelixException: Another state transition for Job2_Job2:Job2_Job2_0 is in progress with msg: 30153a11-ab4e-4022-a388-bf64343dff67, p2p: false, read: 1598080073911, current:1598080073935. Discarding INIT->RUNNING message 2020-08-22T07:07:53.9518071Z at org.apache.helix.messaging.handling.HelixTaskExecutor.onMessage(HelixTaskExecutor.java:927) 2020-08-22T07:07:53.9518231Z at org.apache.helix.manager.zk.CallbackHandler.invoke(CallbackHandler.java:484) 2020-08-22T07:07:53.9518362Z at org.apache.helix.manager.zk.CallbackHandler.enqueueTask(CallbackHandler.java:373) 2020-08-22T07:07:53.9518496Z at org.apache.helix.manager.zk.CallbackHandler.handleChildChange(CallbackHandler.java:789) 2020-08-22T07:07:53.9518613Z at org.apache.helix.zookeeper.zkclient.ZkClient$9.run(ZkClient.java:1543) 2020-08-22T07:07:53.9518741Z at org.apache.helix.zookeeper.zkclient.ZkEventThread.run(ZkEventThread.java:99) 2020-08-22T07:07:53.9553692Z 3067378 [HelixTaskExecutor-message_handle_thread_21987] ERROR org.apache.helix.messaging.handling.HelixStateTransitionHandler - Partition Job2_Job2_5 current state is same as toState (INIT->RUNNING) from message. 2020-08-22T07:07:53.9588490Z 3067383 [HelixTaskExecutor-message_handle_thread_22004] ERROR org.apache.helix.messaging.handling.HelixStateTransitionHandler - Partition Job2_Job2_4 current state is same as toState (INIT->RUNNING) from message. 2020-08-22T07:07:53.9601154Z 3067383 [HelixTaskExecutor-message_handle_thread_22013] ERROR org.apache.helix.messaging.handling.HelixStateTransitionHandler - Partition Job2_Job2_1 current state is same as toState (INIT->RUNNING) from message. 2020-08-22T07:07:53.9601661Z 3067383 [HelixTaskExecutor-message_handle_thread_21973] ERROR org.apache.helix.messaging.handling.HelixStateTransitionHandler - Partition Job2_Job2_2 current state is same as toState (INIT->RUNNING) from message. 2020-08-22T07:07:53.9602120Z 3067383 [HelixTaskExecutor-message_handle_thread_21972] ERROR org.apache.helix.messaging.handling.HelixStateTransitionHandler - Partition Job2_Job2_10 current state is same as toState (INIT->RUNNING) from message. 2020-08-22T07:07:53.9628178Z 3067389 [HelixTaskExecutor-message_handle_thread_22005] ERROR org.apache.helix.messaging.handling.HelixStateTransitionHandler - Partition Job2_Job2_7 current state is same as toState (INIT->RUNNING) from message. 2020-08-22T07:07:53.9630626Z 3067389 [HelixTaskExecutor-message_handle_thread_22007] ERROR org.apache.helix.messaging.handling.HelixStateTransitionHandler - Partition Job2_Job2_3 current state is same as toState (INIT->RUNNING) from message. 2020-08-22T07:07:53.9631582Z 3067389 [HelixTaskExecutor-message_handle_thread_22003] ERROR org.apache.helix.messaging.handling.HelixStateTransitionHandler - Partition Job2_Job2_8 current state is same as toState (INIT->RUNNING) from message. 2020-08-22T07:07:53.9632438Z 3067389 [HelixTaskExecutor-message_handle_thread_22006] ERROR org.apache.helix.messaging.handling.HelixStateTransitionHandler - Partition Job2_Job2_6 current state is same as toState (INIT->RUNNING) from message. 2020-08-22T07:08:53.9541199Z 3127378 [TestNGInvoker-testTaskThrottle()] ERROR org.apache.helix.TestHelper - verifier time out, consider try longer timeout, stack trace[java.lang.Thread.getStackTrace(Thread.java:1559), org.apache.helix.TestHelper.verify(TestHelper.java:804), org.apache.helix.integration.task.TestTaskThrottling.testTaskThrottle(TestTaskThrottling.java:96), sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method), sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62), sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43), java.lang.reflect.Method.invoke(Method.java:498), org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:76), org.testng.internal.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:46), org.testng.internal.InvokeMethodRunnable.run(InvokeMethodRunnable.java:37), java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511), java.util.concurrent.FutureTask.run(FutureTask.java:266), java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149), java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624), java.lang.Thread.run(Thread.java:748)] 2020-08-22T07:08:53.9636688Z END TestTaskThrottling testTaskThrottle at Sat Aug 22 07:08:53 UTC 2020, took: 60363ms.

kaisun2000 commented 4 years ago

Is the error message related? Increase wait period for now.

kaisun2000 commented 4 years ago

This seems to have another root cause.

jiajunwang commented 3 years ago

Close test unstable tickets since we have an automatic tracking mechanism https://github.com/apache/helix/pull/1757 now for tracking the most recent test issues.