apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

[Flaky test]: PulsarConsumerTest, testPartitionLevelConsumerBatchMessages #13008

Open abhioncbr opened 2 months ago

abhioncbr commented 2 months ago

I have noticed this failure a couple of times; here is the output from one such run:

Error:  Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 103.2 s <<< FAILURE! -- in org.apache.pinot.plugin.stream.pulsar.PulsarConsumerTest
Error:  org.apache.pinot.plugin.stream.pulsar.PulsarConsumerTest.testPartitionLevelConsumer -- Time elapsed: 9.947 s
Error:  org.apache.pinot.plugin.stream.pulsar.PulsarConsumerTest.testPartitionLevelConsumerBatchMessages -- Time elapsed: 5.495 s <<< FAILURE!
java.lang.AssertionError: expected [sample_msg_10] but found [sample_msg_0]
    at org.testng.Assert.fail(Assert.java:111)
    at org.testng.Assert.failNotEquals(Assert.java:1578)
    at org.testng.Assert.assertEqualsImpl(Assert.java:150)
    at org.testng.Assert.assertEquals(Assert.java:132)
    at org.testng.Assert.assertEquals(Assert.java:656)
    at org.testng.Assert.assertEquals(Assert.java:666)
    at org.apache.pinot.plugin.stream.pulsar.PulsarConsumerTest.verifyMessage(PulsarConsumerTest.java:231)
    at org.apache.pinot.plugin.stream.pulsar.PulsarConsumerTest.testConsumer(PulsarConsumerTest.java:220)
    at org.apache.pinot.plugin.stream.pulsar.PulsarConsumerTest.testPartitionLevelConsumerBatchMessages(PulsarConsumerTest.java:204)
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
    at java.base/java.lang.reflect.Method.invoke(Method.java:580)
    at org.testng.internal.invokers.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:141)
    at org.testng.internal.invokers.TestInvoker.invokeMethod(TestInvoker.java:686)
    at org.testng.internal.invokers.TestInvoker.invokeTestMethod(TestInvoker.java:230)
    at org.testng.internal.invokers.MethodRunner.runInSequence(MethodRunner.java:63)
    at org.testng.internal.invokers.TestInvoker$MethodInvocationAgent.invoke(TestInvoker.java:992)
    at org.testng.internal.invokers.TestInvoker.invokeTestMethods(TestInvoker.java:203)
    at org.testng.internal.invokers.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:154)
    at org.testng.internal.invokers.TestMethodWorker.run(TestMethodWorker.java:134)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
    at org.testng.TestRunner.privateRun(TestRunner.java:739)
    at org.testng.TestRunner.run(TestRunner.java:614)
    at org.testng.SuiteRunner.runTest(SuiteRunner.java:421)
    at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:413)
    at org.testng.SuiteRunner.privateRun(SuiteRunner.java:373)
    at org.testng.SuiteRunner.run(SuiteRunner.java:312)
    at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
    at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:95)
    at org.testng.TestNG.runSuitesSequentially(TestNG.java:1274)
    at org.testng.TestNG.runSuitesLocally(TestNG.java:1208)
    at org.testng.TestNG.runSuites(TestNG.java:1112)
    at org.testng.TestNG.run(TestNG.java:1079)
    at org.apache.maven.surefire.testng.TestNGExecutor.run(TestNGExecutor.java:155)
    at org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.executeSingleClass(TestNGDirectoryTestSuite.java:102)
    at org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.execute(TestNGDirectoryTestSuite.java:91)
    at org.apache.maven.surefire.testng.TestNGProvider.invoke(TestNGProvider.java:137)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:385)
    at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:162)
    at org.apache.maven.surefire.booter.ForkedBooter.run(ForkedBooter.java:507)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:495)

Jackie-Jiang commented 2 months ago

This looks like the same issue as #8537. It seems Pulsar can occasionally redeliver messages, most likely only in an environment with very limited system resources, since I'm not able to reproduce this locally. @abhioncbr Would you like to help take a look?
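
To illustrate the redelivery theory, here is a minimal sketch of how a test could tolerate duplicates before comparing payloads. It is an assumption-laden example, not the actual fix: `RedeliveryFilter` and `dropRedelivered` are hypothetical names that exist neither in Pinot nor in the Pulsar client, and the sketch assumes each message carries a stable id (here the payload itself stands in for it).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper for illustration only; not part of Pinot or the Pulsar client.
final class RedeliveryFilter {
  private RedeliveryFilter() {
  }

  // Returns the messages in first-seen order, skipping any message whose id
  // has already been seen (i.e. a presumed redelivery).
  static <M, K> List<M> dropRedelivered(List<M> messages, Function<? super M, ? extends K> idOf) {
    Set<K> seenIds = new HashSet<>();
    List<M> unique = new ArrayList<>();
    for (M message : messages) {
      // Set.add returns false when the id was already present.
      if (seenIds.add(idOf.apply(message))) {
        unique.add(message);
      }
    }
    return unique;
  }
}
```

With something like this, a batch in which `sample_msg_0` shows up a second time would be reduced to the unique messages in their original order before the positional assertion runs. Whether filtering in the test is appropriate, or whether the consumer itself should handle redelivery, is still to be decided.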

abhioncbr commented 2 months ago

Sure, let me take a look. Thanks.