apache / pulsar

Apache Pulsar - distributed pub-sub messaging system
https://pulsar.apache.org/
Apache License 2.0

[Tests] LockManagerImpl.close hangs and causes the test run to timeout eventually #13092

Closed lhotari closed 1 year ago

lhotari commented 2 years ago

Describe the bug

LockManagerImpl.close sporadically hangs, which eventually causes the test run to time out.

Here's an example thread stack trace:

"main" #1 prio=5 os_prio=0 cpu=48792.59ms elapsed=3273.19s tid=0x00007f2ac0028000 nid=0xc4e waiting on condition  [0x00007f2ac7ca7000]
   java.lang.Thread.State: WAITING (parking)
    at jdk.internal.misc.Unsafe.park(java.base@11.0.13/Native Method)
    - parking to wait for  <0x00000000ca4007f8> (a java.util.concurrent.CompletableFuture$Signaller)
    at java.util.concurrent.locks.LockSupport.park(java.base@11.0.13/LockSupport.java:194)
    at java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.13/CompletableFuture.java:1796)
    at java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.13/ForkJoinPool.java:3128)
    at java.util.concurrent.CompletableFuture.waitingGet(java.base@11.0.13/CompletableFuture.java:1823)
    at java.util.concurrent.CompletableFuture.join(java.base@11.0.13/CompletableFuture.java:2043)
    at org.apache.pulsar.metadata.coordination.impl.LockManagerImpl.close(LockManagerImpl.java:163)
    at org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerImpl.stop(ModularLoadManagerImpl.java:902)
    at org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerWrapper.stop(ModularLoadManagerWrapper.java:107)
    at org.apache.pulsar.broker.PulsarService.closeAsync(PulsarService.java:421)
    at org.apache.pulsar.broker.PulsarService$MockitoMock$1723093763.closeAsync$accessor$h6Cy2gqF(Unknown Source)
    at org.apache.pulsar.broker.PulsarService$MockitoMock$1723093763$auxiliary$jyJcdii6.call(Unknown Source)
    at org.mockito.internal.invocation.RealMethod$FromCallable$1.call(RealMethod.java:40)
    at org.mockito.internal.invocation.RealMethod$FromBehavior.invoke(RealMethod.java:62)
    at org.mockito.internal.invocation.InterceptedInvocation.callRealMethod(InterceptedInvocation.java:141)
    at org.mockito.internal.stubbing.answers.CallsRealMethods.answer(CallsRealMethods.java:44)
    at org.mockito.Answers.answer(Answers.java:98)
    at org.mockito.internal.handler.MockHandlerImpl.handle(MockHandlerImpl.java:106)
    at org.mockito.internal.handler.NullResultGuardian.handle(NullResultGuardian.java:29)
    at org.mockito.internal.handler.InvocationNotifierHandler.handle(InvocationNotifierHandler.java:33)
    at org.mockito.internal.creation.bytebuddy.MockMethodInterceptor.doIntercept(MockMethodInterceptor.java:82)
    at org.mockito.internal.creation.bytebuddy.MockMethodInterceptor.doIntercept(MockMethodInterceptor.java:56)
    at org.mockito.internal.creation.bytebuddy.MockMethodInterceptor$DispatcherDefaultingToRealMethod.interceptSuperCallable(MockMethodInterceptor.java:141)
    at org.apache.pulsar.broker.PulsarService$MockitoMock$1723093763.closeAsync(Unknown Source)
    at org.apache.pulsar.broker.PulsarService.close(PulsarService.java:352)
    at org.apache.pulsar.broker.PulsarService$MockitoMock$1723093763.close$accessor$h6Cy2gqF(Unknown Source)
    at org.apache.pulsar.broker.PulsarService$MockitoMock$1723093763$auxiliary$fMYlBnS4.call(Unknown Source)
    at org.mockito.internal.invocation.RealMethod$FromCallable$1.call(RealMethod.java:40)
    at org.mockito.internal.invocation.RealMethod$FromBehavior.invoke(RealMethod.java:62)
    at org.mockito.internal.invocation.InterceptedInvocation.callRealMethod(InterceptedInvocation.java:141)
    at org.mockito.internal.stubbing.answers.CallsRealMethods.answer(CallsRealMethods.java:44)
    at org.mockito.Answers.answer(Answers.java:98)
    at org.mockito.internal.handler.MockHandlerImpl.handle(MockHandlerImpl.java:106)
    at org.mockito.internal.handler.NullResultGuardian.handle(NullResultGuardian.java:29)
    at org.mockito.internal.handler.InvocationNotifierHandler.handle(InvocationNotifierHandler.java:33)
    at org.mockito.internal.creation.bytebuddy.MockMethodInterceptor.doIntercept(MockMethodInterceptor.java:82)
    at org.mockito.internal.creation.bytebuddy.MockMethodInterceptor.doIntercept(MockMethodInterceptor.java:56)
    at org.mockito.internal.creation.bytebuddy.MockMethodInterceptor$DispatcherDefaultingToRealMethod.interceptSuperCallable(MockMethodInterceptor.java:141)
    at org.apache.pulsar.broker.PulsarService$MockitoMock$1723093763.close(Unknown Source)
    at org.apache.pulsar.broker.auth.MockedPulsarServiceBaseTest.stopBroker(MockedPulsarServiceBaseTest.java:269)
    at org.apache.pulsar.broker.auth.MockedPulsarServiceBaseTest.internalCleanup(MockedPulsarServiceBaseTest.java:213)
    at org.apache.pulsar.broker.admin.NamespacesTest.cleanup(NamespacesTest.java:196)
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@11.0.13/Native Method)
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@11.0.13/NativeMethodAccessorImpl.java:62)
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.13/DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(java.base@11.0.13/Method.java:566)
    at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:132)
    at org.testng.internal.MethodInvocationHelper.invokeMethodConsideringTimeout(MethodInvocationHelper.java:61)
    at org.testng.internal.ConfigInvoker.invokeConfigurationMethod(ConfigInvoker.java:366)
    at org.testng.internal.ConfigInvoker.invokeConfigurations(ConfigInvoker.java:320)
    at org.testng.internal.TestInvoker.runConfigMethods(TestInvoker.java:701)
    at org.testng.internal.TestInvoker.runAfterGroupsConfigurations(TestInvoker.java:677)
    at org.testng.internal.TestInvoker.invokeMethod(TestInvoker.java:661)
    at org.testng.internal.TestInvoker.invokeTestMethod(TestInvoker.java:174)
    at org.testng.internal.MethodRunner.runInSequence(MethodRunner.java:46)
    at org.testng.internal.TestInvoker$MethodInvocationAgent.invoke(TestInvoker.java:822)
    at org.testng.internal.TestInvoker.invokeTestMethods(TestInvoker.java:147)
    at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:146)
    at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:128)
    at org.testng.TestRunner$$Lambda$183/0x00000001002a2840.accept(Unknown Source)
    at java.util.ArrayList.forEach(java.base@11.0.13/ArrayList.java:1541)
    at org.testng.TestRunner.privateRun(TestRunner.java:764)
    at org.testng.TestRunner.run(TestRunner.java:585)
    at org.testng.SuiteRunner.runTest(SuiteRunner.java:384)
    at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:378)
    at org.testng.SuiteRunner.privateRun(SuiteRunner.java:337)
    at org.testng.SuiteRunner.run(SuiteRunner.java:286)
    at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:53)
    at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:96)
    at org.testng.TestNG.runSuitesSequentially(TestNG.java:1218)
    at org.testng.TestNG.runSuitesLocally(TestNG.java:1140)
    at org.testng.TestNG.runSuites(TestNG.java:1069)
    at org.testng.TestNG.run(TestNG.java:1037)
    at org.apache.maven.surefire.testng.TestNGExecutor.run(TestNGExecutor.java:135)
    at org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.executeSingleClass(TestNGDirectoryTestSuite.java:112)
    at org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.executeLazy(TestNGDirectoryTestSuite.java:123)
    at org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.execute(TestNGDirectoryTestSuite.java:90)
    at org.apache.maven.surefire.testng.TestNGProvider.invoke(TestNGProvider.java:146)
    at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
    at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)

Full thread dump in jstack.review analyzer
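
For context, the trace above shows the main thread parked in `CompletableFuture.join()`, which waits without any time limit. A minimal sketch of a bounded wait follows, so that a stuck close fails fast instead of hanging the whole test run. The class `BoundedClose` and the helper `awaitClose` are hypothetical names for illustration only; this is not the fix that was applied in Pulsar.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedClose {

    /**
     * Hypothetical helper: waits for an asynchronous close to finish,
     * but gives up once the timeout elapses.
     * Returns true only if the future completed normally in time.
     */
    static boolean awaitClose(CompletableFuture<Void> closeFuture, long timeout, TimeUnit unit) {
        try {
            // get(timeout, unit) throws TimeoutException instead of parking forever like join()
            closeFuture.get(timeout, unit);
            return true;
        } catch (TimeoutException e) {
            return false; // close did not finish in time
        } catch (ExecutionException e) {
            return false; // close finished exceptionally
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }

    public static void main(String[] args) {
        // A future that nobody completes stands in for the stuck close
        CompletableFuture<Void> neverCompletes = new CompletableFuture<>();
        System.out.println(awaitClose(neverCompletes, 100, TimeUnit.MILLISECONDS)); // prints "false"

        CompletableFuture<Void> done = CompletableFuture.completedFuture(null);
        System.out.println(awaitClose(done, 100, TimeUnit.MILLISECONDS)); // prints "true"
    }
}
```

A bounded wait only turns the hang into a visible failure; the reason the future never completes still has to be found.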

lhotari commented 2 years ago

It seems to be a leak in the "mutex" lock / unlock handling in MockZooKeeper.

The thread dump also contains stacks like these:

"metadata-store-2857-1" #4674 prio=5 os_prio=0 cpu=17.87ms elapsed=2969.28s tid=0x00007f2ac238b800 nid=0x5935 waiting on condition  [0x00007f2a69fe1000]
   java.lang.Thread.State: WAITING (parking)
    at jdk.internal.misc.Unsafe.park(java.base@11.0.13/Native Method)
    - parking to wait for  <0x00000000c592e390> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(java.base@11.0.13/LockSupport.java:194)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java.base@11.0.13/AbstractQueuedSynchronizer.java:885)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.base@11.0.13/AbstractQueuedSynchronizer.java:917)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@11.0.13/AbstractQueuedSynchronizer.java:1240)
    at java.util.concurrent.locks.ReentrantLock.lock(java.base@11.0.13/ReentrantLock.java:267)
    at org.apache.zookeeper.MockZooKeeper.delete(MockZooKeeper.java:825)
    at org.apache.zookeeper.MockZooKeeper.multi(MockZooKeeper.java:957)
    at org.apache.zookeeper.MockZooKeeper.multi(MockZooKeeper.java:936)
    at org.apache.pulsar.metadata.impl.ZKMetadataStore.batchOperation(ZKMetadataStore.java:148)
    at org.apache.pulsar.metadata.impl.batching.AbstractBatchedMetadataStore.flush(AbstractBatchedMetadataStore.java:111)
    at org.apache.pulsar.metadata.impl.batching.AbstractBatchedMetadataStore$$Lambda$214/0x00000001003de440.run(Unknown Source)
    at java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.13/Executors.java:515)
    at java.util.concurrent.FutureTask.runAndReset(java.base@11.0.13/FutureTask.java:305)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(java.base@11.0.13/ScheduledThreadPoolExecutor.java:305)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.13/ThreadPoolExecutor.java:1128)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.13/ThreadPoolExecutor.java:628)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(java.base@11.0.13/Thread.java:829)

   Locked ownable synchronizers:
    - <0x00000000c57f9248> (a java.util.concurrent.ThreadPoolExecutor$Worker)

"metadata-store-zk-session-watcher-2858-1" #4675 prio=5 os_prio=0 cpu=0.15ms elapsed=2969.28s tid=0x00007f2ac2254800 nid=0x5936 waiting on condition  [0x00007f2a5eae8000]
   java.lang.Thread.State: WAITING (parking)
    at jdk.internal.misc.Unsafe.park(java.base@11.0.13/Native Method)
    - parking to wait for  <0x00000000c592e390> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(java.base@11.0.13/LockSupport.java:194)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java.base@11.0.13/AbstractQueuedSynchronizer.java:885)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.base@11.0.13/AbstractQueuedSynchronizer.java:917)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@11.0.13/AbstractQueuedSynchronizer.java:1240)
    at java.util.concurrent.locks.ReentrantLock.lock(java.base@11.0.13/ReentrantLock.java:267)
    at org.apache.zookeeper.MockZooKeeper.lambda$exists$15(MockZooKeeper.java:644)
    at org.apache.zookeeper.MockZooKeeper$$Lambda$325/0x0000000100500040.run(Unknown Source)
    at com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:321)
    at org.apache.zookeeper.MockZooKeeper.exists(MockZooKeeper.java:643)
    at org.apache.pulsar.metadata.impl.ZKSessionWatcher.checkConnectionStatus(ZKSessionWatcher.java:89)
    - locked <0x00000000c592e450> (a org.apache.pulsar.metadata.impl.ZKSessionWatcher)
    at org.apache.pulsar.metadata.impl.ZKSessionWatcher$$Lambda$216/0x00000001003ddc40.run(Unknown Source)
    at org.apache.pulsar.common.util.Runnables$CatchingAndLoggingRunnable.run(Runnables.java:53)
    at java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.13/Executors.java:515)
    at java.util.concurrent.FutureTask.runAndReset(java.base@11.0.13/FutureTask.java:305)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(java.base@11.0.13/ScheduledThreadPoolExecutor.java:305)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.13/ThreadPoolExecutor.java:1128)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.13/ThreadPoolExecutor.java:628)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(java.base@11.0.13/Thread.java:829)

lhotari commented 2 years ago

I'll work on a fix.

lhotari commented 2 years ago

This was most likely a bug in my PR #13066. I have addressed the lock leakage problem in that PR to prevent such issues in the future.
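
To illustrate what a lock leak of this kind looks like in general, here is a minimal sketch using a `ReentrantLock`. The class `LockLeakSketch` and its methods are hypothetical and are not taken from MockZooKeeper; the sketch only assumes that some code path returns or throws between `lock()` and `unlock()`.

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockLeakSketch {
    private final ReentrantLock mutex = new ReentrantLock();

    /** Leaky variant: if the body throws, unlock() is never reached and the mutex stays held. */
    void leakyOperation(boolean fail) {
        mutex.lock();
        if (fail) {
            throw new IllegalStateException("simulated failure");
        }
        mutex.unlock();
    }

    /** Safe variant: unlock() sits in a finally block, so every exit path releases the mutex. */
    void safeOperation(boolean fail) {
        mutex.lock();
        try {
            if (fail) {
                throw new IllegalStateException("simulated failure");
            }
        } finally {
            mutex.unlock();
        }
    }

    /** True while any thread holds the mutex. */
    boolean isMutexHeld() {
        return mutex.isLocked();
    }

    public static void main(String[] args) {
        LockLeakSketch leaky = new LockLeakSketch();
        try {
            leaky.leakyOperation(true);
        } catch (IllegalStateException expected) {
            // the exception skipped unlock()
        }
        System.out.println(leaky.isMutexHeld()); // prints "true": the lock leaked

        LockLeakSketch safe = new LockLeakSketch();
        try {
            safe.safeOperation(true);
        } catch (IllegalStateException expected) {
            // finally released the lock before the exception propagated
        }
        System.out.println(safe.isMutexHeld()); // prints "false"
    }
}
```

Once a lock has leaked this way, every other thread that calls `lock()` parks indefinitely, which matches the `ReentrantLock$NonfairSync` waits in the stacks above.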

lhotari commented 2 years ago

Recent failure in https://github.com/apache/pulsar/runs/5036754890?check_suite_focus=true#step:10:35

"main" #1 prio=5 os_prio=0 cpu=216695.77ms elapsed=6825.27s tid=0x00007f6c84028000 nid=0xc19 waiting on condition  [0x00007f6c891cf000]
   java.lang.Thread.State: WAITING (parking)
    at jdk.internal.misc.Unsafe.park(java.base@11.0.13/Native Method)
    - parking to wait for  <0x00000000ec09b138> (a java.util.concurrent.CompletableFuture$Signaller)
    at java.util.concurrent.locks.LockSupport.park(java.base@11.0.13/LockSupport.java:194)
    at java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.13/CompletableFuture.java:1796)
    at java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.13/ForkJoinPool.java:3128)
    at java.util.concurrent.CompletableFuture.waitingGet(java.base@11.0.13/CompletableFuture.java:1823)
    at java.util.concurrent.CompletableFuture.join(java.base@11.0.13/CompletableFuture.java:2043)
    at org.apache.pulsar.metadata.coordination.impl.LockManagerImpl.close(LockManagerImpl.java:163)
    at org.apache.pulsar.broker.loadbalance.impl.SimpleLoadManagerImpl.stop(SimpleLoadManagerImpl.java:1447)
    at org.apache.pulsar.broker.PulsarService.closeAsync(PulsarService.java:436)
    at org.apache.pulsar.broker.PulsarService.close(PulsarService.java:367)
    at org.apache.pulsar.client.api.ClientDeduplicationFailureTest.shutdown(ClientDeduplicationFailureTest.java:133)
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@11.0.13/Native Method)
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@11.0.13/NativeMethodAccessorImpl.java:62)
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.13/DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(java.base@11.0.13/Method.java:566)
    at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:132)
    at org.testng.internal.MethodInvocationHelper.invokeMethodConsideringTimeout(MethodInvocationHelper.java:61)
    at org.testng.internal.ConfigInvoker.invokeConfigurationMethod(ConfigInvoker.java:366)
    at org.testng.internal.ConfigInvoker.invokeConfigurations(ConfigInvoker.java:320)
    at org.testng.internal.TestInvoker.runConfigMethods(TestInvoker.java:701)
    at org.testng.internal.TestInvoker.runAfterGroupsConfigurations(TestInvoker.java:677)
    at org.testng.internal.TestInvoker.invokeMethod(TestInvoker.java:661)
    at org.testng.internal.TestInvoker.invokeTestMethod(TestInvoker.java:174)
    at org.testng.internal.MethodRunner.runInSequence(MethodRunner.java:46)
    at org.testng.internal.TestInvoker$MethodInvocationAgent.invoke(TestInvoker.java:822)
    at org.testng.internal.TestInvoker.invokeTestMethods(TestInvoker.java:147)
    at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:146)
    at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:128)
    at org.testng.TestRunner$$Lambda$216/0x00000001003d4040.accept(Unknown Source)
    at java.util.ArrayList.forEach(java.base@11.0.13/ArrayList.java:1541)
    at org.testng.TestRunner.privateRun(TestRunner.java:764)
    at org.testng.TestRunner.run(TestRunner.java:585)
    at org.testng.SuiteRunner.runTest(SuiteRunner.java:384)
    at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:378)
    at org.testng.SuiteRunner.privateRun(SuiteRunner.java:337)
    at org.testng.SuiteRunner.run(SuiteRunner.java:286)
    at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:53)
    at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:96)
    at org.testng.TestNG.runSuitesSequentially(TestNG.java:1218)
    at org.testng.TestNG.runSuitesLocally(TestNG.java:1140)
    at org.testng.TestNG.runSuites(TestNG.java:1069)
    at org.testng.TestNG.run(TestNG.java:1037)
    at org.apache.maven.surefire.testng.TestNGExecutor.run(TestNGExecutor.java:135)
    at org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.executeSingleClass(TestNGDirectoryTestSuite.java:112)
    at org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.executeLazy(TestNGDirectoryTestSuite.java:123)
    at org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.execute(TestNGDirectoryTestSuite.java:90)
    at org.apache.maven.surefire.testng.TestNGProvider.invoke(TestNGProvider.java:146)
    at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
    at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)

lhotari commented 2 years ago

jstack.review shows a possible deadlock: https://jstack.review/?https://gist.github.com/lhotari/f2fa48e5c0800018032e460f75f1a8b4#tda_1_dump
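
As a complement to analyzing a dump by hand, the JVM can report deadlock cycles itself through `ThreadMXBean`. The sketch below is a hypothetical helper (`DeadlockCheck.describeDeadlocks`) that a test teardown could call; it is an assumption for illustration and not part of the Pulsar test code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DeadlockCheck {

    /**
     * Returns a description of all threads that are part of a deadlock cycle
     * (monitors or ownable synchronizers such as ReentrantLock),
     * or an empty string when the JVM reports no cycle.
     */
    static String describeDeadlocks() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null when no deadlock is found
        if (ids == null) {
            return "";
        }
        StringBuilder report = new StringBuilder();
        // Request locked monitors and locked synchronizers so lock owners show up in the report
        for (ThreadInfo info : threads.getThreadInfo(ids, true, true)) {
            if (info != null) { // a thread may have terminated in the meantime
                report.append(info);
            }
        }
        return report.toString();
    }

    public static void main(String[] args) {
        String report = describeDeadlocks();
        System.out.println(report.isEmpty() ? "no deadlock detected" : report);
    }
}
```

Note that `findDeadlockedThreads()` only reports cycles of threads waiting on each other. A lock that was leaked by a thread that is no longer waiting on anything is not a cycle and would not show up here.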

github-actions[bot] commented 2 years ago

The issue had no activity for 30 days, mark with Stale label.

lhotari commented 2 years ago

Another one, in PulsarFunctionTlsTest.tearDown:

"main" #1 prio=5 os_prio=0 cpu=43749.23ms elapsed=3506.78s tid=0x00007f965c026e90 nid=0xac2 waiting on condition  [0x00007f9660e26000]
   java.lang.Thread.State: WAITING (parking)
    at jdk.internal.misc.Unsafe.park(java.base@17.0.4.1/Native Method)
    - parking to wait for  <0x0000100032218608> (a java.util.concurrent.CompletableFuture$Signaller)
    at java.util.concurrent.locks.LockSupport.park(java.base@17.0.4.1/LockSupport.java:211)
    at java.util.concurrent.CompletableFuture$Signaller.block(java.base@17.0.4.1/CompletableFuture.java:1864)
    at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@17.0.4.1/ForkJoinPool.java:3463)
    at java.util.concurrent.ForkJoinPool.managedBlock(java.base@17.0.4.1/ForkJoinPool.java:3434)
    at java.util.concurrent.CompletableFuture.waitingGet(java.base@17.0.4.1/CompletableFuture.java:1898)
    at java.util.concurrent.CompletableFuture.join(java.base@17.0.4.1/CompletableFuture.java:2117)
    at org.apache.pulsar.metadata.coordination.impl.LockManagerImpl.close(LockManagerImpl.java:163)
    at org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerImpl.stop(ModularLoadManagerImpl.java:983)
    at org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerWrapper.stop(ModularLoadManagerWrapper.java:118)
    at org.apache.pulsar.functions.worker.PulsarFunctionTlsTest.tearDown(PulsarFunctionTlsTest.java:193)

https://github.com/apache/pulsar/actions/runs/3044816917/jobs/4905919291#step:11:528

lhotari commented 2 years ago

Full thread dump:
analysis: https://jstack.review/?https://gist.github.com/lhotari/24168b03bf697c8e4fbe34ac0f795b08#tda_1_dump
raw file: https://gist.github.com/lhotari/24168b03bf697c8e4fbe34ac0f795b08

lhotari commented 2 years ago

again

"main" #1 prio=5 os_prio=0 cpu=42816.91ms elapsed=3505.95s tid=0x00007f3190026f50 nid=0xad8 waiting on condition  [0x00007f3194df1000]
   java.lang.Thread.State: WAITING (parking)
    at jdk.internal.misc.Unsafe.park(java.base@17.0.4.1/Native Method)
    - parking to wait for  <0x0000100025e10828> (a java.util.concurrent.CompletableFuture$Signaller)
    at java.util.concurrent.locks.LockSupport.park(java.base@17.0.4.1/LockSupport.java:211)
    at java.util.concurrent.CompletableFuture$Signaller.block(java.base@17.0.4.1/CompletableFuture.java:1864)
    at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@17.0.4.1/ForkJoinPool.java:3463)
    at java.util.concurrent.ForkJoinPool.managedBlock(java.base@17.0.4.1/ForkJoinPool.java:3434)
    at java.util.concurrent.CompletableFuture.waitingGet(java.base@17.0.4.1/CompletableFuture.java:1898)
    at java.util.concurrent.CompletableFuture.join(java.base@17.0.4.1/CompletableFuture.java:2117)
    at org.apache.pulsar.metadata.coordination.impl.LockManagerImpl.close(LockManagerImpl.java:163)
    at org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerImpl.stop(ModularLoadManagerImpl.java:981)
    at org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerWrapper.stop(ModularLoadManagerWrapper.java:118)
    at org.apache.pulsar.functions.worker.PulsarFunctionTlsTest.tearDown(PulsarFunctionTlsTest.java:193)

https://github.com/apache/pulsar/actions/runs/3110223307/jobs/5041516576#step:11:79 in @codelipenghui's PR build for #17804. @mattisonchao, do you have a chance to take a look?

lhotari commented 1 year ago

again

"main" #1 prio=5 os_prio=0 cpu=94687.75ms elapsed=3512.39s tid=0x00007efee002aba0 nid=0xbc8 waiting on condition  [0x00007efee61ec000]
   java.lang.Thread.State: WAITING (parking)
    at jdk.internal.misc.Unsafe.park(java.base@17.0.5/Native Method)
    - parking to wait for  <0x0000100034016560> (a java.util.concurrent.CompletableFuture$Signaller)
    at java.util.concurrent.locks.LockSupport.park(java.base@17.0.5/LockSupport.java:211)
    at java.util.concurrent.CompletableFuture$Signaller.block(java.base@17.0.5/CompletableFuture.java:1864)
    at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@17.0.5/ForkJoinPool.java:3463)
    at java.util.concurrent.ForkJoinPool.managedBlock(java.base@17.0.5/ForkJoinPool.java:3434)
    at java.util.concurrent.CompletableFuture.waitingGet(java.base@17.0.5/CompletableFuture.java:1898)
    at java.util.concurrent.CompletableFuture.join(java.base@17.0.5/CompletableFuture.java:2117)
    at org.apache.pulsar.metadata.coordination.impl.LockManagerImpl.close(LockManagerImpl.java:163)
    at org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerImpl.stop(ModularLoadManagerImpl.java:982)
    at org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerWrapper.stop(ModularLoadManagerWrapper.java:118)
    at org.apache.pulsar.functions.worker.PulsarFunctionTlsTest.tearDown(PulsarFunctionTlsTest.java:194)
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@17.0.5/Native Method)
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@17.0.5/NativeMethodAccessorImpl.java:77)
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@17.0.5/DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(java.base@17.0.5/Method.java:568)
    at org.testng.internal.invokers.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:139)
    at org.testng.internal.invokers.MethodInvocationHelper.invokeMethodConsideringTimeout(MethodInvocationHelper.java:69)
    at org.testng.internal.invokers.ConfigInvoker.invokeConfigurationMethod(ConfigInvoker.java:361)
    at org.testng.internal.invokers.ConfigInvoker.invokeConfigurations(ConfigInvoker.java:296)
    at org.testng.internal.invokers.TestInvoker.runConfigMethods(TestInvoker.java:823)
    at org.testng.internal.invokers.TestInvoker.runAfterConfigurations(TestInvoker.java:792)
    at org.testng.internal.invokers.TestInvoker.invokeMethod(TestInvoker.java:768)
    at org.testng.internal.invokers.TestInvoker.invokeTestMethod(TestInvoker.java:221)
    at org.testng.internal.invokers.MethodRunner.runInSequence(MethodRunner.java:50)
    at org.testng.internal.invokers.TestInvoker$MethodInvocationAgent.invoke(TestInvoker.java:969)
    at org.testng.internal.invokers.TestInvoker.invokeTestMethods(TestInvoker.java:194)
    at org.testng.internal.invokers.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:148)
    at org.testng.internal.invokers.TestMethodWorker.run(TestMethodWorker.java:128)
    at org.testng.TestRunner$$Lambda$245/0x0000000800dc16d0.accept(Unknown Source)
    at java.util.ArrayList.forEach(java.base@17.0.5/ArrayList.java:1511)
    at org.testng.TestRunner.privateRun(TestRunner.java:829)
    at org.testng.TestRunner.run(TestRunner.java:602)
    at org.testng.SuiteRunner.runTest(SuiteRunner.java:437)
    at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:431)
    at org.testng.SuiteRunner.privateRun(SuiteRunner.java:391)
    at org.testng.SuiteRunner.run(SuiteRunner.java:330)
    at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
    at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:95)
    at org.testng.TestNG.runSuitesSequentially(TestNG.java:1256)
    at org.testng.TestNG.runSuitesLocally(TestNG.java:1176)
    at org.testng.TestNG.runSuites(TestNG.java:1099)
    at org.testng.TestNG.run(TestNG.java:1067)
    at org.apache.maven.surefire.testng.TestNGExecutor.run(TestNGExecutor.java:135)
    at org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.executeSingleClass(TestNGDirectoryTestSuite.java:112)
    at org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.executeLazy(TestNGDirectoryTestSuite.java:123)
    at org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.execute(TestNGDirectoryTestSuite.java:90)
    at org.apache.maven.surefire.testng.TestNGProvider.invoke(TestNGProvider.java:146)
    at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
    at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)

in https://github.com/lhotari/pulsar/actions/runs/3739152576/jobs/6346246451#step:11:117

lhotari commented 1 year ago

Fixed by #19055