Open lhotari opened 1 month ago
Two other threads are involved in the deadlock, both blocked in the
TableViewLoadDataStoreImpl.removeAsync method:
pulsar-load-manager-6391-1 waiting to acquire [ 0x000010002025ce08 ], holding [ 0x000010001e5511f0 0x00001000202630c0 ]
    at org.apache.pulsar.broker.loadbalance.extensions.store.TableViewLoadDataStoreImpl.removeAsync(TableViewLoadDataStoreImpl.java)
    at org.apache.pulsar.broker.loadbalance.extensions.channel.ServiceUnitStateChannelImpl.doCleanup(ServiceUnitStateChannelImpl.java:1602)
    at org.apache.pulsar.broker.loadbalance.extensions.channel.ServiceUnitStateChannelImpl.lambda$scheduleCleanup$45(ServiceUnitStateChannelImpl.java:1357)
    at org.apache.pulsar.broker.loadbalance.extensions.channel.ServiceUnitStateChannelImpl$$Lambda$3727/0x00007f50392595c0.run(Unknown Source)
    at java.util.concurrent.CompletableFuture$AsyncRun.run(java.base@17.0.13/CompletableFuture.java:1804)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(java.base@17.0.13/Thread.java:840)
broker-client-shared-internal-executor-6394-1 waiting to acquire [ 0x000010002025ce08 ], holding [ 0x0000100034a98bc8 0x00001005062490e8 ]
    at org.apache.pulsar.broker.loadbalance.extensions.store.TableViewLoadDataStoreImpl.removeAsync(TableViewLoadDataStoreImpl.java)
    at org.apache.pulsar.broker.loadbalance.extensions.reporter.TopBundleLoadDataReporter.tombstone(TopBundleLoadDataReporter.java:109)
    at org.apache.pulsar.broker.loadbalance.extensions.reporter.TopBundleLoadDataReporter.handleEvent(TopBundleLoadDataReporter.java:138)
    at org.apache.pulsar.broker.loadbalance.extensions.channel.StateChangeListeners.lambda$notify$3(StateChangeListeners.java:74)
    at org.apache.pulsar.broker.loadbalance.extensions.channel.StateChangeListeners$$Lambda$3849/0x00007f5038922d18.accept(Unknown Source)
    at java.util.concurrent.CopyOnWriteArrayList.forEach(java.base@17.0.13/CopyOnWriteArrayList.java:807)
    at org.apache.pulsar.broker.loadbalance.extensions.channel.StateChangeListeners.notify(StateChangeListeners.java:72)
    at java.util.concurrent.CompletableFuture$UniAccept.tryFire(java.base@17.0.13/CompletableFuture.java:718)
    at java.util.concurrent.CompletableFuture.postComplete(java.base@17.0.13/CompletableFuture.java:510)
    at java.util.concurrent.CompletableFuture.complete(java.base@17.0.13/CompletableFuture.java:2147)
    at org.apache.pulsar.client.impl.ConsumerBase.lambda$completePendingReceive$0(ConsumerBase.java:333)
    at org.apache.pulsar.client.impl.ConsumerBase$$Lambda$1932/0x00007f5038bd6358.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17.0.13/ThreadPoolExecutor.java:1136)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17.0.13/ThreadPoolExecutor.java:635)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(java.base@17.0.13/Thread.java:840)
TableViewLoadDataStoreImpl.removeAsync was made synchronized in #21777
"main" #1 prio=5 os_prio=0 cpu=67851.86ms elapsed=5366.03s tid=0x00007f52bc02f210 nid=0x33a0 waiting on condition [0x00007f52c21fb000]
   java.lang.Thread.State: WAITING (parking)
    at jdk.internal.misc.Unsafe.park(java.base@17.0.13/Native Method)
It's odd that the test main thread cannot complete the table view close operation, which blocks other TableViewLoadDataStoreImpl operations.
Raised a PR to remove synchronization from the TableViewLoadDataStoreImpl operations: https://github.com/apache/pulsar/pull/23487
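For illustration, the blocking pattern described above can be reproduced in isolation. This is only a minimal sketch, not the Pulsar code: `Store`, `closeFuture`, `closeEntered` and `removeBlockedWhileClosing` are hypothetical names, and it assumes that the close path holds the same monitor as `removeAsync` while it is parked waiting for a future. Under that assumption, any other thread calling a synchronized method on the same object stays blocked until the close completes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class SynchronizedStoreSketch {

    // Hypothetical stand-in for a store whose operations are all synchronized.
    static class Store {
        private final CompletableFuture<Void> closeFuture = new CompletableFuture<>();
        private final CountDownLatch closeEntered = new CountDownLatch(1);

        // Holds the object's monitor while parked on a future
        // (assumed to mirror the stuck close operation on the "main" thread).
        synchronized void close() {
            closeEntered.countDown();
            closeFuture.join();
        }

        // Cannot be entered while another thread is inside close().
        synchronized CompletableFuture<Void> removeAsync(String key) {
            return CompletableFuture.completedFuture(null);
        }
    }

    /** Returns true if removeAsync was still blocked while close() held the monitor. */
    static boolean removeBlockedWhileClosing() {
        try {
            Store store = new Store();
            Thread closer = new Thread(store::close, "main-like-closer");
            closer.start();
            store.closeEntered.await(); // close() now owns the monitor

            CountDownLatch removed = new CountDownLatch(1);
            Thread remover = new Thread(() -> {
                store.removeAsync("some-key");
                removed.countDown();
            }, "load-manager-like");
            remover.start();

            // removeAsync should not return within the timeout.
            boolean blocked = !removed.await(200, TimeUnit.MILLISECONDS);

            // Let close() finish so both threads can terminate.
            store.closeFuture.complete(null);
            closer.join();
            remover.join();
            return blocked;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("removeAsync blocked while close() held the monitor: "
                + removeBlockedWhileClosing());
    }
}
```

If the future awaited in `close()` never completes, the waiting threads never proceed, which would match the two `removeAsync` callers in the thread dump. Removing `synchronized` from the operations, as the PR proposes, avoids holding a monitor across the wait.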
Example failure
https://github.com/apache/pulsar/actions/runs/11368523712/job/31653386805?pr=23468#step:10:616
thread dump: https://gist.github.com/lhotari/17557838cea2e4d4f4f1556fd4caec98
jstack.review analysis: https://jstack.review/?https://gist.github.com/lhotari/17557838cea2e4d4f4f1556fd4caec98#tda_1_dump