apache / pulsar

Apache Pulsar - distributed pub-sub messaging system
https://pulsar.apache.org/
Apache License 2.0

[Bug] org.apache.bookkeeper.mledger.ManagedLedgerException$CursorNotFoundException: ManagedCursor not found #23229

Open MagicalFool opened 2 months ago

MagicalFool commented 2 months ago

Search before asking

Read release policy

Version

pulsar 2.8.1

Minimal reproduce step

ZK full GC: [screenshot, image (2)]

Output: [screenshot, image (1)]

After ZK went down, the broker could not establish a connection to the bookie.

What did you expect to see?

Question 1: Why does ZooKeeper run a full GC? Question 2: Why can't the broker connect to the bookie?

What did you see instead?

I don't know how to troubleshoot this.

Anything else?

No response

Are you willing to submit a PR?

lhotari commented 2 months ago

pulsar 2.8.1

This is an unsupported Pulsar release. Please upgrade to a supported version (see https://pulsar.apache.org/contribute/release-policy/#supported-versions ).

question1: why zookeeper full gc?

One cause is that Pulsar's usage of ZK is very demanding, since ZK isn't designed for large ZNode sizes. One possible source of the problem is the large byte array (byte[]) allocations in the implementation. When the heap gets fragmented, a full GC might be required. One possible way to mitigate this is proper JVM tuning. Different GC implementations pose different tuning challenges (G1GC is one example). Since Pulsar 3.3, there is also an option to use StreamNative Oxia to overcome the ZK limitations.
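Before tuning the JVM, it can help to confirm how often full GCs happen and how much heap each one reclaims. The following is a minimal sketch, not something from this issue: it assumes the ZK JVM runs on JDK 9 or later with unified GC logging enabled (for example `-Xlog:gc*:file=gc.log`), and that pause lines look like the illustrative samples in the code. The exact log format varies by JDK version and GC, so the regular expression may need adjusting. The function name `full_gc_pauses` and the sample lines are hypothetical.

```python
import re

# Assumed shape of a JDK unified-logging pause line (illustrative, not taken from the issue):
# "[12.345s][info][gc] GC(7) Pause Full (Allocation Failure) 900M->850M(1024M) 2310.5ms"
PAUSE_RE = re.compile(
    r"GC\((?P<id>\d+)\)\s+Pause\s+(?P<kind>\w+)\b.*?"
    r"(?P<before>\d+)M->(?P<after>\d+)M\((?P<heap>\d+)M\)\s+"
    r"(?P<ms>\d+(?:\.\d+)?)ms"
)


def full_gc_pauses(lines):
    """Return (gc_id, before_mb, after_mb, heap_mb, pause_ms) for each full GC line."""
    pauses = []
    for line in lines:
        match = PAUSE_RE.search(line)
        # Keep only "Pause Full" events; young and concurrent-cycle pauses are skipped.
        if match and match.group("kind") == "Full":
            pauses.append((
                int(match.group("id")),
                int(match.group("before")),
                int(match.group("after")),
                int(match.group("heap")),
                float(match.group("ms")),
            ))
    return pauses


# Hypothetical sample lines, used only to exercise the parser.
SAMPLE = [
    "[10.001s][info][gc] GC(6) Pause Young (Normal) (G1 Evacuation Pause) 300M->120M(1024M) 8.1ms",
    "[12.345s][info][gc] GC(7) Pause Full (Allocation Failure) 900M->850M(1024M) 2310.5ms",
]

if __name__ == "__main__":
    for gc_id, before, after, heap, ms in full_gc_pauses(SAMPLE):
        print(f"GC({gc_id}): full GC took {ms} ms, heap {before}M -> {after}M of {heap}M")
```

To run it against a real log, pass an open file instead of `SAMPLE`, for example `full_gc_pauses(open("gc.log"))`. Long pauses that free little heap point at a live data set that is close to the heap limit, which is useful context when deciding what to tune.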

question2: why broker not connect bookie?

It's not possible to determine that based on the information provided. In any case, upgrading to a supported release is required for getting any fixes.