apache / pulsar

Apache Pulsar - distributed pub-sub messaging system
https://pulsar.apache.org/
Apache License 2.0

[Bug][broker] cursor will read in dead loop when do tailing-read with enableTransaction #22943

Open TakaHiR07 opened 3 months ago

TakaHiR07 commented 3 months ago

Search before asking

Read release policy

Version

client: pulsar-3.0.5 broker: pulsar-3.0.5

Minimal reproduce step

Run a transactional produce and a normal (non-transactional) consume against a 200-partition topic with pulsar-perf. Throughput is 10 MB/s, batchSize is 10, and subscriptionType is Exclusive. The consumer does a tailing read, i.e. it consumes the latest messages.

Produce config: -txn -nmt 1000 -time 0 -s 1024 -i 60 -bm 10 -b 1000 -bb 4194304 -r 10000 -mk random -threads 3

Consume config: -time 0 -i 60 -s sub_test_txn_p200 -ss sub_test_txn_p200 -sp Latest -ioThreads 1 -n 1

What did you expect to see?

Broker CPU load stays low.

What did you see instead?

The broker handles little throughput but shows high CPU load.


Anything else?

This issue has been reported before, but it still exists on the master branch. It is a serious issue because it makes transactions unavailable.

The root cause is:

In ManagedCursorImpl#asyncReadEntriesWithSkipOrWait, hasMoreEntries() only compares readPosition with lastConfirmedEntry. However, when transactions are enabled, maxReadPosition also determines whether an entry can be read.

Currently, if readPosition < lastConfirmedEntry && readPosition > maxReadPosition, hasMoreEntries() reports that entries are available, so the read is issued immediately instead of waiting. But once we enter internalReadFromLedger(), we go into opReadEntry.checkReadCompletion(), which then triggers callback.readEntriesComplete().

Therefore, the cursor keeps reading in a dead loop, even though there is actually nothing it needs to read.
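The condition described above can be modelled with a small, self-contained Java sketch. It is not Pulsar code and not a proposed patch: positions are simplified to plain entry ids in a single ledger, and the names `TailingReadSketch`, `hasReadableEntries`, and `emptyReads` are hypothetical, introduced only for illustration. It shows why a check that ignores maxReadPosition keeps issuing reads that cannot make progress, while a check that also respects maxReadPosition would let the cursor wait instead.

```java
public class TailingReadSketch {

    // Simplified model of the check described in the issue:
    // only readPosition and lastConfirmedEntry are compared.
    static boolean hasMoreEntries(long readPosition, long lastConfirmedEntry) {
        return readPosition <= lastConfirmedEntry;
    }

    // Hypothetical stricter check (an assumption, not the actual fix):
    // an entry is readable only if it is also within maxReadPosition.
    static boolean hasReadableEntries(long readPosition, long lastConfirmedEntry,
                                      long maxReadPosition) {
        return hasMoreEntries(readPosition, lastConfirmedEntry)
                && readPosition <= maxReadPosition;
    }

    // Counts read attempts that complete without advancing readPosition.
    // maxAttempts bounds the loop so the sketch terminates.
    static int emptyReads(long readPosition, long lastConfirmedEntry,
                          long maxReadPosition, boolean respectMaxRead,
                          int maxAttempts) {
        int attempts = 0;
        while (attempts < maxAttempts) {
            boolean proceed = respectMaxRead
                    ? hasReadableEntries(readPosition, lastConfirmedEntry, maxReadPosition)
                    : hasMoreEntries(readPosition, lastConfirmedEntry);
            if (!proceed) {
                break; // the cursor would wait for new entries here
            }
            // The read is capped at maxReadPosition, so nothing is readable
            // and readPosition does not move: the next iteration repeats.
            attempts++;
        }
        return attempts;
    }

    public static void main(String[] args) {
        // readPosition = 11, lastConfirmedEntry = 20, maxReadPosition = 10
        System.out.println(emptyReads(11, 20, 10, false, 1000));
        System.out.println(emptyReads(11, 20, 10, true, 1000));
    }
}
```

In this model the first call spins until the attempt limit is reached, while the second call stops before the first attempt. Whether the real change belongs in hasMoreEntries() or elsewhere in the read path is left open here.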

https://github.com/apache/pulsar/blob/5dc030431a60b49e81d577cd06a1ae63dbee0293/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java#L934-L979

https://github.com/apache/pulsar/blob/5dc030431a60b49e81d577cd06a1ae63dbee0293/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java#L2051-L2056

https://github.com/apache/pulsar/blob/5dc030431a60b49e81d577cd06a1ae63dbee0293/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/OpReadEntry.java#L164-L186

Are you willing to submit a PR?