Optimise seeking by timestamp

Samreay commented 7 months ago

Search before asking

[X] I searched in the issues and found nothing similar.

Motivation

Right now, seeking a reader or a consumer to a specific timestamp appears to be an unoptimised process that can take many seconds, or over a minute, for larger topics (data size in the single-digit GB range, tens of messages per second). A Slack comment from @lhotari suggests that seeking by timestamp is not optimised, so I'm proposing optimising it as a valuable feature.

Solution

Seeking currently works by message ID or by timestamp. I assume (though I could be wrong) that seeking by message ID is optimised. Without going into the implementation details properly and just spitballing ideas, something like binary searching on the time, or creating a treemap from timestamp to message ID (at any level of sparsity), might allow seeking to become far faster.
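To make the treemap idea concrete, here is a minimal sketch of a sparse index from publish timestamp to position. Everything in it is hypothetical: `SparseTimestampIndex`, `onMessage` and `seekStart` are illustrative names, not Pulsar APIs, and the type parameter `P` stands in for whatever a message ID would be. It assumes publish timestamps are non-decreasing in log order, which may not hold when publishers' clocks differ.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sparse index: publish timestamp (ms) -> position of a sampled message. */
final class SparseTimestampIndex<P> {
    private final TreeMap<Long, P> index = new TreeMap<>();
    private final int sampleEvery; // index one message out of every `sampleEvery`
    private long seen = 0;

    SparseTimestampIndex(int sampleEvery) {
        if (sampleEvery < 1) {
            throw new IllegalArgumentException("sampleEvery must be >= 1");
        }
        this.sampleEvery = sampleEvery;
    }

    /** Called for every appended message; only every `sampleEvery`-th one is recorded. */
    void onMessage(long publishTimeMs, P position) {
        if (seen++ % sampleEvery == 0) {
            // Keep the earliest sampled position for a given timestamp.
            index.putIfAbsent(publishTimeMs, position);
        }
    }

    /**
     * Returns the position of the latest sampled message published strictly
     * before `timestampMs`, or null if there is none (scan from the earliest
     * message in that case). A short forward scan from the returned position
     * then finds the first message at or after the timestamp.
     */
    P seekStart(long timestampMs) {
        Map.Entry<Long, P> e = index.lowerEntry(timestampMs);
        return e == null ? null : e.getValue();
    }
}
```

The lookup uses `lowerEntry` rather than `floorEntry` so that unsampled messages sharing the target timestamp are not skipped. The sampling rate trades memory for the length of the forward scan.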

Alternatives

No response

Anything else?

No response

Are you willing to submit a PR?

[ ] I'm willing to submit a PR!

lhotari commented 7 months ago

The current implementation for seeking by timestamp is here: https://github.com/apache/pulsar/blob/30697bd382da0c5a4458f3a7c71d2c9c64ee6b63/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentSubscription.java#L727-L774 called from here: https://github.com/apache/pulsar/blob/c99a51d021a627d675697656869d418d416a5e1b/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1890-L1936

I guess the missed optimization is to use the ledger metadata as a first level filtering. There's a binary search, but it doesn't use the ledger metadata: https://github.com/apache/pulsar/blob/82237d3684fe506bcb6426b3b23f413422e6e4fb/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/OpFindNewest.java#L83-L137

lhotari commented 7 months ago

LedgerInfo contains the timestamp when the ledger was sealed (it got closed or was rolled over): https://github.com/apache/pulsar/blob/23f46a0736e844a2a2fec943ee76d4e1e73ec477/managed-ledger/src/main/proto/MLDataFormats.proto#L55-L61

There could be an initial binary search which uses this information available in the ManagedLedgerImpl via https://github.com/apache/pulsar/blob/e6cd005f90524222df194a690718f77c4e646670/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java#L3844-L3846

I guess there is a gotcha, since the ledger's timestamp comes from the broker's clock, but the seek uses the message publish time, which comes from the client's (publisher's) clock. There might be corner cases because of this.
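A rough sketch of what that first-level filter could look like, with a margin for the clock difference. This is not the Pulsar implementation: `LedgerMeta`, `closeTimeMs`, `firstCandidateLedger` and `maxSkewMs` are hypothetical names, and `LedgerMeta` is a simplified stand-in rather than the real `LedgerInfo` class. It assumes the ledgers are listed in order with non-decreasing close times, and that no message's publish time exceeds its ledger's close time by more than `maxSkewMs`.

```java
import java.util.List;

final class LedgerTimestampFilter {
    /** Hypothetical, simplified view of ledger metadata (not Pulsar's LedgerInfo). */
    static final class LedgerMeta {
        final long ledgerId;
        final long closeTimeMs; // when the ledger was sealed, per the broker's clock

        LedgerMeta(long ledgerId, long closeTimeMs) {
            this.ledgerId = ledgerId;
            this.closeTimeMs = closeTimeMs;
        }
    }

    /**
     * Binary search for the index of the first closed ledger that may contain a
     * message with publish time >= timestampMs. Ledgers sealed before
     * (timestampMs - maxSkewMs) are skipped, because under the stated assumption
     * all of their messages were published before the target timestamp.
     * Returns ledgers.size() if every closed ledger can be skipped, in which
     * case the search would continue in the currently open ledger.
     */
    static int firstCandidateLedger(List<LedgerMeta> ledgers, long timestampMs, long maxSkewMs) {
        long threshold = timestampMs - maxSkewMs;
        int lo = 0;
        int hi = ledgers.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (ledgers.get(mid).closeTimeMs < threshold) {
                lo = mid + 1; // ledger mid and everything before it can be skipped
            } else {
                hi = mid;     // ledger mid may contain the target
            }
        }
        return lo;
    }
}
```

The entry-level binary search would then only need to run from the start of the returned ledger onwards. A larger `maxSkewMs` is safer against clock differences but skips fewer ledgers.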

lhotari commented 7 months ago

There's also a related issue #10488.
