Open larshp opened 1 year ago
cc @merlimat @BewareMyPower
Since the ledger id and entry id are not exposed to users (in the pulsar-client-api module), there is no need to document that.
Actually, the MessageId objects returned by send, receive and getLastMessageId are unique for the triple (ledger id, entry id, batch index). However, there is no public API to get these fields. You have to use the specific implementations like MessageIdImpl and BatchMessageIdImpl to access these fields. These implementations are very messy and might change. See more discussions here
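To make the uniqueness rule above concrete, here is a minimal sketch that models the triple as a plain Java record. MessagePosition is a hypothetical name used only for illustration; it is not a Pulsar class, and the lexicographic ordering is an assumption of the sketch rather than a statement about how the client compares ids.

```java
import java.util.Comparator;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the (ledger id, entry id, batch index) triple.
// NOT a Pulsar class; it only illustrates the uniqueness rule discussed above.
record MessagePosition(long ledgerId, long entryId, int batchIndex)
        implements Comparable<MessagePosition> {

    // Assumed ordering for this sketch: ledger id, then entry id, then batch index.
    private static final Comparator<MessagePosition> ORDER =
            Comparator.comparingLong(MessagePosition::ledgerId)
                    .thenComparingLong(MessagePosition::entryId)
                    .thenComparingInt(MessagePosition::batchIndex);

    @Override
    public int compareTo(MessagePosition other) {
        return ORDER.compare(this, other);
    }
}

class MessagePositionDemo {
    public static void main(String[] args) {
        // Two messages from the same batched entry differ only in batch index.
        MessagePosition a = new MessagePosition(12L, 3L, 0);
        MessagePosition b = new MessagePosition(12L, 3L, 1);
        // Same triple as a, so it counts as the same message.
        MessagePosition c = new MessagePosition(12L, 3L, 0);

        Set<MessagePosition> seen = new HashSet<>();
        seen.add(a);
        seen.add(b);
        seen.add(c);

        System.out.println(seen.size());        // prints 2
        System.out.println(a.compareTo(b) < 0); // prints true
    }
}
```

Records derive equals and hashCode from all components, so two positions are equal exactly when the whole triple matches, which is the property described in the comment above.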
I think it's important to document the design of the software so that users can validate and understand the data and workings of the platform.
A user of the Python client will see https://pulsar.apache.org/api/python/2.10.x/pulsar.html#MessageId.__init__ and be exposed to the concepts of ledger and entry id
A user of https://pulsar.apache.org/docs/2.10.x/pulsar-admin/, will have to enter the ledger and entry id for some commands
It is unclear that the Message ID is composed of fields referred to elsewhere in the documentation, clients and CLI, and it is also unclear which of those fields make up its unique part.
A user of the Python client will see https://pulsar.apache.org/api/python/2.10.x/pulsar.html#MessageId.__init__ and be exposed to the concepts of ledger and entry id
That's the point I mentioned on the mailing list. Though the original authors who wrote the Java client don't want to expose these "details", the authors of many other clients do expose these so-called "details".
A user of https://pulsar.apache.org/docs/2.10.x/pulsar-admin/, will have to enter the ledger and entry id for some commands
It makes sense to me. I just looked at the get-message-by-id API and I am confused by it. I doubt whether this API is reasonable.
A message could be stored in an entry that can be located uniquely by the ledger id and the entry id. However, a message could also be stored across multiple entries, or multiple messages could be stored in a single entry. I need to look deeper into this API's semantics.
Maybe it's worth opening another discussion about this topic. Or do you think it's better to continue discussing in https://lists.apache.org/thread/rdkqnkohbmkjjs61hvoqplhhngr0b0sd?
After checking get-last-message-by-id again, I found that it is a very limited API: it can only get the last message id of a non-partitioned topic. I changed my mind. We should expose these fields (the so-called details) to users. I will open another discussion soon.
In short, this issue makes sense to me.
The issue had no activity for 30 days, mark with Stale label.
@BewareMyPower do the new MessageIdAdv or related changes solve this issue?
@tisonkun No. Users don't need to know which fields define the uniqueness of a MessageId. The MessageId instances retrieved from receive are guaranteed to be different. There is also a public compareTo method to compare two MessageId instances.
I think the main problem of the issue is that many pulsar-admin APIs require users to provide the ledger id and the entry id. Unfortunately, we still need the batch index field for these APIs to locate a unique message.
First, in the documentation, we can tell users how to retrieve these fields; that would be easy with the help of MessageIdAdv. (Maybe we can provide code examples, but let's wait until 3.0.0 is released.) Second, we should improve these APIs to accept the batch index.
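To illustrate why the ledger id and the entry id alone may not locate a unique message, here is a small in-memory sketch. EntryKey, EntryLookupSketch and the sample payloads are all hypothetical; this does not call pulsar-admin or any Pulsar API, and it does not model messages that span several entries.

```java
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model, not the Pulsar admin API.
record EntryKey(long ledgerId, long entryId) {}

class EntryLookupSketch {
    // Each entry holds one payload (non-batched) or several payloads (batched).
    static final Map<EntryKey, List<String>> ENTRIES = Map.of(
            new EntryKey(12L, 0L), List.of("single"),
            new EntryKey(12L, 1L), List.of("batch-0", "batch-1", "batch-2"));

    // Ledger id + entry id select a whole entry, which may contain several messages.
    static List<String> byEntry(long ledgerId, long entryId) {
        return ENTRIES.getOrDefault(new EntryKey(ledgerId, entryId), List.of());
    }

    // Adding the batch index narrows the lookup to exactly one message.
    // Throws IndexOutOfBoundsException if the index is not present in the entry.
    static String byEntryAndBatchIndex(long ledgerId, long entryId, int batchIndex) {
        return byEntry(ledgerId, entryId).get(batchIndex);
    }

    public static void main(String[] args) {
        System.out.println(byEntry(12L, 1L));                 // prints [batch-0, batch-1, batch-2]
        System.out.println(byEntryAndBatchIndex(12L, 1L, 2)); // prints batch-2
    }
}
```

In this model, a command that accepts only a ledger id and an entry id can address the batched entry as a whole but not an individual message inside it, which is the gap the batch index would close.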
Search before asking
What issue do you find in Pulsar docs?
In https://pulsar.apache.org/docs/next/concepts-messaging/
The Message ID is described as
In the binary protocol it is: https://github.com/apache/pulsar/blob/master/pulsar-common/src/main/proto/PulsarApi.proto#L57-L67
I.e., the message ID comprises various fields, and it is unclear which fields define its uniqueness.
What is your suggestion?
Looking at https://pulsar.apache.org/docs/2.10.x/pulsar-admin/, some of the commands use ledgerId and entryId. If these two fields are exactly the unique part of the message ID, that can be added to the documentation for clarity.
Any reference?
No response
Are you willing to submit a PR?