apache / pulsar

Apache Pulsar - distributed pub-sub messaging system
https://pulsar.apache.org/
Apache License 2.0

[Bug] Topic "peek" (admin/v2/persistent/{tenant}/{namespace}/{topic}/ledger/{ledgerId}/entry/{entryId}) management API returns undocumented and inconsistent data when batch messages are accessed #20258

Open zbentley opened 1 year ago

zbentley commented 1 year ago

Version

2.10.3

Minimal reproduce step

  1. Produce a message containing a bytes-schema payload abc123 to a persistent topic with a non-batched (batchingEnabled=false) producer.
  2. Retrieve that message from the topic using the v2 admin API's GET admin/v2/persistent/{tenant}/{namespace}/{topic}/ledger/{ledgerId}/entry/{entryId} functionality.
  3. Observe that the returned payload exactly matches the string abc123.
  4. Produce the exact same message with batchingEnabled=true.
  5. Repeat step 2.
  6. Observe that the returned payload no longer matches abc123.
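Steps 2 and 3 can be scripted directly against the REST endpoint. The following is a minimal Python sketch, not a verified reproduction: the helper names `peek_url` and `fetch_entry` are hypothetical, the base URL is whatever admin HTTP address your broker exposes, and authentication is omitted on the assumption that the admin API is reachable without it.

```python
from urllib.parse import quote
from urllib.request import urlopen


def peek_url(base_url, tenant, namespace, topic, ledger_id, entry_id):
    """Build the ledger/entry URL from the path template quoted in this issue."""
    # Percent-encode each path segment so unusual topic names stay intact.
    parts = [quote(str(p), safe="") for p in (tenant, namespace, topic)]
    return "{}/admin/v2/persistent/{}/{}/{}/ledger/{}/entry/{}".format(
        base_url.rstrip("/"), *parts, int(ledger_id), int(entry_id)
    )


def fetch_entry(base_url, tenant, namespace, topic, ledger_id, entry_id, timeout=10):
    """GET one ledger entry; return (response headers as a dict, raw body bytes)."""
    url = peek_url(base_url, tenant, namespace, topic, ledger_id, entry_id)
    with urlopen(url, timeout=timeout) as resp:
        return dict(resp.headers.items()), resp.read()
```

For the non-batched message from step 1, the returned body is expected to compare equal to `b"abc123"`; for the batched message from step 4, the same comparison fails.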

What did you expect to see?

Something I can use to extract individual messages from a batch. That could include any of the below ideas, or something totally different:

What did you see instead?

An undocumented blob of binary (I think this is a raw chunk of message data from the ledger) that looks like it contains some info re: properties/etc. up front, and then concatenated message+metadata entries for each message in the batch.

Anything else?

Many of my proposed fixes break backwards compatibility, so this may be better suited as a feature request.

However, in the short term, I'd love to find a reference on how to extract individual messages from the batch in a non-Java environment. I control all my admin API accesses in my environment, so I can add parsing logic to those wrappers--I just need to know how to parse the data.

BewareMyPower commented 1 year ago

You're right. This API was introduced in https://github.com/apache/pulsar/pull/6331. At that time the Pulsar community was not as active as it is now, and no PIP or discussion was required for a new API like this. The API is really badly designed.

> I just need to know how to parse the data.

Unfortunately, that takes a fair amount of code. You would need to reimplement the same logic as https://github.com/apache/pulsar/blob/4678c36d4023a2bb8361e0a70673b96de33f06ac/pulsar-client-admin/src/main/java/org/apache/pulsar/client/admin/internal/TopicsImpl.java#L1237

and

https://github.com/apache/pulsar/blob/4678c36d4023a2bb8361e0a70673b96de33f06ac/pulsar-client-admin/src/main/java/org/apache/pulsar/client/admin/internal/TopicsImpl.java#L1424-L1425
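For non-Java wrappers, the batch-splitting logic can be sketched roughly as below. This is an unverified sketch and rests on several assumptions that should be checked against the linked Java code and `PulsarApi.proto` for your broker version: that the response body is the uncompressed batch payload; that each entry in it is framed as a 4-byte big-endian length, a serialized `SingleMessageMetadata` protobuf of that length, and then the message payload; and that `payload_size` is field number 3 of `SingleMessageMetadata`. The function names are hypothetical. It decodes only enough of the protobuf wire format to find `payload_size`, so it does not need generated protobuf classes.

```python
import struct


def _read_varint(buf, pos):
    """Decode one protobuf base-128 varint at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7


def _payload_size(meta):
    """Scan serialized metadata for payload_size (ASSUMED to be field 3)."""
    pos = 0
    size = None
    while pos < len(meta):
        key, pos = _read_varint(meta, pos)
        field, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint
            value, pos = _read_varint(meta, pos)
            if field == 3:
                size = value
        elif wire_type == 1:  # fixed 64-bit
            pos += 8
        elif wire_type == 2:  # length-delimited (strings, bytes, nested messages)
            length, pos = _read_varint(meta, pos)
            pos += length
        elif wire_type == 5:  # fixed 32-bit
            pos += 4
        else:
            raise ValueError("unsupported wire type %d" % wire_type)
    if size is None:
        raise ValueError("payload_size (assumed field 3) not found")
    return size


def split_batch(body):
    """Split a batch body into a list of (raw_metadata, payload) byte pairs."""
    messages = []
    pos = 0
    while pos < len(body):
        # ASSUMED framing: 4-byte big-endian metadata length, metadata, payload.
        (meta_len,) = struct.unpack_from(">I", body, pos)
        pos += 4
        meta = bytes(body[pos:pos + meta_len])
        if len(meta) != meta_len:
            raise ValueError("truncated metadata")
        pos += meta_len
        size = _payload_size(meta)
        if pos + size > len(body):
            raise ValueError("truncated payload")
        messages.append((meta, bytes(body[pos:pos + size])))
        pos += size
    return messages
```

The raw metadata bytes are returned alongside each payload so that a wrapper can later decode per-message properties or keys with a real protobuf library if it needs them.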

In the short term, if you're using a non-Java client, I suggest creating a reader and seeking to the position specified by (ledger id, entry id, batch index) instead of depending on this admin API.
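A minimal sketch of that reader-based workaround, written against duck-typed objects so it stays independent of any particular client version. `read_at` is a hypothetical helper. It assumes a client object exposing `create_reader(topic, start_message_id)` and a reader exposing `read_next(timeout_millis=...)` and `close()`, which is how the Python `pulsar-client` package is understood to behave.

```python
def read_at(client, topic, start_message_id, timeout_ms=5000):
    """Open a reader at start_message_id and return the first message's bytes.

    `client` is expected to behave like pulsar.Client, and `start_message_id`
    like a pulsar.MessageId built from (partition, ledger id, entry id,
    batch index). Both are assumptions of this sketch.
    """
    reader = client.create_reader(topic, start_message_id)
    try:
        msg = reader.read_next(timeout_millis=timeout_ms)
        return msg.data()
    finally:
        # Always release the reader, even if read_next times out.
        reader.close()
```

With the Python client this would be called with something like `pulsar.Client(service_url)` and `pulsar.MessageId(partition, ledger_id, entry_id, batch_index)`. Whether the reader returns the message at the start position or the one after it depends on the client version and its inclusive-start option, so it is worth checking the returned message's ID before relying on the result.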

github-actions[bot] commented 1 year ago

The issue had no activity for 30 days, mark with Stale label.