apache / pulsar-client-python

Apache Pulsar Python client library
https://pulsar.apache.org/
Apache License 2.0

Reader set to MessageId.latest and inclusive start message does not work #193

Closed Samreay closed 6 months ago

Samreay commented 8 months ago

Hi team,

When using start_message_id_inclusive, we expect the message at the start position to be returned. This is the case for pulsar.MessageId.earliest, but it does not work with pulsar.MessageId.latest.

Reproduction

First, run a pulsar standalone instance:

docker run -it -p 6650:6650 -p 8080:8080 --tmpfs /pulsar/data apachepulsar/pulsar:3.1.0 bin/pulsar standalone

Then, run this code:

import pulsar

host = "pulsar://localhost:6650"
topic = "example"

client = pulsar.Client(host)
producer = client.create_producer(topic)
producer.send(b"Hello world 1!")
producer.send(b"Hello world 2!")

earliest_reader = client.create_reader(topic, pulsar.MessageId.earliest, start_message_id_inclusive=True)
msg = earliest_reader.read_next()
print(msg.value())
# Prints Hello world 1!, as wanted

latest_reader = client.create_reader(topic, pulsar.MessageId.latest, start_message_id_inclusive=True)
msg = latest_reader.read_next(timeout_millis=5000)
print(msg.value())
# Times out

The behaviour of the earliest reader is as expected. But the latest reader should not time out.

RobertIndie commented 8 months ago

MessageId.latest means that the reader only reads messages produced after the reader is created. It isn't meant to work with start_message_id_inclusive.

Samreay commented 8 months ago

That seems incredibly unintuitive. If I say to someone "Hey, get me the latest message" I'd expect them to give me the latest message, not the first message produced after I asked.

Be that as it may, how should I go about consuming the last message in a topic with the Python client then? Right now I'm seeking to a day in the past and just reading everything until the final message, which is incredibly wasteful.
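
For reference, the read-everything-until-the-end workaround described above can be sketched roughly as follows. This is a minimal sketch, assuming a reader that was created at some earlier position and that exposes the Python client's `has_message_available()` and `read_next()` methods. The names `read_last_message` and `FakeReader` are hypothetical; `FakeReader` is only an in-memory stand-in so the snippet runs without a broker.

```python
from typing import Any, List, Optional


def read_last_message(reader: Any, timeout_millis: int = 5000) -> Optional[Any]:
    """Drain the reader and return the final message it yields, or None if empty.

    `reader` is assumed to provide has_message_available() and
    read_next(timeout_millis=...), as the Pulsar Python Reader does.
    """
    last = None
    while reader.has_message_available():
        last = reader.read_next(timeout_millis=timeout_millis)
    return last


class FakeReader:
    """Hypothetical in-memory stand-in for a reader (not part of the Pulsar API)."""

    def __init__(self, payloads: List[bytes]) -> None:
        self._payloads = list(payloads)
        self._position = 0

    def has_message_available(self) -> bool:
        return self._position < len(self._payloads)

    def read_next(self, timeout_millis: Optional[int] = None) -> bytes:
        # A real reader returns a message object; the fake returns raw bytes.
        payload = self._payloads[self._position]
        self._position += 1
        return payload


if __name__ == "__main__":
    fake = FakeReader([b"Hello world 1!", b"Hello world 2!"])
    print(read_last_message(fake))
```

With a real reader the cost grows with the number of messages between the start position and the end of the topic, which is the waste described above.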

treuherz commented 8 months ago

@RobertIndie That doesn't seem right. The Go client explicitly handles the case where startMessageIDInclusive && startingMessageID == latestMessageID. I would expect the Python client to match that behaviour or at least document that the functionality isn't working yet.

Samreay commented 8 months ago

Yeah, it looks like the Java client also checks whether the start position is latest and inclusive, and then seeks explicitly to the last message: https://github.com/apache/pulsar/blob/176bdeacd309e8c1e49358987a1946abd30ba34a/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConsumerImpl.java#L2362
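
The expected semantics discussed in this thread can be summarised with a toy model. This is only an illustrative sketch of the behaviour described for the Go and Java clients (latest plus inclusive resolves to the last existing message), not code from any Pulsar client. `first_delivered_index`, `EARLIEST`, and `LATEST` are hypothetical names, and the topic is modelled as a plain list.

```python
from typing import Optional

# Hypothetical sentinels standing in for MessageId.earliest / MessageId.latest.
EARLIEST = "earliest"
LATEST = "latest"


def first_delivered_index(message_count: int, start: str, inclusive: bool) -> Optional[int]:
    """Return the index of the first existing message a reader would deliver.

    None means the reader has nothing to deliver yet and must wait for
    newly produced messages.
    """
    if start == EARLIEST:
        # Earliest sits before the first message, so the first message is delivered.
        return 0 if message_count > 0 else None
    if start == LATEST:
        # Latest + inclusive: resolve to the last message already in the topic.
        if inclusive and message_count > 0:
            return message_count - 1
        # Latest + exclusive: only messages produced afterwards are delivered.
        return None
    raise ValueError(f"unknown start position: {start!r}")


if __name__ == "__main__":
    messages = [b"Hello world 1!", b"Hello world 2!"]
    for start, inclusive in [(EARLIEST, True), (LATEST, True), (LATEST, False)]:
        index = first_delivered_index(len(messages), start, inclusive)
        result = messages[index] if index is not None else "waits for new messages"
        print(start, inclusive, result)
```

In the reproduction above, the reported behaviour matches the exclusive branch even though start_message_id_inclusive=True was passed.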

RobertIndie commented 8 months ago

Thanks for all the information.

> I would expect the Python client to match that behaviour or at least document that the functionality isn't working yet.

That makes sense to me.

RobertIndie commented 8 months ago

This issue is related to this C++ client issue: https://github.com/apache/pulsar-client-cpp/issues/385. I have pushed a PR to fix it: https://github.com/apache/pulsar-client-cpp/pull/386.

Hopefully it will be released in the next feature release of the Python client.