zbentley opened this issue 2 years ago
This is especially severe in our use case because many of our messages take a long time to process, so we parallelize by running many consumers and setting each one's receiver queue size to 1. However, this bug results in 50% or more of messages being artificially delayed even when idle consumers are available to handle them.
@zbentley It is likely the same cause as https://github.com/apache/pulsar-client-python/issues/190; please try disabling batching on the producer side.
@codelipenghui I have reproduced this with batching both enabled and disabled.
Updated the example code to print out the batch index so it is clear that it is -1.
@zbentley The pre-fetching is tied to the receive() calls, not to the acknowledgements.
As long as you keep calling receive, you are making more space in the internal receiver queue, and the client library will ask the broker for more messages. In this case, it is expected that there is a race and consumer-2 can get the 2nd message prefetched.
To get the precise behavior of a single message at a time, you need to disable pre-fetching completely by setting receiver_queue_size=0.
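For reference, a minimal Python sketch of that workaround (the topic name, subscription name, and service URL below are placeholders, and this is illustrative rather than code from this thread):

```python
import pulsar

client = pulsar.Client('pulsar://localhost:6650')

# With receiver_queue_size=0 the client does not pre-fetch at all: it only
# requests a message from the broker when receive() is actually called.
consumer = client.subscribe(
    'my-topic',
    subscription_name='my-sub',
    consumer_type=pulsar.ConsumerType.Shared,
    receiver_queue_size=0,
)

while True:
    msg = consumer.receive()
    # ... long-running processing ...
    consumer.acknowledge(msg)
```

Note that the clients document some restrictions on zero-sized receiver queues (for example around partitioned topics and message listeners), so it is worth checking the docs for the client version in use.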
@merlimat that makes great sense, thanks. So "receiver queue size" could be better thought of as "extra messages to fetch (if available) whenever receive is called"?
If so, I can totally work with that; it would be great if that were documented more clearly. Documentation indicates that Pulsar uses a pull-based protocol, but doesn't go into detail about when pulls happen and how they work. It'd be great if the documentation for each client linked to a wiki article that explained:
- receive is an RPC, rather than something draining a local buffer with a background puller.
- message_listener calls receive.

I can take a stab at writing such an article if you'd like.
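As a rough illustration of the two consumption styles being contrasted above (names and the service URL are placeholders; this is a sketch, not documentation of the client's internals):

```python
import pulsar

client = pulsar.Client('pulsar://localhost:6650')

# Style 1: an explicit receive() loop. Each receive() call frees a slot in the
# receiver queue, which is what lets the client request more messages.
pull_consumer = client.subscribe('my-topic', subscription_name='pull-sub',
                                 receiver_queue_size=1)
msg = pull_consumer.receive()
pull_consumer.acknowledge(msg)

# Style 2: a message_listener callback, where the client library invokes the
# callback for each message instead of the application calling receive() directly.
def on_message(consumer, message):
    consumer.acknowledge(message)

push_consumer = client.subscribe('my-topic', subscription_name='push-sub',
                                 receiver_queue_size=1,
                                 message_listener=on_message)

client.close()
```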
@zbentley Here is some context for this part: https://pulsar.apache.org/docs/next/developing-binary-protocol#flow-control
The issue had no activity for 30 days; marking it with the Stale label.
How is this going? I believe the code in the multi-topics consumer sets a minimum value of 2:
```java
MultiTopicsConsumerImpl(PulsarClientImpl client, String singleTopic, ConsumerConfigurationData<T> conf,
        ExecutorService listenerExecutor, CompletableFuture<Consumer<T>> subscribeFuture, Schema<T> schema,
        ConsumerInterceptors<T> interceptors, boolean createTopicIfDoesNotExist) {
    super(client, singleTopic, conf, Math.max(2, conf.getReceiverQueueSize()), listenerExecutor,
            subscribeFuture, schema, interceptors);
```
I wonder if it could be allowed to be set to 1? I also really need this: I have long-running jobs, so it would let clients run all the jobs concurrently.
Describe the bug
When the Python client calls subscribe, setting receiver_queue_size to a value of 1 results in an effective receiver queue behavior of more than one; Shared-type consumers with receiver_queue_size=1 can "steal" messages that other consumers could process.
To Reproduce
- Subscribe two consumers, each with receiver_queue_size=1, using a Shared subscription on the topic.
- Produce 10 messages to the topic, named message-0, message-1, and so on.
- In the first consumer, press a key to acknowledge messages.
- In the second consumer, press a key to acknowledge messages.

Consumer code:
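The consumer code from the original report is not reproduced here; the following is only a sketch of a consumer matching the description above (Shared subscription, receiver_queue_size=1, acknowledgement deferred until a keypress), with placeholder topic, subscription, and service URL:

```python
import pulsar

# Sketch only, not the reporter's original code.
client = pulsar.Client('pulsar://localhost:6650')
consumer = client.subscribe(
    'test-topic',
    subscription_name='test-sub',
    consumer_type=pulsar.ConsumerType.Shared,
    receiver_queue_size=1,
)

received = []
try:
    while True:
        msg = consumer.receive(timeout_millis=2000)
        # Print the batch index too, to show it is -1 (i.e. batching is not involved).
        print('received', msg.data(), 'batch index', msg.message_id().batch_index())
        received.append(msg)
except Exception:
    # receive() raises once the timeout expires with no message available.
    pass

input('Press a key to acknowledge messages')
for msg in received:
    consumer.acknowledge(msg)
client.close()
```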
Expected behavior
The first consumer should process exactly 9 messages, every time. The second consumer should process exactly 1 message, every time. The extra messages that go to consumer 2 should go to consumer 1; otherwise, consumer 2 can "falsely steal" those messages.
Environment: Same environment as https://github.com/apache/pulsar-client-python/issues/190