Open TalkWIthKeyboard opened 1 year ago
There was a fix in the Java client: https://github.com/apache/pulsar/pull/15413. Maybe we need to include this fix in the upstream C++ client as well.
It seems that you have enabled deduplication on the broker side? The key based batching still doesn't work perfectly with message deduplication.
Thanks. So if I need key-based batching to work with message deduplication, should I use only the Java client for now?
Yes. But as I've said before:
"The key based batching still doesn't work perfectly with message deduplication."
Take the example in https://github.com/apache/pulsar/pull/15413#issuecomment-1115325307, assuming there are 4 messages that were grouped into two batches:
- A: 0, 3 (i.e. messages whose keys are all "A", and the sequence ids are 0 and 3)
- B: 1, 2
After batch B is persisted, the sequence id will be updated to 2 on the broker side. If batch A failed to be sent and the producer resends batch A, msg-0 will be rejected as a duplicate because its sequence id (0) is not greater than 2, and message loss will occur.
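The scenario above can be sketched with a toy model. This is only an illustration under stated assumptions, not the broker's actual code: `SimplifiedBroker` and `simulate_resend` are hypothetical names, and the model checks each message's sequence id individually against the last persisted sequence id, whereas the real broker logic is more involved.

```python
class SimplifiedBroker:
    """Toy model of broker-side deduplication for a single producer.

    Assumption: a message is treated as a duplicate when its sequence id
    is less than or equal to the highest sequence id persisted so far.
    """

    def __init__(self):
        self.last_sequence_id = -1  # nothing persisted yet
        self.stored = []            # sequence ids that were persisted

    def persist_batch(self, batch):
        """Persist a batch; return (accepted, rejected) sequence ids."""
        accepted = [s for s in batch if s > self.last_sequence_id]
        rejected = [s for s in batch if s <= self.last_sequence_id]
        self.stored.extend(accepted)
        if accepted:
            self.last_sequence_id = max(accepted)
        return accepted, rejected


def simulate_resend():
    """Replay the example: batch A (0, 3) fails first, batch B (1, 2) succeeds."""
    broker = SimplifiedBroker()
    batch_a = [0, 3]  # messages with key "A"
    batch_b = [1, 2]  # messages with key "B"

    # The first attempt to send batch A fails and never reaches the broker.
    broker.persist_batch(batch_b)  # last sequence id becomes 2

    # The producer resends batch A.
    _, rejected = broker.persist_batch(batch_a)
    return broker.stored, rejected


if __name__ == "__main__":
    stored, rejected = simulate_resend()
    print("stored:", stored)
    print("rejected as duplicate:", rejected)
```

In this model msg-0 is never stored even though it was never persisted before, which is the message loss described above.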
Thanks, I got it. Does Pulsar have a plan to implement a more powerful deduplication mechanism? It sounds like a bug or a big limitation of the current deduplication mechanism.
AFAIK not. The limitation comes from key-based batching. When it was introduced, the deduplication case was not well considered.
In my program, there are eight producers: two send to partitioned topics and six send to non-partitioned topics.
Below is my topic configuration:
I am using `sendAsync()` to send messages, but I am getting many "Connection closed" messages after receiving the warning "Received send error from the server: Cannot determine whether the message is a duplicate at this time". I also found that if I remove `BatchingType.KeyBased` or turn off batching, I do not encounter this problem, and that the problem does not occur consistently. Here are my debug logs: pulsar_log.txt
The pulsar-client version is 2.10.1; the Pulsar cluster version is 2.10.2.