apache / pulsar

Apache Pulsar - distributed pub-sub messaging system
https://pulsar.apache.org/
Apache License 2.0

RedeliveryCount is 0 on redelivery #18239

Open nakonczy opened 3 years ago

nakonczy commented 3 years ago

Describe the bug I have a standalone Pulsar server with a single topic and one message produced to that topic. My client app has a single consumer, and the subscription type is Shared. When I receive the message, stop the client app before acking, and then start the client app again, the message is redelivered as expected, but with redeliveryCount=0 (expected 1). Similarly, when the consumer reconnects after a temporary loss of connectivity to the Pulsar server before acking, the redelivered message has redeliveryCount=0.

To Reproduce Steps to reproduce the behavior:

  1. Receive a message.
  2. Block connectivity. I did it like this: sudo iptables -I INPUT -p tcp -m tcp --dport 6650 -j REJECT
  3. Acknowledge the message while there's no connectivity.
  4. Wait longer than the ping timeout (keepaliveIntervalMillis?, default is 30 sec).
  5. Unblock connectivity: sudo iptables -D INPUT -p tcp -m tcp --dport 6650 -j REJECT
  6. A reconnect will occur and the message will be redelivered (as expected), but its redeliveryCount will be 0 (not 1).

OR:

  1. Receive a message.
  2. Stop the client app without ack-ing the message.
  3. Start client app again.
  4. Redelivery occurs with redeliveryCount=0.

Expected behavior The redelivered message's redeliveryCount should be 1 instead of 0.

codelipenghui commented 3 years ago

Hi @nakonczy, currently only a redelivery requested by the client (by calling the redelivery method or enabling ack timeout) increases a message's redelivery count. The redelivery count was originally introduced for the DLQ feature in Pulsar. If a broken connection also increased the redelivery count, a message could be delivered to the DLQ even though the consumer never processed it.
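
As a rough illustration of the rule described in this comment, here is a toy model in Python. It is only a sketch: `ToyBroker` and its method names are hypothetical and are not Pulsar's broker code or client API. It assumes that a client-requested redelivery (nack or ack timeout) bumps the count, while a redelivery caused by a reconnect does not.

```python
from collections import defaultdict


class ToyBroker:
    """Toy model of the counting rule described above (hypothetical, not Pulsar code)."""

    def __init__(self):
        # message id -> redelivery count
        self.redelivery_count = defaultdict(int)
        # ids of messages delivered but not yet acknowledged
        self.unacked = set()

    def deliver(self, msg_id):
        """Send a message to the consumer together with its current count."""
        self.unacked.add(msg_id)
        return msg_id, self.redelivery_count[msg_id]

    def client_requests_redelivery(self, msg_id):
        """Nack or ack timeout: the only path that increments the count."""
        self.redelivery_count[msg_id] += 1
        return self.deliver(msg_id)

    def consumer_reconnects(self):
        """Reconnect: unacked messages are sent again, counts are left untouched."""
        return [self.deliver(msg_id) for msg_id in sorted(self.unacked)]


broker = ToyBroker()
first = broker.deliver("m1")                          # initial delivery
after_reconnect = broker.consumer_reconnects()        # redelivery after a reconnect
after_nack = broker.client_requests_redelivery("m1")  # redelivery after a nack
print(first, after_reconnect, after_nack)
```

In this model the reconnect path returns the message with a count of 0, which matches the behaviour reported in the issue, and only the nack path returns a count of 1.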

asgeirrr commented 2 years ago

I can confirm this behaviour on Pulsar 2.8.1 using the Python client and shared subscription mode. My experiments suggest that the broker forgets redelivery counts for the given consumer name when the last consumer with that name disconnects. This IMHO prevents implementing a reliable dead-letter policy. A message that always leads to OOM or an exception will never reach the DLQ if there is just one consumer.

If my observations are correct, is there any way for the broker to save redelivery counts for some time even after the last consumer with that name disconnects? Thank you for any help or explanation.
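
The concern can be sketched with a small simulation. This is a toy model under stated assumptions, not Pulsar code: the function name and parameters are hypothetical, a single consumer crashes on every delivery of a poison message, and the message is dead-lettered once its redelivery count reaches `max_redeliver_count`. `count_survives_disconnect=True` stands for the behaviour requested here, not for anything Pulsar is known to do.

```python
def crashes_before_dead_letter(max_redeliver_count,
                               count_survives_disconnect,
                               max_crashes=20):
    """Return how many consumer crashes happen before a poison message is
    dead-lettered, or None if that never happens within max_crashes."""
    redelivery_count = 0
    for crashes in range(max_crashes + 1):
        if redelivery_count >= max_redeliver_count:
            return crashes  # threshold reached: message goes to the DLQ
        # The single consumer crashes while processing and disconnects.
        if count_survives_disconnect:
            redelivery_count += 1  # hypothetical: the count is kept and bumped
        else:
            redelivery_count = 0   # reported behaviour: the count is forgotten
    return None


print(crashes_before_dead_letter(3, count_survives_disconnect=True))
print(crashes_before_dead_letter(3, count_survives_disconnect=False))
```

With the count forgotten on every disconnect, the threshold is never reached in this model, so the poison message keeps crashing the consumer instead of being moved to the DLQ.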

zbentley commented 2 years ago

The feature @asgeirrr describes would be incredibly useful for us. One or both of two things (the first probably easier than the second to implement) would be ideal:

kuskmen commented 1 year ago

Hi all,

we are also looking for this feature to be implemented. Our use case is that we need to know deterministically whether a message is being redelivered because of a nack or is brand new.

MichalKoziorowski-TomTom commented 1 year ago

Hi @BewareMyPower.

I suppose it's not really related to the C++ client. It also happened with the Java client and the Python client. I wonder why this should be handled on the client side at all.

BewareMyPower commented 1 year ago

@MichalKoziorowski-TomTom It's because the original issue was tagged with the client-cpp label. When I checked the stale issues in the original pulsar repo, I moved it here.

I have now moved it back to Pulsar and tagged it with the correct labels.

MichalKoziorowski-TomTom commented 1 year ago

I understand now. Thank you.

mgagliardo91 commented 9 months ago

We are also hoping there can be progress on this issue - any updates now that it's been a year?