Azure / azure-service-bus

☁️ Azure Service Bus service issue tracking and samples
https://azure.microsoft.com/services/service-bus
MIT License

Managing delivery count when process is killed #410

Closed worldspawn closed 2 years ago

worldspawn commented 3 years ago

Description

I have a scenario where I am getting poison messages. The workload in these messages consumes too much memory and Kubernetes kills the pod it's running in. The problem I am having is that the delivery count is never incremented on the message.

The consumer continues to lock the messages (or the sessions the messages are in), so I can't purge them. I have to shut down all consumers to get the poison out 😄

Incrementing the delivery count seems to be something the client does. Is there a way to explicitly do this in code? Ideally I would receive a message, bump the delivery count and then attempt to process the workload.

Actual Behavior

  1. Delivery count is never incremented
  2. Message is delivered infinitely

Expected Behavior

  1. Delivery count is incremented each time it's... delivered
  2. Message stops being delivered when the maximum delivery count is reached.

SeanFeldman commented 3 years ago

Incrementing the delivery count seems to be something the client does

Not quite. That's happening on the service/broker side. The problem is that the message is locked when the pod is killed, so the message lock is never explicitly released. The broker is obliged to wait until the lock is released or expires, and only then can it increase the delivery counter. That's how PeekLock works. A message will be re-delivered until the maximum delivery count is reached. At that point, it will be dead-lettered.
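
To make the sequence concrete, here is a toy in-memory model of the PeekLock behavior described above. It is only a sketch of that description, not the broker's actual implementation: `ToyQueue`, `Message`, and the explicit `now` timestamps are all made up for illustration, and the exact point at which the real service dead-letters a message should be checked against the documentation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Message:
    body: str
    delivery_count: int = 0               # delivery attempts that ended without completion
    locked_until: Optional[float] = None  # None means the message is not locked


@dataclass
class ToyQueue:
    """Hypothetical stand-in for a queue in PeekLock mode."""
    lock_duration: float
    max_delivery_count: int
    active: List[Message] = field(default_factory=list)
    dead_letter: List[Message] = field(default_factory=list)

    def send(self, body: str) -> None:
        self.active.append(Message(body))

    def _release(self, msg: Message) -> None:
        # Only once the lock is gone does the counter move.
        msg.locked_until = None
        msg.delivery_count += 1
        if msg.delivery_count >= self.max_delivery_count:
            self.active.remove(msg)
            self.dead_letter.append(msg)

    def abandon(self, msg: Message) -> None:
        # Explicit release by a consumer that is still in control.
        self._release(msg)

    def receive(self, now: float) -> Optional[Message]:
        # Expire stale locks first (iterate over a copy, _release may remove items).
        for msg in list(self.active):
            if msg.locked_until is not None and now >= msg.locked_until:
                self._release(msg)
        # Hand out the first unlocked message and lock it.
        for msg in self.active:
            if msg.locked_until is None:
                msg.locked_until = now + self.lock_duration
                return msg
        return None


queue = ToyQueue(lock_duration=90, max_delivery_count=2)
queue.send("bad.pdf")
first = queue.receive(now=0)     # consumer takes the lock, then the pod is killed
blocked = queue.receive(now=30)  # lock still held: nothing is delivered
second = queue.receive(now=90)   # lock expired: count goes up, message is redelivered
third = queue.receive(now=180)   # second expiry reaches the maximum: dead-lettered
```

In this model nothing happens to the counter while the lock is held, which is why a killed consumer makes the message look stuck for a full lock duration.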

If you want the broker to re-deliver messages faster, you either need to shorten the lock duration (which might backfire if your processing is not fast enough) or explicitly abandon messages in processing if you're still in control. Otherwise, just wait till the lock expires.
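
A minimal sketch of the "explicitly abandon" option, assuming the Python `azure-servicebus` v7 SDK, where `receiver` would be a `ServiceBusReceiver` (which has `abandon_message` and `complete_message`). The function name `process_with_abandon` and the `process` callback are hypothetical. Note that this only helps while the code is still running: if the pod is killed outright, no handler runs and the lock has to expire on its own.

```python
def process_with_abandon(receiver, message, process):
    """Run process(message) and settle the message based on the outcome.

    `receiver` is assumed to behave like azure-servicebus's ServiceBusReceiver;
    `process` is a hypothetical callable that does the actual work.
    """
    try:
        process(message)
    except Exception:
        # Still in control: release the lock right away so the broker
        # does not have to wait for the lock duration to elapse.
        receiver.abandon_message(message)
        return False
    # Work finished: remove the message from the queue.
    receiver.complete_message(message)
    return True
```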

worldspawn commented 3 years ago

In my case I'm getting a message asking me to process a PDF, and this particular PDF is consuming more memory than my pod is allowed to use, so the process is killed. At this point I want this message to go to the DLQ, not be redelivered. As my code is completely unaware it's being killed, the mechanism I was hoping I could use is the delivery count.
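
Roughly what I have in mind, as a sketch only. It assumes the Python `azure-servicebus` v7 SDK (`delivery_count` on the received message, `dead_letter_message` and `complete_message` on the receiver) and it assumes the broker has already bumped the count by the time the message comes back. The function name, the `process` callback, the reason string, and the `first_delivery_value` parameter are all made up for illustration; what the SDK reports on a first delivery should be checked before relying on the default.

```python
def process_or_dead_letter(receiver, message, process, first_delivery_value=1):
    """Dead-letter a message that has already been delivered before.

    Assumption: a delivery_count above `first_delivery_value` means an earlier
    attempt never settled the message, e.g. because the pod was killed.
    """
    count = message.delivery_count
    if count is not None and count > first_delivery_value:
        receiver.dead_letter_message(
            message,
            reason="PreviousAttemptDidNotFinish",  # hypothetical reason string
            error_description="An earlier delivery was never settled.",
        )
        return "dead-lettered"
    # First attempt: do the risky work, then settle.
    process(message)
    receiver.complete_message(message)
    return "completed"
```

The check runs before the heavy work starts, so it does not depend on the process surviving the bad PDF.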

Based on what you are saying I should see the delivery count increment after 2 minutes (my lock time). Maybe that was happening but I couldn't see for all the messages I had targeting this one bad PDF. I'll try it again with a max delivery count of 1. Thanks @SeanFeldman

worldspawn commented 3 years ago

I've been trying this out with a max delivery count of 1 and a lock time of 90 seconds. I'm not seeing any messages moving to the DLQ after 15 minutes 😞 I've peeked all 214 messages and they have a delivery count of 1.

EldertGrootenboer commented 2 years ago

When messages are moved to the DLQ is described at https://docs.microsoft.com/en-us/azure/service-bus-messaging/service-bus-dead-letter-queues#maximum-delivery-count. If you still see unexpected behavior after reading through this, please leave a comment here, and we can re-open the issue.