ASB transport using 'SendsAtomicWithReceive' mode cannot forward messages to the error queue when handler execution exceeds message lock duration #1043 #1053
When the SendsAtomicWithReceive transaction mode is used and the handler execution time exceeds the message lock duration (including any lock renewal), recoverability cannot complete. During recoverability, a copy of the message is sent to the error queue and the original message must be removed from the input queue. Once the lock has expired, however, the original message cannot be removed, because the broker has already made it available to other receivers. As a result, recoverability gets stuck in an infinite loop: the handler never finishes processing the message before the lock expires.
Expected behavior
If the message handler always exceeds the message lock duration then the message should be moved to the error queue by the recoverability process.
Actual behavior
Message processing goes into an infinite loop: the original message is never removed from the input queue, while the error queue begins to fill up with copies of the failed message.
Steps to reproduce
Create a handler that takes more than 5 minutes to complete (the default ASB message lock duration is 5 minutes).
Send a message to that handler
The endpoint will attempt to process the message, but after 5 minutes the message lock expires and the message becomes visible again.
After the first processing attempt is complete, the transport will call CompleteMessageAsync on the message, but a ServiceBusException with the reason MessageLockLost will be raised, leaving the message in the input queue.
In the meantime, another thread is going to pick up the message that is now visible (after the lock duration has elapsed).
This continues forever and the message will never be removed from the queue.
The log reports that delayed retries will be scheduled. However, the delayed retry messages cannot be sent once the lock has expired, so the configured delayed retry policy is never executed. The message stays in the input queue forever: no delayed retries occur, and no message is sent to the error queue either.
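The steps above can be illustrated with a small toy model. This is only a sketch of peek-lock semantics under simplifying assumptions: it is not the Azure Service Bus SDK or the transport code, time is an integer number of minutes, receivers run one after another rather than concurrently, and lock renewal is not modeled. `ToyQueue`, `MessageLockLost` and `run` are hypothetical names introduced for illustration.

```python
class MessageLockLost(Exception):
    """Stand-in for a ServiceBusException with reason MessageLockLost."""


class ToyQueue:
    """Toy peek-lock queue holding a single message; time is in whole minutes."""

    def __init__(self, lock_minutes):
        self.lock_minutes = lock_minutes
        self.has_message = True
        self.locked_until = 0
        self.next_token = 0
        self.current_token = None

    def receive(self, now):
        # The message is visible only when no unexpired lock exists.
        if not self.has_message or now < self.locked_until:
            return None
        self.next_token += 1
        self.current_token = self.next_token
        self.locked_until = now + self.lock_minutes
        return self.current_token

    def complete(self, token, now):
        # Completing requires the lock taken at receive time to still be valid.
        if token != self.current_token or now >= self.locked_until:
            raise MessageLockLost()
        self.has_message = False


def run(handler_minutes, lock_minutes, max_attempts=5):
    """Return (attempts, message_still_in_queue) after a bounded number of attempts."""
    queue = ToyQueue(lock_minutes)
    now = 0
    attempts = 0
    while attempts < max_attempts:
        token = queue.receive(now)
        if token is None:
            break  # nothing left to process
        attempts += 1
        now += handler_minutes  # the handler runs for this long
        try:
            queue.complete(token, now)
        except MessageLockLost:
            continue  # the message stays in the queue and is received again
    return attempts, queue.has_message
```

With a 6-minute handler and a 5-minute lock, `run(6, 5)` uses every allowed attempt and the message is still in the queue. With a 3-minute handler, `run(3, 5)` completes on the first attempt. The `max_attempts` bound exists only so that the sketch terminates.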
Relevant log output
WARN Skip handling the message with id '{message ID}' because the lock has expired at '{time}'. This is usually an indication that the endpoint prefetches more messages than it is able to handle within the configured peek lock duration.
Additional Information
In the SendsAtomicWithReceive transaction mode, any outgoing operations associated with processing the incoming message are rolled back if the incoming message is not successfully processed. Therefore, the LRU cache approach used in the ReceiveOnly transaction mode is not feasible in the SendsAtomicWithReceive transaction mode, because the handler never completes successfully. In the ReceiveOnly transaction mode, a message ID found in the LRU cache indicates that the message has already been handled and its outgoing operations have already been executed, so the message can be removed from the queue without invoking the message handler again.
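A minimal sketch of the ReceiveOnly idea described above, assuming an LRU set of message IDs whose handler already ran but whose completion failed because the lock was lost. This is not the transport's actual implementation; `ProcessedIdCache`, `on_receive` and the returned action strings are hypothetical and exist only to show the control flow.

```python
from collections import OrderedDict


class ProcessedIdCache:
    """Small LRU set of message IDs whose handler already ran to completion."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._ids = OrderedDict()

    def add(self, message_id):
        self._ids[message_id] = True
        self._ids.move_to_end(message_id)
        if len(self._ids) > self.capacity:
            self._ids.popitem(last=False)  # evict the least recently used ID

    def __contains__(self, message_id):
        if message_id in self._ids:
            self._ids.move_to_end(message_id)  # a lookup counts as a use
            return True
        return False


def on_receive(message_id, cache, handler, lock_still_valid):
    """Hypothetical receive-only step; returns a string naming the action taken."""
    if message_id in cache:
        # Already handled and outgoing operations already sent:
        # remove the message without invoking the handler again.
        return "completed-without-handler"
    handler(message_id)
    if lock_still_valid:
        return "completed"
    # The handler ran but the message could not be completed; remember its ID.
    cache.add(message_id)
    return "lock-lost"
```

In SendsAtomicWithReceive mode the same shortcut would be unsafe: the outgoing operations are rolled back together with the failed receive, so skipping the handler on redelivery would drop them.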
Workarounds
Increase the lock-renewal duration so that it is greater than the handler duration multiplied by the prefetch count.
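As a worked example of that rule, the sketch below computes the bound the lock-renewal duration has to exceed. The function names are hypothetical, and the reading that prefetched messages are handled one after another (so the last one waits for all the others) is an assumption about why the rule multiplies by the prefetch count.

```python
from datetime import timedelta


def lock_renewal_lower_bound(handler_duration, prefetch_count):
    """Value that the lock-renewal duration must exceed, per the workaround."""
    return handler_duration * prefetch_count


def is_lock_renewal_sufficient(lock_renewal, handler_duration, prefetch_count):
    """True when lock renewal is strictly greater than the lower bound."""
    return lock_renewal > lock_renewal_lower_bound(handler_duration, prefetch_count)
```

For a 6-minute handler and a prefetch count of 10, the bound is 60 minutes, so a 30-minute lock renewal is too short and a 61-minute one satisfies the rule.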