Messages deadlettered during shutdown

andreasjl commented 3 years ago

In our WebJobs (running on App Service), many messages end up in the deadletter queue. We also see many exceptions like these in our logs:

System.OperationCanceledException: The operation was canceled.
   at System.Threading.CancellationToken.ThrowOperationCanceledException()
   at System.Threading.CancellationToken.ThrowIfCancellationRequested()
   at Microsoft.Azure.WebJobs.ServiceBus.MessageProcessor.CompleteProcessingMessageAsync(Message message, FunctionResult result, CancellationToken cancellationToken)
   at Microsoft.Azure.WebJobs.ServiceBus.Listeners.ServiceBusListener.ProcessMessageAsync(Message message, CancellationToken cancellationToken)
   at Microsoft.Azure.ServiceBus.MessageReceivePump.MessageDispatchTask(Message message)

System.OperationCanceledException: The operation was canceled.
   at System.Threading.CancellationToken.ThrowOperationCanceledException()
   at System.Threading.CancellationToken.ThrowIfCancellationRequested()
   at Microsoft.Azure.WebJobs.ServiceBus.Bindings.ServiceBusBinding.BindAsync(BindingContext context)
   at Microsoft.Azure.WebJobs.Host.Triggers.TriggeredFunctionBinding`1.BindCoreAsync(ValueBindingContext context, Object value, IDictionary`2 parameters) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Triggers\TriggeredFunctionBinding.cs:line 97
   at Microsoft.Azure.WebJobs.Host.Triggers.TriggeredFunctionBinding`1.BindAsync(ValueBindingContext context, TTriggerValue value) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Triggers\TriggeredFunctionBinding.cs:line 33
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.ExecuteWithLoggingAsync(IFunctionInstanceEx instance, FunctionStartedMessage message, FunctionInstanceLogEntry instanceLogEntry, ParameterHelper parameterHelper, ILogger logger, CancellationToken cancellationToken) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:line 261
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.TryExecuteAsync(IFunctionInstance functionInstance, CancellationToken cancellationToken) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:line 94

I noticed that these exceptions always occur while the WebJob is in the "Stopping" state during shutdown. I therefore think the problem is related to the graceful shutdown behavior (the listener is probably still reading messages from Service Bus but passing a cancellation token that is already set once shutdown is requested).

We had this issue with the latest NuGet version, 4.2.1, and I could repro the problem pretty reliably. Looking at the history, I saw that commit 6da7fe7842aa0ac2e2ac35d490451bba1b59610f changed the shutdown code, so I tried downgrading to version 4.1.2. That build is not yet deployed to our production environment, but I could no longer repro the problem.

Repro steps

Extend the graceful shutdown wait period using settings.job.
Trigger some Service Bus messages.
While they are still being sent, stop the WebJobs in Azure Portal.
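
For the first step, a minimal settings.job sketch, assuming the standard `stopping_wait_time` setting (in seconds) is what extends the wait period. The value 60 is only an illustrative choice, not necessarily the value used when reproducing this issue:

```json
{
  "stopping_wait_time": 60
}
```

The file sits next to the WebJob executable in the WebJob's root folder. A longer wait makes the "Stopping" window wider, which should make the exceptions easier to observe.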

Expected behavior

Graceful shutdown: message processing stops when requested, no exceptions are thrown, and messages do not end up in the deadletter queue.

Actual behavior

Exceptions thrown outside our code (see above) cause messages to end up in the deadletter queue after the maximum number of delivery attempts.

Known workarounds

Downgrading to NuGet version 4.1.2 possibly improves the situation.

Related information

Microsoft.Azure.WebJobs.Extensions.ServiceBus version 4.2.1

andreasjl commented 3 years ago

The downgrade workaround does not help. I am already following up offline with sidkri.

andreasjl commented 3 years ago

This issue has been fixed in NuGet package version 4.2.2.

Azure / azure-functions-servicebus-extension