Azure / azure-functions-servicebus-extension

Service Bus extension for Azure Functions
MIT License
65 stars 35 forks source link

Allow all in-flight Service Bus messages to complet during shutdown when function execution completes #143

Closed sidkri closed 3 years ago

sidkri commented 3 years ago

Prior to the changes in this PR, when a WebJob starts stopping, Service Bus messages continue to start processing but get abandoned as ServiceBusListener.Cancel() is called first which cancels the cancellation token source related to function invocation but does not stop the underlying Service Bus SDK from processing new messages. Additionally, the default MessageProcessor implementation throws an exception if the cancellation token involved in function execution is set which results in all in-flight messages getting abandoned even though they were successfully processed by the function. Both behaviors are modified as explained in the changes below.

Changes:

Pull request checklist

alrod commented 3 years ago

How does the job host stop the listener? Does it call Cancel and then StopAsync? From the interface description it should just cancel "in process" operations https://github.com/Azure/azure-webjobs-sdk/blob/dev/src/Microsoft.Azure.WebJobs.Host/Listeners/IListener.cs#L32

sidkri commented 3 years ago

How does the job host stop the listener? Does it call Cancel and then StopAsync? From the interface description it should just cancel "in process" operations https://github.com/Azure/azure-webjobs-sdk/blob/dev/src/Microsoft.Azure.WebJobs.Host/Listeners/IListener.cs#L32

Per the interface description "Cancel any in progress listen operation" we should just cancel in-flight executions but listening will continue and it's not clear what should happen with new messages. The issue in this extension is that after Cancel() is called even new messages that start processing after the call will have a cancelled token. This is appropriate from the perspective of notifying the function being executed that execution may get interrupted/cancelled at any time after Cancel() is invoked. The problem is that the Service Bus extension checks the cancellation token while binding as well as when execution completes and in both cases throws an exception if the cancellation token is set, presumably so that the message is abandoned instead of completed as execution may get interrupted (ServiceBusBinding) or might have been skipped in the function (MessageProcessor). Another pattern we can follow is to simply notify the function code via the cancellation token that execution may get cancelled and let it decide to abandon the service bus message by throwing an exception (when autoComplete is enabled). Remove the check and throw logic in the two places I linked above could be a better change. I opted for modifying the behavior of ServiceBusListener:Cancel() to align with that of the EventHubsListener.Cancel() which simply calls StopAsync().

alrod commented 3 years ago

Adding some conversation for history:

[4/8 6:14 PM] Alexey Rodionov I thought more about the change

why do we remove cancellationToken.ThrowIfCancellationRequested(); in MessageProcessor and SessionMessageProcessor and leave them in other places? Maybe we should also leave them so on after dispose with not get disposed object exception on message comlete/abandon? If now the token is canceled on dispose, will the customer have the same issues as we still cancelling the token?

​[4/8 6:50 PM] Sid Krishna

1 I need to remove them from processors as Drain mode manager cancel's the token

​[4/8 6:51 PM] Sid Krishna

2 dispose is called at the end, cancel happens at the start of shutdown

​[4/8 6:51 PM] Sid Krishna the tests are manually creating an IHostedService and stopping it ​[4/8 6:51 PM] Sid Krishna WebJobs itself has a different lifecycle ​[4/8 6:52 PM] Sid Krishna I'll validate #1 again, the token that drain manager cancels is an further down the stack so might not affect message processor actually ​[4/8 7:48 PM] Sid Krishna I tested that the cancellation token does not get set when draining at the message processor level (your #1) so those removals are actually not needed so I reverted those changes and pushed. ​[4/8 7:51 PM] Sid Krishna That leaves your #2 which I think is not an issue as by the time the object is getting disposed the process is close to stopping so impact will be minimal if at all v/s when Cancel is invoked for which there is a long delay between that and process stopping.