Azure / azure-functions-servicebus-extension

Service Bus extension for Azure Functions
MIT License
65 stars 36 forks source link

ServiceBusConnection has already been closed. Please create a new instance #114

Closed ThomasArdal closed 4 years ago

ThomasArdal commented 4 years ago

I have recently added Microsoft.Extensions.Logging to my Azure Functions. This reveals a range of different errors logged from the runtime as well as bindings. I'm creating GitHub issues when I see a new error, to try and understand if each error is something I should take seriously or not. I generally think that Azure Functions log way too many errors internally. The general answer here on GitHub is that "you can ignore that error". Logging errors that can simply be ignored is a bad practice IMO since important errors disappear in a large amount of logged errors.

For today's error, I'm seeing this:

System.ObjectDisposedException: Cannot access a disposed object.
Object name: 'ServiceBusConnection has already been closed. Please create a new instance'.
   at Microsoft.Azure.ServiceBus.ServiceBusConnection.ThrowIfClosed()
   at Microsoft.Azure.ServiceBus.ClientEntity.ThrowIfClosed()
   at Microsoft.Azure.ServiceBus.Core.MessageReceiver.AbandonAsync(String lockToken, IDictionary`2 propertiesToModify)
   at Microsoft.Azure.ServiceBus.MessageReceivePump.AbandonMessageIfNeededAsync(Message message)

For this function, I have a ServiceBus input binding like this:

public async Task Run([ServiceBusTrigger("%topicName%", "%subscriptionName%", Connection = "connection")]string mySbMsg)

The mySbMsg variable is parsed like this:

JsonConvert.DeserializeObject<UptimeResultMessage>(mySbMsg, new JsonSerializerSettings())

No further requests are made to Azure ServiceBus in the function.

This error seems to be something happening internally in either the runtime or the binding. Can this error be ignored? Should it even be logged?

Sorry for the semi-rant here, but when we are drowning in errors from the runtime, getting a good overview of the log is hard.

I'm running on .NET Core 3.1 on Azure Functions v3 and have the following NuGet packages installed:

<PackageReference Include="Microsoft.Azure.WebJobs.Extensions.ServiceBus" Version="4.1.1" />
<PackageReference Include="Microsoft.NET.Sdk.Functions" Version="3.0.7" />
<PackageReference Include="Microsoft.Azure.Functions.Extensions" Version="1.0.0" />
VamseeInala commented 4 years ago

Even I am seeing this error sometimes on calling the "CompleteAsync" on messageReceiver object.

paulbatum commented 4 years ago

@sidkri I have a feeling that this error is due to: https://github.com/Azure/azure-functions-servicebus-extension/issues/95

Does that seem likely?

sidkri commented 4 years ago

@paulbatum, the Azure/azure-functions-servicebus-extension#95 is needed to drain messages during graceful shutdown via "IHost.StopAsync()". It would be related if the errors reported here are happening while the function host is shutting down. @ThomasArdal / @VamseeInala - could you please check the logs and let us know if this error happens just before the function host stops?

ThomasArdal commented 4 years ago

I'm afraid I put an ignore rule on this error a long time ago. Happens over and over again and taking focus from real errors.

thelgevold commented 4 years ago

@sidkri @ThomasArdal @paulbatum I am seeing this error as well on a daily basis.

Based on your comments above I have gone through my logs to correlate the Cannot access a disposed object. Object name: 'ServiceBusConnection has already been closed error to a prior Stopping JobHost event.

In my case these events seem to always occur in tandem, following a pattern where Stopping JobHost is called < 1-2 seconds prior to seeing the ServicebusConnection error. In one case the difference in time was 11 seconds, but in most cases I see the ServiceBus error < 1 second after the JobHost stop event.

If this is the expected behaviour, would it be possible to mute this error since it seems to happen quite frequently?

sidkri commented 4 years ago

@thelgevold, thanks for checking your logs and confirming the behavior. It appears to be a timing/sequence issue during shutdown of the Functions host. I'll investigate to understand what our options are to avoid this or suppress and report back.

chrisseroka commented 4 years ago

I can observe the behavior while running app in Service Fabric

sidkri commented 4 years ago

This issue will be resolved when https://github.com/Azure/azure-functions-servicebus-extension/issues/95 is implemented.

sidkri commented 4 years ago

@ThomasArdal, @VamseeInala, @thelgevold the fix has been rolled out via 4.2.0 of the Service Bus extension: https://www.nuget.org/packages/Microsoft.Azure.WebJobs.Extensions.ServiceBus/4.2.0

We are initially requiring users to opt-in to the new behavior so please contact us via support to provide your function app names as we need to enable them in the system to activate the new behavior. It will only apply to situations where the function app is scaled-in. If you restart/stop the app or the worker is upgraded for maintenance, the issue can still occur though the chances are reduced with the new code. We are exploring options to expand support for draining messages in more scenarios.

slav-bukhal-gcm commented 3 years ago

@sidkri do we still need to opt-in for this fix, or is it generally available now?

sidkri commented 3 years ago

@slav-bukhal-gcm, I'm working on updating this issue with the latest guidance. At this time, we are working on rolling the feature out so its generally available but there are some additional changes needed that are taking time to roll out. I'll update here with more accurate guidance this week.

slav-bukhal-gcm commented 3 years ago

hi @sidkri any updates on the guidance?

lukasvosyka commented 3 years ago

Hi @sidkri ,

we are also facing the same issue that the Service bus connection is closed while calling CompleteAsync. In the logs I can also correlate Job Host shutdown happening at the same time like the error.

One thing we can and will do is to update the package to the latest version which is 4.2.1 by now. Any more things to do here?

Best, Lukas

sidkri commented 3 years ago

Thanks for your patience and apologies for the delay in updating this issue, we have some timelines around enabling the functionality on our service side that I can share now so I'd like to explain the differences between upgrading to 4.2.0+ of the Functions Service Bus extension and having the functionality enabled on the service side: 1) Version 4.2.0+ of the Functions Service Bus extension modifies how the connection to the Service Bus endpoint is managed when the Function app is shutting down. Starting with this version, when the extension is stopped by the Functions runtime (i.e. runtime is preparing to shutdown) no new Service Bus messages are processed (i.e. the Function will not be invoked with new messages). All in-flight Function invocations (already started but did not complete) will be allowed to complete before the connection to the Service Bus endpoint is closed. If the Functions runtime shuts down before in-flight function invocations complete, they will be interrupted and the messages they are processing will timeout and get abandoned by the Service Bus backend. The Functions runtime will wait for extensions to stop for around 5 to 10 seconds (depends on reason for shutdown) so this is how long in-flight Function invocations have to complete. Prior to 4.2.0, the connection to the Service Bus endpoint would close as soon as the extension was stopped by the Functions runtime - this resulted in all in-flight messages timing out and getting abandoned. 2) "Draining" of in-flight executions beyond 5 to 10 seconds (10 min for consumption SKU and 30 min for premium/elastic SKU) when a Function app scales-in requires changes on the platform side (in addition to the extension SDK change above). Please note that this only applies to situations where a Functions instance is shutting down due to a scale-in operation and not other scenarios like restart, deployments, platform upgrade etc. In these other scenarios the drain time will remain 5 to 10 seconds We expect the changes to be complete around the following rough ETAs: a) Elastic Premium SKU: Mid-April 2021 b) Consumption Linux SKU: End of June 2021 c) Consumption Windows SKU: Early Oct 2021

By upgrading to 4.2.0+ of the Service Bus extension, you get the benefit of reduced interruption to Function invocations as no new invocations start once the instance starts stopping and if function execution time is quick (< 5 sec) there will be no interruption at all. We plan to document this behavior as well.

ThomasArdal commented 3 years ago

Sounds good. I already upgraded all functions to 4.2.1 why I would probably not see these errors if removing the ignore filter I set up. I will try to play around with that.

Thank you for the update.

swettstein commented 3 years ago

@sidkri is there a way to track the completion of the platform changes? we're using mostly Consumption Windows plans and we'd like to track this feature so we can report it to our stakeholders.

sidkri commented 3 years ago

@swettstein I'll check if there another way besides this issue. I plan to update this issue when consumption windows is enabled though.

swettstein commented 2 years ago

@sidkri any update on Consumption Windows SKUs?

nzthiago commented 2 years ago

The platform has been updated and this feature is available for Windows and Linux on Consumption and Premium plans. Some notes:

@swettstein FYI.

Some optimization that doesn't affect the exiting feature remains during scale-out operations, and this issue was created to keep track of this work