Azure / azure-functions-servicebus-extension

Service Bus extension for Azure Functions
MIT License

Service Bus bindings thrashing with runtime scaling #139

Closed: cachai2 closed this issue 2 months ago

cachai2 commented 3 years ago

A customer created a function with a connection to service bus through VNET integration using automation. They didn't check if the function was reachable once created. The service bus was unreachable, and the runtime kept trying to connect to it. Over one weekend, this accumulated 1.6 million executions and ended up costing the customer 400 euros.

It would be good to set a limit on the number of retries, potentially with an exponential back-off, when the runtime is unreachable, so customers don't get overcharged.
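
For illustration, here is a minimal sketch of the kind of capped, exponential back-off being requested. It is hypothetical pseudocode for the idea, not the runtime's actual implementation; all names are made up.

```csharp
// Hypothetical sketch only: cap the number of connection attempts and back off
// exponentially between them, instead of retrying an unreachable entity forever.
using System;
using System.Threading.Tasks;

static class ConnectionRetrySketch
{
    public static async Task<bool> TryConnectWithBackoffAsync(
        Func<Task> connect, int maxAttempts = 10)
    {
        var delay = TimeSpan.FromSeconds(1);
        for (var attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                await connect();          // e.g. open the Service Bus connection
                return true;              // reachable again, resume normal polling
            }
            catch (Exception)
            {
                if (attempt == maxAttempts)
                    return false;         // give up and surface a single alert
                await Task.Delay(delay);
                delay = TimeSpan.FromSeconds(Math.Min(delay.TotalSeconds * 2, 300));
            }
        }
        return false;
    }
}
```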

mathewc commented 3 years ago

Need more details here: what exactly was the issue? Was the SB entity unreachable due to invalid network/connection configuration, meaning the polling operations weren't succeeding in querying queue size, etc.? You mention "runtime is unreachable" - what specifically was unreachable?

cachai2 commented 3 years ago

Relevant GitHub issue here: https://github.com/Azure/Azure-Functions/issues/1254#issuecomment-793182446

sidkri commented 3 years ago

@cachai, the issue you linked sounds different. In that one, the scale controller is not starting the function when there are messages in Service Bus because a required application setting is missing. Could you provide details (Function app name, etc.) for the specific issue you opened this one about?

cachai2 commented 3 years ago

Apologies, I had a different issue in mind when linking. I'll follow up with the Fast Track team who raised this issue for additional info.

andyblack19 commented 3 years ago

I have also just experienced this issue (or at least, I think this is what the OP was getting at).

We deployed a function with a ServiceBusTrigger binding to a topic. The topic name DID exist, but the subscription name DIDN'T exist, e.g. [ServiceBusTrigger("existingTopic", "nonExistentSubscriber")] MyMessage msg
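
For context, the full trigger method looked roughly like the following. This is an illustrative reconstruction, not the real app's code; the Connection setting name and the MyMessage type are placeholders.

```csharp
// Illustrative reconstruction: a trigger bound to an existing topic but a
// subscription name that was never created. All identifiers are placeholders.
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public class MyMessage
{
    public string Body { get; set; }
}

public static class ProcessMyMessage
{
    [FunctionName("ProcessMyMessage")]
    public static void Run(
        [ServiceBusTrigger("existingTopic", "nonExistentSubscriber", Connection = "ServiceBusConnection")] MyMessage msg,
        ILogger log)
    {
        log.LogInformation("Received message: {Body}", msg.Body);
    }
}
```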

This caused 15 million exceptions to be logged to App Insights over just a 2-hour period, with a significant associated cost. The exception thrown was Microsoft.Azure.ServiceBus.MessagingEntityNotFoundException.

I'd expect this to have some sort of exponential backoff or circuit breaker to avoid situations like this.

mathewc commented 3 years ago

@andyblack19 Are you using Azure Functions? If so, can you share your app name (publicly or privately) as well as a time range of when you saw this? Behind the scenes we're just creating a ServiceBus SDK MessageReceiver and registering a handler. That SDK controls the message polling intervals and the error handling for when the subscription doesn't exist.
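
For reference, a rough sketch of that receiver setup with the Microsoft.Azure.ServiceBus SDK (the 4.x line shown in the stack trace below); it is simplified, and the connection string, entity names, and options are placeholders, not the extension's actual code:

```csharp
// Simplified sketch of the receiver the extension creates under the hood.
// When the subscription doesn't exist, the SDK's message pump keeps polling and
// every failed receive surfaces as a MessagingEntityNotFoundException.
using System;
using System.Threading.Tasks;
using Microsoft.Azure.ServiceBus;
using Microsoft.Azure.ServiceBus.Core;

class ReceiverSketch
{
    static void Main()
    {
        var entityPath = EntityNameHelper.FormatSubscriptionPath("existingTopic", "nonExistentSubscriber");
        var receiver = new MessageReceiver("<connection-string>", entityPath);

        receiver.RegisterMessageHandler(
            async (message, cancellationToken) =>
            {
                // The function invocation happens here in the real extension.
                await receiver.CompleteAsync(message.SystemProperties.LockToken);
            },
            new MessageHandlerOptions(args =>
            {
                // Exceptions from the pump (including MessagingEntityNotFoundException)
                // land here, once per failed receive attempt.
                Console.WriteLine(args.Exception);
                return Task.CompletedTask;
            })
            {
                MaxConcurrentCalls = 16,
                AutoComplete = false
            });

        Console.ReadLine();
    }
}
```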

andyblack19 commented 3 years ago

Hi @mathewc, yes, we're using Azure Functions. See the function execution details below so you can look up the app name.

bd1f6bcc-66cf-4468-a2b0-5570d53570ac 2021-09-04T07:49:22.100 UK South

A sample time range when this problem was occurring is 2nd September, 2pm-2:30pm UTC. There were almost 5 million instances of the MessagingEntityNotFoundException during this period.

Some exception context is below.

SDK version: azurefunctions 3.1.3.0

The messaging entity '*REDACTED*:Topic:*REDACTED*|*REDACTED*' could not be found. To know more visit https://aka.ms/sbResourceMgrExceptions.  TrackingId:a07be409-5682-450e-977f-110d642b7178_B27, SystemTracker:*REDACTED*:Topic:*REDACTED*|*REDACTED*, Timestamp:2021-09-02T14:24:24 TrackingId:e6a47de70ae7424592f54e7ca22de8a3_G43, SystemTracker:gateway7, Timestamp:2021-09-02T14:24:24

Message processing error (Action=Receive, ClientId=MessageReceiver2*REDACTED*/Subscriptions/*REDACTED*, EntityPath=*REDACTED*/Subscriptions/*REDACTED*, Endpoint=*REDACTED*.servicebus.windows.net)

Stack trace

Microsoft.Azure.ServiceBus.MessagingEntityNotFoundException:
   at Microsoft.Azure.ServiceBus.Core.MessageReceiver+<OnReceiveAsync>d__88.MoveNext (Microsoft.Azure.ServiceBus, Version=4.2.1.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.ServiceBus.Core.MessageReceiver+<>c__DisplayClass65_0+<<ReceiveAsync>b__0>d.MoveNext (Microsoft.Azure.ServiceBus, Version=4.2.1.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.ServiceBus.RetryPolicy+<RunOperation>d__19.MoveNext (Microsoft.Azure.ServiceBus, Version=4.2.1.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.ServiceBus.RetryPolicy+<RunOperation>d__19.MoveNext (Microsoft.Azure.ServiceBus, Version=4.2.1.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.ServiceBus.Core.MessageReceiver+<ReceiveAsync>d__65.MoveNext (Microsoft.Azure.ServiceBus, Version=4.2.1.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.ServiceBus.Core.MessageReceiver+<ReceiveAsync>d__63.MoveNext (Microsoft.Azure.ServiceBus, Version=4.2.1.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.ServiceBus.MessageReceivePump+<<MessagePumpTaskAsync>b__12_0>d.MoveNext (Microsoft.Azure.ServiceBus, Version=4.2.1.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c)

mathewc commented 3 years ago

Yes, as I suspected, the error is coming from the ServiceBus SDK message pump, because you're pointing at a non-existent entity. Is there a reason why you're doing that? Was it a mistake/error? Since this is ServiceBus SDK behavior, you'd see the same results if you were using the SDK directly without Azure Functions.

andyblack19 commented 3 years ago

Yes, this was a mistake around the order of dependent deployments. Thanks for looking into it; I'll raise an issue in the ServiceBus SDK repo.

It would be great if the bindings were able to provision the subscription to a topic on demand if it doesn't already exist :)
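
In the meantime, a possible workaround sketch: check for and create the subscription at deployment/startup time with the SDK's ManagementClient, before the trigger starts listening. Names and connection string below are placeholders; the bindings themselves don't do this today.

```csharp
// Workaround sketch: ensure the subscription exists before the listener starts.
// Topic, subscription, and connection string values are placeholders.
using System.Threading.Tasks;
using Microsoft.Azure.ServiceBus.Management;

static class EnsureSubscription
{
    public static async Task EnsureExistsAsync(
        string connectionString, string topicName, string subscriptionName)
    {
        var management = new ManagementClient(connectionString);
        if (!await management.SubscriptionExistsAsync(topicName, subscriptionName))
        {
            await management.CreateSubscriptionAsync(topicName, subscriptionName);
        }
        await management.CloseAsync();
    }
}
```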