Closed cachai2 closed 2 months ago
Need more details here - what exactly was the issue? Was the Service Bus entity unreachable due to an invalid network/connection configuration, meaning the polling operations weren't succeeding in querying queue size, etc.? You mention "runtime is unreachable" - what specifically was unreachable?
Relevant github issue here. https://github.com/Azure/Azure-Functions/issues/1254#issuecomment-793182446
@cachai, the issue you linked sounds different. In that one, the scale controller is not starting the function when there are messages in Service Bus due to a required application setting missing. Could you provide details like Function app name etc. for the specific issue you opened this issue about?
Apologies as I had a different issue in mind when linking. I'll follow-up with the Fast Track team who raised this issue for additional info.
I have also just experienced this issue (or at least, what I think the OP was getting at).
We deployed a function with a ServiceBusTrigger binding to a Topic.
The Topic name DID exist, but the subscriber name DIDN'T exist.
e.g. [ServiceBusTrigger("existingTopic", "nonExistentSubscriber")] MyMessage msg
This caused 15 million exceptions to be logged to App Insights over just a 2-hour period, with a significant associated cost. The exception thrown was: Microsoft.Azure.ServiceBus.MessagingEntityNotFoundException
I'd expect this to have some sort of exponential backoff or circuit breaker to avoid situations like this?
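To make the circuit-breaker suggestion concrete: this is a minimal, generic sketch of the pattern being asked for, not how the Functions runtime or Service Bus SDK actually behave. All names here (CircuitBreaker, allow, record_failure) are hypothetical illustration, in Python for brevity.

```python
import time

class CircuitBreaker:
    """Trip open after `max_failures` consecutive failures, then block
    calls for `reset_after` seconds before permitting a trial call."""

    def __init__(self, max_failures=5, reset_after=60.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self):
        # Closed circuit: calls pass through.
        if self.opened_at is None:
            return True
        # Open circuit: block until the reset window has elapsed.
        if self.clock() - self.opened_at >= self.reset_after:
            self.opened_at = None  # half-open: permit one trial call
            self.failures = 0
            return True
        return False

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()

    def record_success(self):
        self.failures = 0
        self.opened_at = None
```

With something like this wrapped around the receive loop, a non-existent subscription would produce a handful of MessagingEntityNotFoundException per reset window instead of millions.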
@andyblack19 Are you using Azure Functions? If so, can you share your app name (publicly or privately) as well as a time range of when you saw this? Behind the scenes we're just creating a ServiceBus SDK MessageReceiver and registering a handler. That SDK controls the message polling intervals, and error handling for when the subscription doesn't exist.
Hi @mathewc, yes, we're using Azure Functions. See the function execution details below so you can look up the app name.
bd1f6bcc-66cf-4468-a2b0-5570d53570ac 2021-09-04T07:49:22.100 UK South
A sample time range when this problem was occurring was 2nd September, 2pm-2:30pm UTC. There were almost 5 million occurrences of the MessagingEntityNotFoundException during this period.
Some exception context is below.
SDK version | azurefunctions: 3.1.3.0
The messaging entity '*REDACTED*:Topic:*REDACTED*|*REDACTED*' could not be found. To know more visit https://aka.ms/sbResourceMgrExceptions. TrackingId:a07be409-5682-450e-977f-110d642b7178_B27, SystemTracker:*REDACTED*:Topic:*REDACTED*|*REDACTED*, Timestamp:2021-09-02T14:24:24 TrackingId:e6a47de70ae7424592f54e7ca22de8a3_G43, SystemTracker:gateway7, Timestamp:2021-09-02T14:24:24
Message processing error (Action=Receive, ClientId=MessageReceiver2*REDACTED*/Subscriptions/*REDACTED*, EntityPath=*REDACTED*/Subscriptions/*REDACTED*, Endpoint=*REDACTED*.servicebus.windows.net)
Stack trace
Microsoft.Azure.ServiceBus.MessagingEntityNotFoundException:
at Microsoft.Azure.ServiceBus.Core.MessageReceiver+<OnReceiveAsync>d__88.MoveNext (Microsoft.Azure.ServiceBus, Version=4.2.1.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Azure.ServiceBus.Core.MessageReceiver+<>c__DisplayClass65_0+<<ReceiveAsync>b__0>d.MoveNext (Microsoft.Azure.ServiceBus, Version=4.2.1.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Azure.ServiceBus.RetryPolicy+<RunOperation>d__19.MoveNext (Microsoft.Azure.ServiceBus, Version=4.2.1.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Azure.ServiceBus.RetryPolicy+<RunOperation>d__19.MoveNext (Microsoft.Azure.ServiceBus, Version=4.2.1.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Azure.ServiceBus.Core.MessageReceiver+<ReceiveAsync>d__65.MoveNext (Microsoft.Azure.ServiceBus, Version=4.2.1.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Azure.ServiceBus.Core.MessageReceiver+<ReceiveAsync>d__63.MoveNext (Microsoft.Azure.ServiceBus, Version=4.2.1.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Azure.ServiceBus.MessageReceivePump+<<MessagePumpTaskAsync>b__12_0>d.MoveNext (Microsoft.Azure.ServiceBus, Version=4.2.1.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c)
Yes, as I suspected, the error is coming from the ServiceBus SDK message pump, because you're pointing at a non-existent entity. Is there a reason why you're doing that? Was it a mistake/error? Because this is ServiceBus SDK behavior, if you were using the SDK directly without Azure Functions, you'd see the same results.
Yes this was a mistake around the order of dependent deployments. Thanks for looking into it, I'll raise an issue in the ServiceBus SDK repo.
It would be great if the bindings were able to provision the subscription to a topic on demand, if it doesn't already exist :)
A customer used automation to create a function with a connection to Service Bus through VNet integration. They didn't check whether the function could reach Service Bus once created. The Service Bus was unreachable, and the runtime kept trying to connect to it. Over one weekend, this accumulated 1.6 million executions and ended up costing the customer 400 euros.
It would be good to set a limit on the number of retries, potentially with an exponential back-off, when the Service Bus entity is unreachable, so customers don't get overcharged.
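The suggested bounded retry with exponential back-off could be sketched roughly as below. This is a generic illustration of the idea, not runtime code; the names (backoff_delays, connect_with_retries, try_connect) are hypothetical, written in Python for brevity.

```python
import time

def backoff_delays(base=1.0, factor=2.0, cap=300.0, max_retries=10):
    """Yield retry delays: base, base*factor, base*factor**2, ...,
    capped at `cap` seconds, stopping after `max_retries` attempts."""
    delay = base
    for _ in range(max_retries):
        yield min(delay, cap)
        delay *= factor

def connect_with_retries(try_connect, delays, sleep=time.sleep):
    """Attempt a connection with a bounded retry budget; return False
    (and let the caller surface an error) instead of retrying forever."""
    for delay in delays:
        if try_connect():
            return True
        sleep(delay)
    return False
```

Under a scheme like this, an unreachable entity exhausts a fixed retry budget (here, at most 10 attempts spread over minutes) and then surfaces a clear error, rather than generating millions of billed executions over a weekend.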