Azure / azure-sdk-for-net

This repository is for active development of the Azure SDK for .NET. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/dotnet/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-net.
MIT License

ServiceBus trigger session issues causing active message count to spike. Trigger falls behind for ~1.5 hours #44612

Open iamsamcoder opened 2 weeks ago

iamsamcoder commented 2 weeks ago

Library name and version

Microsoft.Azure.Functions.Worker.Extensions.ServiceBus 5.18.0

Describe the bug

We have a Service Bus trigger that appears to stall occasionally, causing messages to back up in the queue. It happened this morning for ~1h 30m. During this period the active message count rose to ~5k messages; normally it stays under 10.

By the time this situation was discovered, the function was starting to catch up. Nothing was done to resolve the backlog of active messages, but this is not normal operation for our service.

There was a gap in traces of ~45m during this backlog. Only 4 exceptions were logged during this period, all of them Service Bus-related.

Any suggestions for further investigation? This happened on 6/13/24, on 6/14/24, and again today, 6/17/24. On 6/13 the function app was restarted twice to resolve the issue.

[Image: projection-queue-back-log]

Expected behavior

We don't expect active messages to back up or function operation to stall for a long period of time.

Actual behavior

On 3 occasions in the last 5 days, active messages for the Service Bus-triggered function have backed up, and operations have been degraded or stalled for 45+ minutes.

Reproduction Steps

We have not reproduced this issue. It has occurred 3 times in the last 5 days in our production environment.

Environment

No response

github-actions[bot] commented 2 weeks ago

Thank you for your feedback. Tagging and routing to the team member best able to assist.

jsquire commented 2 weeks ago

Hi @iamsamcoder. Thank you for reaching out and we regret that you're experiencing difficulties. In the majority of scenarios, a Function instance stalling is related to the Azure Functions infrastructure rather than associated with the trigger. Generally speaking, your best path forward would be to open an Azure support request, as that will allow you to engage the Azure Functions team, who will have access to the platform logs and diagnostics for the period of time that you experienced the behavior.

That said, we would be happy to take a look at client logs from the trigger and offer thoughts, if you'd like. Because the trigger is a client-side construct, all of its logs are emitted in the scope of your application, so you would need to have been collecting them during the time you experienced the behavior. For most applications, these flow into Application Insights using the default configuration. If you have these available and you'd like us to analyze them, please share a 10-minute slice of client logs around the time the behavior was observed.
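For anyone preparing such a slice from exported logs, here is a minimal sketch of one way to cut a 10-minute window around an incident time. It assumes the logs have already been exported as records with a timezone-aware `timestamp` field. The `slice_logs` helper, the record shape, the incident time, and the sample data are all hypothetical and used only for illustration. They are not part of any Azure SDK or Application Insights API.

```python
from datetime import datetime, timedelta, timezone


def slice_logs(records, incident_time, window=timedelta(minutes=10)):
    """Return the records whose timestamp falls inside a window centered on incident_time."""
    half = window / 2
    start, end = incident_time - half, incident_time + half
    return [r for r in records if start <= r["timestamp"] <= end]


# Hypothetical incident time and made-up sample records, for illustration only.
incident = datetime(2024, 6, 17, 9, 0, tzinfo=timezone.utc)
records = [
    {"timestamp": incident - timedelta(minutes=20), "message": "before window"},
    {"timestamp": incident - timedelta(minutes=3), "message": "inside window"},
    {"timestamp": incident + timedelta(minutes=4), "message": "inside window"},
    {"timestamp": incident + timedelta(minutes=30), "message": "after window"},
]

selected = slice_logs(records, incident)
for record in selected:
    print(record["timestamp"].isoformat(), record["message"])
```

Centering the window on the incident keeps a few minutes of context on either side. If the stall began earlier than it was noticed, shifting `incident_time` back may capture more of the lead-up.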

github-actions[bot] commented 2 weeks ago

Hi @iamsamcoder. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

github-actions[bot] commented 1 week ago

Hi @iamsamcoder, we're sending this friendly reminder because we haven't heard back from you in 7 days. We need more information about this issue to help address it. Please be sure to give us your input. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!