Azure / azure-functions-host

The host/runtime that powers Azure Functions
https://functions.azure.com
MIT License

ServiceBusTrigger delayed for extended period of time before executing #6786

Open timgabrhel opened 3 years ago

timgabrhel commented 3 years ago

Check for a solution in the Azure portal

Nothing found.

Investigative information

This is a duplicated bug report of https://github.com/Azure/azure-functions-servicebus-extension/issues/116

The issue I am having is that my function with a ServiceBusTrigger goes dark for an extended period of time. In the linked bug report, there is a 12 hour gap of silence, then it caught up with all messages. Then, after a short period of normal behavior, it went silent again. I was able to force it to trigger by initiating a new deployment via Azure DevOps.

Please provide the following:

The information provided here does not refer to a specific problem invocation. I'm hoping you can use this information to trace into the "dark" periods.

The periods of silence were (all times Central):

- 10/11 8:30am - 5:30pm
- 10/11 8:30pm - 5:30am (next day)
- 10/13 12:30pm - 2:15pm

Expected behavior

The ServiceBusTrigger should be firing automatically when new messages are available in the queue.

Actual behavior

The ServiceBusTrigger works for a period of time, then stops triggering automatically. The first noticed period was 12 hours of silence, after which it fired on its own and caught up with all messages in the queue. Soon after, the message count rose again and the functions didn't execute. There were no Application Insights logs to correlate with this rising queue count. At the time of this writing, I just had the issue happen again, where the function didn't execute for around an hour. Triggering a new deployment via Azure DevOps caused the function to fire.

Known workarounds

Triggering a new deployment via AzureDevOps caused the function to fire.

Related information

Source

```csharp
[FunctionName("NotificationProcessor")]
public async Task Run(
    [ServiceBusTrigger(ServiceBusNotificationsQueueName, Connection = ServiceBusConnectionName)] Message message,
    MessageReceiver messageReceiver,
    ILogger logger)
{
    try
    {
        // read the message & process. based on the results, call abandon or complete.
        // DoWork();
        if (success)
        {
            await messageReceiver.AbandonAsync(message.SystemProperties.LockToken, modifiedProperties);
        }
        else
        {
            await messageReceiver.CompleteAsync(message.SystemProperties.LockToken);
        }
    }
    catch (Exception ex)
    {
        await messageReceiver.AbandonAsync(message.SystemProperties.LockToken);
        logger.LogError(ex, $"NotificationProcessor Error processing {message.MessageId}");
        throw;
    }
}
```
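For comparison (not from the original report), here is a minimal sketch of the usual settlement pattern with this extension: complete on success so the message is removed, abandon on failure so it is redelivered. The snippet above appears to swap the two branches relative to its own comment, possibly a paste artifact. The queue name, connection name, and `DoWorkAsync` helper below are placeholders, and manual settlement like this generally assumes `autoComplete` is set to `false` in host.json.

```csharp
// Namespaces assumed: System, System.Threading.Tasks, Microsoft.Azure.WebJobs,
// Microsoft.Azure.ServiceBus, Microsoft.Azure.ServiceBus.Core, Microsoft.Extensions.Logging.

[FunctionName("NotificationProcessorSketch")]
public static async Task RunSketch(
    [ServiceBusTrigger("notifications", Connection = "ServiceBusConnection")] Message message,
    MessageReceiver messageReceiver,
    ILogger logger)
{
    try
    {
        bool success = await DoWorkAsync(message); // hypothetical processing step

        if (success)
        {
            // Success: settle the message so it is removed from the queue.
            await messageReceiver.CompleteAsync(message.SystemProperties.LockToken);
        }
        else
        {
            // Recoverable failure: release the lock so the message is redelivered.
            await messageReceiver.AbandonAsync(message.SystemProperties.LockToken);
        }
    }
    catch (Exception ex)
    {
        logger.LogError(ex, $"Error processing {message.MessageId}");
        await messageReceiver.AbandonAsync(message.SystemProperties.LockToken);
        throw;
    }
}

// Hypothetical placeholder for the actual processing logic.
private static Task<bool> DoWorkAsync(Message message) => Task.FromResult(true);
```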
timgabrhel commented 3 years ago

This issue was created at the request made in this SO post: https://stackoverflow.com/questions/64318870/servicebustrigger-delayed-in-executions

jesuissur commented 3 years ago

We have the same problem here. Messages were enqueued for about 5 hours before being processed by our Service Bus-triggered function. It is a sporadic problem, but pretty critical for a production environment.

Our function app is "Always on" per the Azure configuration settings. It is hosted in a premium/production Windows App Service plan (P2v2). We are on SDK version 3.0.9.

What can we do to help find and fix the problem, or establish a solid workaround?

Thanks Phil

timgabrhel commented 3 years ago

@jesuissur Hey Phil, I ended up working with technical support on this and never found a specific issue I can point to. We did identify some instability in my function app: there were a couple of invalid CRON configurations on some timer-triggered functions, as well as some unhandled exceptions in other functions, which we were guessing may have destabilized the function host and caused a shutdown.

I see I've had 1 delay in the last 30 days so it's definitely not cleared up yet.
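For anyone checking their own timer schedules against the "invalid CRON configs" mentioned above: TimerTrigger expects a six-field NCRONTAB expression. A minimal valid sketch (the function name and schedule here are illustrative, not from this app):

```csharp
// Namespaces assumed: System, Microsoft.Azure.WebJobs, Microsoft.Extensions.Logging.
// "0 */5 * * * *" = second 0 of every 5th minute (six fields: sec min hour day month day-of-week).
[FunctionName("HeartbeatTimer")] // illustrative name, not from this thread
public static void RunHeartbeat([TimerTrigger("0 */5 * * * *")] TimerInfo timer, ILogger logger)
{
    logger.LogInformation($"Heartbeat at {DateTime.UtcNow:O}; past due: {timer.IsPastDue}");
}
```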

soninaren commented 3 years ago

@timgabrhel is the app configured to use runtime scaling?

timgabrhel commented 3 years ago

@soninaren Can you point me to a resource where I can check? I'm having a hard time researching this well enough to give you a confident answer.

soninaren commented 3 years ago

@timgabrhel To check the configuration for runtime scaling, open the Azure portal -> <Your function app> -> Configuration -> Function runtime settings (see attached screenshot).

timgabrhel commented 3 years ago

I'm on a Consumption plan and I'm not seeing the option (see attached screenshot).

amyhu-ms commented 3 years ago

Is there any update on this issue? We saw similar behavior last week, where it took several hours from messages being enqueued to them being processed by our Service Bus-triggered function. We are using Listen-only access to the Service Bus due to business needs. Please share any thoughts on whether this may or may not cause hours of delay in messages being picked up.

julia-xu commented 3 years ago

Also seeking updates on this issue

jesuissur commented 3 years ago

We hit the problem in our environment once again. The symptoms are a bit different though. Our function stopped processing messages (we have more than 10K in the triggered queue) and refuses to work even after many restarts and deployments. The function was fine until then. It has now been more than 24 hours without any processing, except for a few messages (50 at most) that went through the function during one short period (~15 minutes).

When we restart the function, the traces indicate several executions triggered by the Service Bus, but nothing ever completes (no "executed" trace message).

Some facts that might help:

  1. We scale the function up to 5 instances every night and back down to 1 during the day.
  2. From the 10K messages that are stuck, several show a weird lock-time pattern. One message was delivered 3-4 times and was enqueued around 06:52 on Dec. 15th, but it is locked until 05:09 on Dec. 16th: almost a full day between the enqueue time and the latest lock expiration, after 3-4 retries. The lock duration on this queue is the default (1 minute). Here's an example of a message we peeked from the queue around 10:30 UTC on Dec. 16th:
     {
       "deliveryCount": 4,
       "enqueuedSequenceNumber": 0,
       "enqueuedTimeUtc": "2020-12-15T06:52:33.481241Z",
       "expiresAtUtc": "9999-12-31T23:59:59.9999999",
       "forcePersistence": false,
       "isBodyConsumed": false,
       "lockedUntilUtc": "2020-12-16T05:09:53.6549114Z",
       "lockToken": "4e852f23-b3f9-480b-b23d-97babbfc5e64",
       "messageId": "213dafa8127d4867bad1d357e5d592ef",
       "partitionKey": null,
  3. Our function trigger is configured with the following settings, and we run in our own (premium) App Service plan; see the host.json sketch after this list: "functionTimeout": "02:00:00", "extensions": { "serviceBus": { "prefetchCount": 0, "messageHandlerOptions": { "autoComplete": true, "maxConcurrentCalls": 80, "maxAutoRenewDuration": "02:00:00" } } },
  4. When we browse to the function URL (e.g. https://our-function-1.azurewebsites.net/) it does not respond (500 - Timeout). But we have another function (same code but "watching" another queue) on the same App Service plan, and we can browse to its URL and see the Azure Functions landing page.
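For readability, here is how the settings from item 3 would sit in a complete host.json, assuming the Functions v2/v3 host schema ("version": "2.0") and the Microsoft.Azure.WebJobs.Extensions.ServiceBus 4.x package mentioned below; the values are copied from the item above, only the version wrapper is added:

```json
{
  "version": "2.0",
  "functionTimeout": "02:00:00",
  "extensions": {
    "serviceBus": {
      "prefetchCount": 0,
      "messageHandlerOptions": {
        "autoComplete": true,
        "maxConcurrentCalls": 80,
        "maxAutoRenewDuration": "02:00:00"
      }
    }
  }
}
```

One observation, not a confirmed cause: with autoComplete set to true the host settles each message itself after the invocation returns, so a function that also settles manually (as in the snippet at the top of this issue) could conflict with that.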

We will try upgrading Microsoft.Azure.WebJobs.Extensions.ServiceBus from 4.1.2 to 4.2.0.

Hoping this helps with this major problem. Phil

jesuissur commented 3 years ago

We still get the same problem with versions 4.2.0 and 4.2.1 of Microsoft.Azure.WebJobs.Extensions.ServiceBus. Any updates on this one?

v-bbalaiagar commented 3 years ago

Hi @soninaren, can you kindly look into this?

Ubloobok commented 1 year ago

I think we just had a similar issue: a message sat in the Service Bus queue for 55 minutes (a Blob Created event from Blob storage routed through Event Grid) before the Azure Function with the ServiceBusTrigger consumed it. There were no other logs or activity. It has only happened once so far, so we don't have many details.

francoishill commented 1 year ago

@timgabrhel, this is a known limitation with the ServiceBusTrigger when using the Consumption plan together with only the "Listen" policy (instead of the "Manage" policy). Refer to these docs, but it comes down to the function scaler not being able to determine the queue length and then shutting down the whole app (when using the Consumption plan).

I had the same issue and got it fixed by using the "Manage" policy for the connection string and removing the ;EntityPath=*** from the connection string.
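To make the difference concrete (the namespace, policy names, and keys below are placeholders, not values from this thread), the two connection-string shapes look roughly like this in local.settings.json; the fix described above amounts to moving from the first form to the second:

```json
{
  "IsEncrypted": false,
  "Values": {
    "ServiceBusConnection_ListenOnlyEntityScoped": "Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=ListenOnlyPolicy;SharedAccessKey=<key>;EntityPath=<queue-name>",
    "ServiceBusConnection_ManageNamespaceScoped": "Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=ManagePolicy;SharedAccessKey=<key>"
  }
}
```

On the Consumption plan the scale controller uses this connection to read the queue length; per the comment above, a Listen-only, entity-scoped string can leave it unable to do so, and the app stays scaled down.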

For more details, I followed these steps:

Other useful links:

andrewatfornax commented 1 year ago

@francoishill, I can confirm that we have this issue with the Consumption plan even when using a ServiceBus connection string that has the "Manage" policy, so it's not just an issue with using the "Listen" policy - there are other factors at play.

mrobertsonavidxchange commented 1 year ago

I am seeing a similar problem. Sometimes, my service bus triggered function app will process messages as soon as the topic receives the message. But, in most cases, the message sits on the subscription for 30 minutes or more before the function is triggered.

(Screenshots attached: ConfigurationSettings, ConfigurationSettings2)

I've attached the function app settings, as well as an Application Insights trace showing the message being enqueued at 2:41 EST and the Service Bus trigger not firing for that message until 3:15 EST.

(Screenshot attached: NotProcessedQuickly)

andrewatfornax commented 1 year ago

Hi @mrobertsonavidxchange,

In a separate MS support ticket, I am being advised that the issue relates to the Function App service not ensuring that triggers are working because we've exceeded an undocumented maximum number of functions permitted per Function App resource.

Would you mind letting me know how many functions you have in your Function App, please? I'm curious to see if you also exceed this undocumented limit...

mrobertsonavidxchange commented 1 year ago

Just 2! A trigger and a health check @andrewatfornax

andrewatfornax commented 1 year ago

Okay, so, assuming what we're being told is correct, there are at least a couple of possible causes of this issue...

cygnim commented 1 year ago

@andrewatfornax, we are also experiencing this issue and when we turned on verbose logging, we did see a log entry that states the number of triggers that are monitored when a function shuts down is 100. This limitation, as far as we can tell, is not documented anywhere. I created a ticket to address this in the documentation here: https://github.com/Azure/Azure-Functions/issues/2372.

andrewatfornax commented 1 year ago

Thanks @cygnim - in my discussions with Azure support about this, we don't believe that we're hitting any kind of undocumented limit on the number of triggers that causes the functions to shut down - we do see a message that LOOKS like that's what's going on, but we've been assured this is not actually the cause of the issue.

It actually appears to be a kernel-level issue in the function app service.

While MS is still working on a solution, we've been advised that the Function App team is working on improvements to service monitoring that will restart the connection to the Service Bus in the event that the connection drops out and processing of messages halts.