Azure / azure-functions-servicebus-extension

Service Bus extension for Azure Functions
MIT License

Delay between batches of messages #180

Closed (smokedlinq closed this issue 2 years ago)

smokedlinq commented 2 years ago

We are seeing a delay after each batch of messages is processed that seems to be tied to the options.ClientRetryOptions.TryTimeout setting. The console output shows the request executing successfully and the message being removed from the queue; however, no further messages are picked up until the TryTimeout delay has elapsed.

Note: I verified it is the TryTimeout setting by changing it from the default of 1 minute to 10 seconds; the console output then showed the next message being processed after roughly 10 seconds.

info: Microsoft.Azure.WebJobs.Hosting.OptionsLoggingService[0]
      ServiceBusOptions
      {
        "ClientRetryOptions": {
          "Mode": "Exponential",
          "TryTimeout": "00:00:10",
          "Delay": "00:00:00.8000000",
          "MaxDelay": "00:01:00",
          "MaxRetries": 3
        },
        "TransportType": "AmqpTcp",
        "WebProxy": "",
        "AutoCompleteMessages": true,
        "PrefetchCount": 0,
        "MaxAutoLockRenewalDuration": "00:05:00",
        "MaxConcurrentCalls": 1,
        "MaxConcurrentSessions": 1,
        "MaxMessageBatchSize": 1,
        "SessionIdleTimeout": "",
        "EnableCrossEntityTransactions": false
      }

[2022-03-08T21:43:19.749Z] Executing 'AddOrUpdateCustomerHandler' (Reason='(null)', Id=2b5f5f1c-d8d3-489f-80d2-bec4289f080c)
[2022-03-08T21:43:19.751Z] Trigger Details: MessageId: 3dc56035b76b4360b0af15d7e2db95fd, SequenceNumber: 24, DeliveryCount: 1, EnqueuedTimeUtc: 2022-03-08T21:22:31.1000000+00:00, LockedUntilUtc: 9999-12-31T23:59:59.9999999+00:00, SessionId: 2aca3452-d442-3ea6-6731-9086bfde7019
[2022-03-08T21:43:21.933Z] Executed 'AddOrUpdateCustomerHandler' (Succeeded, Id=2b5f5f1c-d8d3-489f-80d2-bec4289f080c, Duration=2350ms)
[2022-03-08T21:43:23.509Z] Host lock lease acquired by instance ID '000000000000000000000000D01DAFEF'.
[2022-03-08T21:43:32.244Z] Executing 'AddOrUpdateCustomerHandler' (Reason='(null)', Id=d7d5944e-2621-47ef-a2b4-634a8a404512)
[2022-03-08T21:43:32.245Z] Trigger Details: MessageId: a74edd5a05a543a5b592985da1aa7765, SequenceNumber: 25, DeliveryCount: 1, EnqueuedTimeUtc: 2022-03-08T21:22:31.1160000+00:00, LockedUntilUtc: 9999-12-31T23:59:59.9999999+00:00, SessionId: 
[2022-03-08T21:43:32.828Z] Executed 'AddOrUpdateCustomerHandler' (Succeeded, Id=d7d5944e-2621-47ef-a2b4-634a8a404512, Duration=585ms)
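
For reference, here is a minimal host.json sketch of the TryTimeout change described in the note above, assuming the 5.x extension's "extensions" > "serviceBus" section (the 10-second value mirrors the repro; it is not a recommendation):

{
  "version": "2.0",
  "extensions": {
    "serviceBus": {
      "clientRetryOptions": {
        "tryTimeout": "00:00:10"
      }
    }
  }
}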

Pretty standard ServiceBusTrigger:

[FunctionName(nameof(AddOrUpdateCustomerHandler))]
public virtual async Task RunAsync([ServiceBusTrigger("add-update-customer", IsSessionsEnabled = true, AutoCompleteMessages = true)] string message, CancellationToken cancellationToken)
{
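    // handler body omitted; the delay occurs between invocations (see the log above), not inside the function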
}

Packages:

<PackageReference Include="Azure.Messaging.ServiceBus" Version="7.6.0" />
<PackageReference Include="Microsoft.Azure.Functions.Extensions" Version="1.1.0" />
<PackageReference Include="Microsoft.Azure.WebJobs.Extensions.ServiceBus" Version="5.2.0" />
<PackageReference Include="Microsoft.NET.Sdk.Functions" Version="4.1.0" />
smokedlinq commented 2 years ago

FYI, we deployed this to our test environment and the Functions host scaler keeps adding nodes that only wake up in bursts. I'll be opening a support ticket tonight or in the morning.

Application Insights live metrics:

[screenshot]

Application Insights request metrics:

[screenshot]

Service Bus metrics for the queue:

[screenshot]
smokedlinq commented 2 years ago

Support request ID-2203110010000495, if you want to follow along, @soninaren.

smokedlinq commented 2 years ago

I just tried a dotnet-isolated worker and I'm seeing similar, but not identical, behavior: it processes far more messages before pausing for the 1-minute delay. I suspect this comes down to differences in ServiceBusOptions; the debug output for dotnet-isolated shows 2000 concurrent sessions for the session handler.

ServiceBusOptions (dotnet-isolated):

{
  "PrefetchCount": 0,
  "MessageHandlerOptions": {
    "AutoComplete": true,
    "MaxAutoRenewDuration": "00:05:00",
    "MaxConcurrentCalls": 256
  },
  "SessionHandlerOptions": {
    "AutoComplete": true,
    "MaxAutoRenewDuration": "00:05:00",
    "MaxConcurrentSessions": 2000,
    "MessageWaitTimeout": "00:01:00"
  },
  "BatchOptions": {
    "MaxMessageCount": 1000,
    "OperationTimeout": "00:01:00",
    "AutoComplete": true
  }
}

ServiceBusOptions (dotnet):

{
  "ClientRetryOptions": {
    "Mode": "Exponential",
    "TryTimeout": "00:01:00",
    "Delay": "00:00:00.8000000",
    "MaxDelay": "00:01:00",
    "MaxRetries": 3
  },
  "TransportType": "AmqpTcp",
  "WebProxy": "",
  "AutoCompleteMessages": true,
  "PrefetchCount": 0,
  "MaxAutoLockRenewalDuration": "00:05:00",
  "MaxConcurrentCalls": 256,
  "MaxConcurrentSessions": 8,
  "MaxMessageBatchSize": 1000,
  "SessionIdleTimeout": "",
  "EnableCrossEntityTransactions": false
}
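
For comparison, a hedged host.json sketch that moves the in-process (dotnet) worker's session concurrency closer to the isolated-worker defaults shown above; maxConcurrentSessions is the main difference (8 vs 2000), and the value below is illustrative rather than a recommendation:

{
  "version": "2.0",
  "extensions": {
    "serviceBus": {
      "maxConcurrentSessions": 2000
    }
  }
}
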
smokedlinq commented 2 years ago

Didn't realize v5.x was in another repo; recreated the issue over there.

smokedlinq commented 2 years ago

SessionIdleTimeout needed a lower setting; the default doesn't work well when millions of session IDs are in use and the average is one message per session.
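
For anyone hitting the same behavior: a minimal host.json sketch of the change described above, assuming the 5.x extension's "extensions" > "serviceBus" section (the 15-second value is illustrative). When sessionIdleTimeout is left unset, the session processor appears to fall back to ClientRetryOptions.TryTimeout while waiting for the next message in the current session, which would explain the one-minute pauses reported earlier in this thread:

{
  "version": "2.0",
  "extensions": {
    "serviceBus": {
      "sessionIdleTimeout": "00:00:15"
    }
  }
}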