Azure / azure-functions-host

The host/runtime that powers Azure Functions
https://functions.azure.com
MIT License
1.94k stars, 442 forks

Target Based Scaling not working with Flex Consumption Functions for Service Bus Single dispatch processing #10523

Open andynorrisjumar opened 1 month ago

andynorrisjumar commented 1 month ago

We have a set of functions, all on the Flex Consumption plan, that use Service Bus topics (single-message dispatch) as a trigger.

Each function does the following:

  1. Receive Message from Service Bus
  2. Update Azure SQL
  3. Optionally add another message to a Service Bus topic
  4. Complete Message

Under load, Azure SQL was running out of DTUs because of the number of functions running concurrently.

We followed the guidance at https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/azure-functions/functions-target-based-scaling.md#service-bus-queues-and-topics

The host.json for each function app is configured as follows:

          {
              "version": "2.0",
              "logging": {
                  "applicationInsights": {
                      "samplingSettings": {
                          "isEnabled": true,
                          "excludedTypes": "Request"
                      },
                      "enableLiveMetricsFilters": true
                  }
              },
              "extensions": {
                  "serviceBus": {
                      "maxConcurrentCalls": 1
                  }
              }
          }

The following setting has been added to the application configuration:

          {
              "name": "WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT",
              "value": "1",
              "slotSetting": false
          }

This made no difference to application scaling. Azure SQL still runs out of DTUs because the Functions apps scale out, seemingly ignoring the configuration.

Repro steps

Add 300 messages to the initial Service Bus topic.

Expected behavior

Concurrent Service Bus message processing is constrained by the configuration above.

Actual behavior

SQL DTU usage runs at 100% (S4 SKU with a 200 DTU limit). Many function invocations run at the same time, as seen in the invocation logs.

Investigative information

- NuGet Service Bus extension: Microsoft.Azure.Functions.Worker.Extensions.ServiceBus, Version 5.22.0

Known workarounds

The only workaround is to limit the rate of ingress into the initial Service Bus topic.


nzthiago commented 2 weeks ago

@andynorrisjumar thank you for reporting this. To control the maximum scale-out of a Flex Consumption app, please check this documentation: https://learn.microsoft.com/en-us/azure/azure-functions/event-driven-scaling?tabs=azure-cli#flex-consumption-plan. By default, apps running in a Flex Consumption plan have a limit of 100 overall instances. Currently the lowest maximum instance count value is 40. So to set the app to that lowest possible maximum, you should use:

az functionapp create --resource-group <RESOURCE_GROUP> --name <APP_NAME> --storage <STORAGE_NAME> --runtime <LANGUAGE_RUNTIME> --runtime-version <RUNTIME_VERSION> --flexconsumption-location <REGION> --maximum-instance-count 40

With this, if you add 300 messages with the concurrency you shared in your host.json, the app would scale to a maximum of 40 instances, each handling 1 message at a time. As each message is processed, the instance picks up the next message from the topic until all 300 messages are done.
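As a rough illustration of that arithmetic, here is a minimal Python sketch. It assumes the target-based scaling rule from the documentation linked in the original post (desired instances = event source length / target executions per instance), with rounding up and the cap at the maximum instance count added as assumptions. The function name `desired_instances` is hypothetical and not part of any Azure SDK, and the "rounds" estimate assumes every message takes the same time to process.

```python
import math


def desired_instances(queue_length: int, target_per_instance: int, max_instances: int) -> int:
    """Estimate the instance count under target-based scaling.

    Hypothetical helper: divides the backlog by the per-instance target
    (maxConcurrentCalls for single dispatch) and caps the result at the
    plan's maximum instance count.
    """
    if target_per_instance < 1 or max_instances < 1:
        raise ValueError("target_per_instance and max_instances must be >= 1")
    wanted = math.ceil(queue_length / target_per_instance)
    return min(wanted, max_instances)


messages = 300
max_concurrent_calls = 1   # from host.json in the original post
max_instance_count = 40    # lowest allowed value on Flex Consumption at the time

instances = desired_instances(messages, max_concurrent_calls, max_instance_count)
# Number of "waves" needed if every message takes equally long (an assumption).
rounds = math.ceil(messages / (instances * max_concurrent_calls))

print(f"instances={instances}, concurrent messages={instances * max_concurrent_calls}, rounds={rounds}")
```

Under these assumptions the backlog of 300 would ask for 300 instances, the cap reduces that to 40, and the messages drain in about 8 waves of up to 40 concurrent invocations.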

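Or, if you are using ARM or Bicep, this setting is maximumInstanceCount (https://github.com/Azure-Samples/azure-functions-flex-consumption-samples/blob/351569ac1b20fb164f4ac06a620eba875873c95f/IaC/bicep/core/host/function.bicep#L68) inside scaleAndConcurrency of the new functionAppConfig section.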

Is this something you can test?

andynorrisjumar commented 2 weeks ago

Hi Thiago,

Thank you for the explanation. I think I can see where the problem lies: 40 is the smallest value allowed for the maximum. We are set up using Bicep and already have the maximum set to 40.

In our use case we have a single Service Bus namespace with 5 topics. Each topic is backed by its own Azure Functions Flex Consumption plan. Each function does the following:

  1. Peek Message from Queue
  2. Look up config in SQL.
  3. Process message and potentially create a new message for the next topic or end processing
  4. Complete the message.

The same SQL DB is shared across the entire system. The symptom we were seeing was SQL exhausting connections and CPU.

So, using your explanation, you can see that within a few seconds of starting we would have been up to 200 concurrent connections (5 apps × 40 instances each). Having 40 as the lowest allowed maximum instance count for Flex Consumption does not really suit scenarios where we are trying to throttle concurrency, as 40 is still quite a large number. As you can't deploy multiple function projects to one Flex Consumption plan, resources shared across plans are likely to be impacted.

With the information you have given I can arrange a test to confirm, but I believe the large minimum value of maximum-instance-count is the root cause here.

Regards

Andy Norris

nzthiago commented 1 week ago

I understand. It is indeed on our backlog to allow this setting to go lower, but unfortunately 40 will remain the lowest possible value for the maximum instance count for now. One thing to consider is having only one Flex Consumption app, limited to 40 instances, with five functions in that same app, each triggered by a different topic. This would mean that instead of a possible 200 concurrent calls to SQL you would be limited to 40. Worth testing.
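To make the comparison concrete, the following Python sketch computes the worst-case number of concurrent SQL callers for both layouts. It is only an upper-bound estimate under stated assumptions: the instance cap applies to the whole app regardless of how many functions it hosts, each instance processes `maxConcurrentCalls` messages at a time, and every in-flight message holds one SQL connection. The function name `worst_case_sql_callers` is hypothetical.

```python
def worst_case_sql_callers(apps: int, max_instances_per_app: int, max_concurrent_calls: int) -> int:
    """Upper bound on simultaneous SQL callers across all apps.

    Hypothetical helper: assumes every app scales to its instance cap at the
    same time and each in-flight message uses one SQL connection.
    """
    return apps * max_instances_per_app * max_concurrent_calls


# Current layout from the thread: five apps, one per topic, each capped at 40.
separate_apps = worst_case_sql_callers(apps=5, max_instances_per_app=40, max_concurrent_calls=1)

# Suggested layout: one app hosting all five functions, capped at 40 overall.
single_app = worst_case_sql_callers(apps=1, max_instances_per_app=40, max_concurrent_calls=1)

print(f"five apps: {separate_apps}, single app: {single_app}")
```

Whether 40 concurrent callers fits within the S4 tier's DTU budget is not something this estimate can show; that still needs the load test discussed above.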