Open FinVamp1 opened 5 years ago
There are several issues described in this bug. If I understand correctly, they can be summarized as follows:
Did I understand the concerns correctly? Were there any other concerns besides these?
Thank you Chris. I think these issues arise as a consequence of one larger concern: on a Dedicated plan, how do you determine how many instances you need to handle a given number of parallel orchestration calls, each of which launches sequential activities? If you enable the HTTP Dynamic Throttling functionality, then we'll return 429 if the counters exceed 80%, which would give you a figure for single-instance throughput. What do you think?
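If the host does start answering with 429 under dynamic throttling, the load-generating client has to back off and retry rather than keep pushing. A minimal sketch of that client-side behavior follows. The `send` callable is a hypothetical stand-in for whatever issues the HTTP request and returns its status code; nothing here is part of the Functions host or the Durable extension.

```python
import time
from typing import Callable


def call_with_backoff(
    send: Callable[[], int],
    max_attempts: int = 5,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a status other than 429 or attempts run out.

    send is a hypothetical callable returning an HTTP status code.
    The delay doubles after every throttled attempt.
    """
    status = send()
    for attempt in range(1, max_attempts):
        if status != 429:
            break
        # Exponential backoff: base, 2*base, 4*base, ...
        sleep(base_delay_s * 2 ** (attempt - 1))
        status = send()
    return status
```

The `sleep` parameter is injectable only so the helper can be exercised without real waiting; in a load test it would stay at its default.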
The optimal number will depend on the workload itself. For example, if we're just talking about sequences and the activity functions are expected to be heavy in CPU usage, then you would probably want a number of VMs (and partitions) to equal the number of concurrent orchestrations you need to support.
For other workloads, I expect some amount of trial and error would be required. I agree we probably need some better guidance here though.
For Dedicated, HTTP Throttles will not work as we don't track the performance counters I think. https://github.com/Azure/azure-functions-host/blob/2b8c2b851e2a415d70b40e7d47bc415a0a82475a/src/WebJobs.Script/Environment/EnvironmentSettingNames.cs#L21
So from https://docs.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-perf-and-scale#performance-targets, if you want to support, say, 1000 orchestrations per second, then at 5 instances per second for a Small instance you might need at least 100 instances if you're running on Large Premium instances. Does that sound right?
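The estimate above is a ceiling division, which can be sketched as follows. It assumes throughput scales linearly with instance count, which the thread does not confirm, and the function name is made up for illustration. Note that 100 instances corresponds to 10 per second per instance; at 5 per second per instance the same target works out to 200. Whether a Large Premium instance actually sustains 10 per second is not stated here and would need to be measured.

```python
import math


def instances_needed(target_per_sec: float, per_instance_per_sec: float) -> int:
    """Smallest instance count whose combined throughput meets the target.

    Assumes linear scaling across instances (an assumption, not a
    documented guarantee). Both rates must use the same unit.
    """
    if target_per_sec < 0 or per_instance_per_sec <= 0:
        raise ValueError("target must be >= 0 and per-instance rate must be > 0")
    return math.ceil(target_per_sec / per_instance_per_sec)


print(instances_needed(1000, 5))   # 200
print(instances_needed(1000, 10))  # 100
```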
@FinVamp1 any more details on this? I am trying to diagnose an issue where the CPU is spiking to 100% and not coming back down. Trying to figure out if this is related.
Also, right now I am not using a dedicated storage account for Durable Functions, and we are also on a small App Service plan.
@cgillum commenting off of what @FinVamp1 mentioned, is the document saying an A1 VM can only support 5 concurrent orchestrations at a time?
@mpaul31 The document is saying that you can expect a throughput of up to 5 activity functions per second on a single A1 VM running a single orchestration. The document isn't making any statement about orchestration concurrency on a single VM, except that you can configure your desired per-VM maximum concurrency through host.json settings.
Hmm, how would you recommend planning VM capacity? Unfortunately we are not able to use the Consumption plan at the moment. Also, let's assume a single VM. Would it make sense to increase the partition size beyond the default, or does that only come into play when scaling out VMs?
I think testing will be required to determine the right VM capacity because the right number could vary quite a bit depending on the actual workload. One thing I can tell you, however, is that it's ideal to have a partition count greater than or equal to the VM count (having them be equal is the most optimal in terms of I/O costs).
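As a rough illustration of the partition guidance, the sketch below picks a partition count equal to the VM count and emits a host.json fragment with the per-VM concurrency limits. The function names are hypothetical. The key names (`partitionCount`, `maxConcurrentActivityFunctions`, `maxConcurrentOrchestratorFunctions`) are assumed to follow the Durable Functions 1.x layout for Functions 2.x, so check the host.json reference for your extension version. Clamping to 1..16 reflects the documented range for the Azure Storage provider.

```python
import json


def partition_count_for(vm_count: int) -> int:
    """Partition count equal to the VM count, clamped to the 1..16 range."""
    return max(1, min(vm_count, 16))


def durable_host_json(vm_count: int, max_activities: int, max_orchestrators: int) -> str:
    """Build a host.json fragment (assumed 1.x key layout) as a JSON string."""
    settings = {
        "version": "2.0",
        "extensions": {
            "durableTask": {
                "partitionCount": partition_count_for(vm_count),
                # Per-VM concurrency limits; the values are workload-specific.
                "maxConcurrentActivityFunctions": max_activities,
                "maxConcurrentOrchestratorFunctions": max_orchestrators,
            }
        },
    }
    return json.dumps(settings, indent=2)


print(durable_host_json(vm_count=4, max_activities=10, max_orchestrators=10))
```

The concurrency values passed in the example are placeholders; as noted above, the right numbers have to come from testing against the actual workload.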
Regarding high CPU, we have another issue tracking some high CPU issues that other customers have encountered. You may want to take a look at https://github.com/Azure/durabletask/issues/271.
Hi Chris
Does the new 1.8.1 release contain the fix for the infinite loop with message ordering?
Yes it does. Sorry for forgetting to call that out in the release notes. It was fixed by this PR: https://github.com/Azure/azure-functions-durable-extension/pull/701
OK no problem thanks man!
Describe the bug
Investigative information
If deployed to Azure
To Reproduce Steps to reproduce the behavior:
1) Take the Chaining Sample and deploy it to a V2 application. (Upgraded to 1.8.0)
2) Generate a VS Load test for 25 users and 30 minutes to http://fintestdurablestress.azurewebsites.net/orchestrators/E1_HelloSequence
3) The CPU will go to 100% and the initial startup will generate 404 errors
4) This app is configured with the Durable Task Extension in a separate Storage account from AzureWebJobsStorage.
5) These are the settings for the Durable Task Extension.
While the Orchestrations are under stress we see an increased number of 404 errors from Table Storage.
Time: 3:35:25 PM
Duration: 4 ms
Outgoing Command: GET fintestdurablestorage/SampleHubVSInstances
Result code: 404
Category: Function.HttpStart
LogLevel: Information
InvocationId: 70b05b11-fda7-472a-ac80-e81ec9417778
https://fintestdurablestorage.table.core.windows.net:443/SampleHubVSInstances(PartitionKey='394a269be60f42bc81ce72ff168ddead',RowKey='')?$select=ExecutionId%2CName%2CVersion%2COutput%2CCustomStatus%2CCreatedTime%2CLastUpdatedTime%2CRuntimeStatus%2CPartitionKey%2CRowKey%2CTimestamp%2CETag
Expected behavior
Actual behavior
Screenshots
Known workarounds
I will test again with extendedSessions disabled. https://docs.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-perf-and-scale#orchestrator-function-replay
Additional context
Executing 'E1_HelloSequence' (Reason='', Id=f62fa09e-b783-4fd7-9656-d45e1d7c3f91)
Executing 'E1_HelloSequence' (Reason='', Id=9586aea6-26db-492e-8876-18dcf0a14ead)
Executing 'E1_HelloSequence' (Reason='', Id=d415f395-2516-4999-ba7e-7ee6d9e25920)