anthonychu opened 4 years ago
If an orchestrator needs 1 minute to complete and we have 100 of them, I would expect them to be processed one by one: in 1 minute, 1 task completed and 99 not even started; in 2 minutes, 2 tasks completed; and so on.
If I understand this correctly, the user wishes to allow only one orchestration to be in a running state at any point in time.
Full disclosure, I've never tried this, but I think this can be achieved by setting `"partitionCount": 1`, which will ensure all orchestrations get processed by the same VM, thereby guaranteeing the effectiveness of setting `"maxConcurrentOrchestratorFunctions": 1`, so that the next orchestration can't start until the previous one has reached a terminal status.

Additionally, the user could then scale up to 16 concurrent orchestrations, if desired, by increasing the `partitionCount` property.
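For reference, a minimal `host.json` sketch of that suggestion (assuming the Durable Functions v2.x extension schema; untested):

```json
{
  "extensions": {
    "durableTask": {
      "maxConcurrentOrchestratorFunctions": 1,
      "storageProvider": {
        "partitionCount": 1
      }
    }
  }
}
```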
@anthonychu What do you think about @olitomlinson's suggestion? It seems reasonable to me. Perhaps we need another section in our Performance and Scale documentation which describes how to limit work like this?
I am also seeing something similar if I set `partitionCount` to 1, `maxConcurrentActivityFunctions` to 1, and `maxConcurrentOrchestratorFunctions` to 1. I take the Durable Functions template which says hello to different cities and place a 30s `Task.Delay` in the `SayHello` activity function (changing it to async). If I kick off 15 instances of the orchestrator, the first instance takes 15 minutes to complete, because the runtime appears to pick a random pending activity function from across all running instances each time; it doesn't favor activity functions from earlier instances. Favoring earlier instances would be very useful, so that long-running multi-part workflows complete in a reasonable amount of time while later requests wait.
@cgillum I'm having the same issue as Oleksandr. Setting `maxConcurrentOrchestratorFunctions = 1` and/or `partitionCount = 1` does not help. I have an Azure Function with a `ServiceBusTrigger` that starts a new durable function whenever it receives a new message. If I send 10 messages to the queue, the `ServiceBusTrigger` function will start 10 durable functions. All 10 orchestrations then take turns processing their activities and sub-orchestrations, and they all finish at approximately the same time. This is OK when receiving 10 messages a minute, but I sometimes get spikes of 500 messages in a minute. One orchestration takes about 20-45s to run by itself, which means I have to wait 500 * 30s ~ 4h to process the first message received.
So `maxConcurrentOrchestratorFunctions = 1` does not keep me from having more than one orchestration in the "running" state at a time. One more note: shouldn't running activities and sub-orchestrations from the first orchestration have priority over the others until it is finished?
Hmmm, in that case I'm really not sure what the purpose of the `maxConcurrentOrchestratorFunctions` configuration is if it doesn't limit how many are running?
Found this: https://github.com/Azure/azure-functions-durable-extension/issues/730 So it works as designed: it only limits the number of orchestrations held in memory. But how can I limit the number of orchestrations running? I guess I could try enabling `extendedSessions` and setting the timeout to 1 minute. Then running orchestrations will count towards `maxConcurrentOrchestratorFunctions`, and most will complete within 1 minute. It would be nice if the extended-session timeout could be reset for each activity that completes within the orchestration, to allow the whole orchestration to complete before time runs out.
@ristaloff good find!
@cgillum I think the docs might need some clarification on this setting? The biggest thing for me is that I didn't realise that while an orchestration is awaiting, it doesn't count towards the limit.
—
Going back to the issue, I actually think there might be a user-code solution to this problem: use an eternal orchestration as a coordinator that decides when to start the other orchestrations that need to be processed serially. I'd need to try it out, though.
Correct, the `maxConcurrentOrchestratorFunctions` setting is meant to control the number of orchestrations that are active in memory at once. It cannot be used to serialize the execution of running orchestrations. I'll look into clarifying this in the documentation.
But yes, the way to accomplish this would be to use another orchestration to do the global enforcement. I haven't tested it, but something like this might work (C#):
```csharp
[FunctionName("OrchestrationCoordinator")]
public static async Task CallOrchestrator([OrchestrationTrigger] IDurableOrchestrationContext ctx)
{
    // Wait for a request to start an orchestration.
    var startArgs = await ctx.WaitForExternalEvent<StartOrchestrationArgs>("StartOrchestration");

    // Run it to completion before accepting the next request.
    await ctx.CallSubOrchestratorAsync<object>(
        startArgs.FunctionName,
        startArgs.InstanceId,
        startArgs.Input);

    // Restart the coordinator, keeping any events that arrived in the meantime.
    ctx.ContinueAsNew(null, preserveUnprocessedEvents: true);
}
```
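For completeness, a client function could feed that coordinator something like the sketch below (untested; `StartOrchestrationArgs` is a user-defined POCO matching the event payload above, and the queue name, instance ID `"coordinator-singleton"`, and target orchestrator `"HelloSequence"` are all hypothetical placeholders):

```csharp
public class StartOrchestrationArgs
{
    public string FunctionName { get; set; }
    public string InstanceId { get; set; }
    public object Input { get; set; }
}

[FunctionName("EnqueueOrchestration")]
public static async Task Run(
    [ServiceBusTrigger("work-queue")] string message,
    [DurableClient] IDurableOrchestrationClient client)
{
    // Instead of starting the orchestration directly, hand it to the single
    // coordinator instance, which runs requests one at a time.
    await client.RaiseEventAsync(
        "coordinator-singleton",
        "StartOrchestration",
        new StartOrchestrationArgs
        {
            FunctionName = "HelloSequence",
            InstanceId = Guid.NewGuid().ToString(),
            Input = message
        });
}
```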
I managed to run one orchestration at a time with the settings below. But it would also wait 90s after each orchestration completed before the next one would start.
```json
"extensions": {
  "durableTask": {
    "extendedSessionsEnabled": true,
    "extendedSessionIdleTimeoutInSeconds": 90,
    "maxConcurrentActivityFunctions": 10,
    "storageProvider": {
      "partitionCount": 1
    },
    "maxConcurrentOrchestratorFunctions": 1
  }
},
```
Also, do sub-orchestrations count towards `maxConcurrentOrchestratorFunctions`?
Yes, sub-orchestrations do count against this limit.
Hi, I tried to use an `OrchestrationCoordinator` function as proposed. When messages are received, I raise an event on the `OrchestrationCoordinator` instance, which then processes the messages one by one. But the raised events are only held in memory, so by restarting the app we risk losing all messages currently buffered there.

I think my only viable option now is to save messages to storage when they are received, and then have an eternal orchestrator or timer trigger read unprocessed messages from storage and process them.
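A rough sketch of that fallback (untested; the instance ID `"processor-singleton"` and the `"ProcessNextMessage"` orchestrator, which would read the oldest unprocessed message from storage and delete it when done, are hypothetical names):

```csharp
[FunctionName("MessagePump")]
public static async Task Run(
    [TimerTrigger("0 */1 * * * *")] TimerInfo timer, // fires every minute
    [DurableClient] IDurableOrchestrationClient client)
{
    // Only start a new run when the previous singleton instance has finished,
    // so at most one processing orchestration is ever running.
    var status = await client.GetStatusAsync("processor-singleton");
    if (status == null ||
        status.RuntimeStatus == OrchestrationRuntimeStatus.Completed ||
        status.RuntimeStatus == OrchestrationRuntimeStatus.Failed ||
        status.RuntimeStatus == OrchestrationRuntimeStatus.Terminated)
    {
        await client.StartNewAsync("ProcessNextMessage", "processor-singleton");
    }
}
```

Because the messages live in storage rather than in the coordinator's in-memory event buffer, an app restart no longer risks losing them.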
Seems to me that this could be a feature of the durable task framework.
@cgillum is the above true, i.e. that unprocessed events are held in memory and not durably stored along with the orchestration state?
No, I don't believe so. An app restart should never result in any data loss. Messages that are buffered in memory are still backed by durable storage. The only case where I might expect messages to get dropped is if you do `ContinueAsNew` but don't specify `preserveUnprocessedEvents: true`.
Any long term solution without the workaround suggested by @cgillum ?
Also, `context.ContinueAsNew(null, preserveUnprocessedEvents: true)` does not seem to be available in the JavaScript v4 programming model.
@shyamal890: can you please open an issue on the JS repo about this missing feature -> https://github.com/Azure/azure-functions-durable-js/? Thanks
Transferred from UserVoice for discussion.
Original suggestion from Oleksandr: