Create many grains and detect that all grains are finished?

ryanzll commented 5 years ago

Hello. Distributed computing frameworks are rare in .NET, and Orleans is an excellent one.

Our situation: in the front end, a user clicks a button to start a computing task (not a .NET Task). The task is split into many sub-tasks, and each sub-task, or a group of several sub-tasks, is handled by a grain. The code may look like this:

// The count may be in the millions; 10000 is used here as an example.
for (int ii = 0; ii < 10000; ii++)
{
    var elementCalculator = client.GetGrain<IElementCalculator>(ii);
    if (null == elementCalculator)
    {
        continue;
    }
    elementCalculator.Calculate("Good morning, Hello Grain!");
}

The questions are:

1. Is it suitable to create so many grains? I'm afraid that so many will take a silo down or reduce performance.
2. How can we detect that all grains have finished their jobs? After all sub-tasks are finished, we calculate a summary value from the sub-task values. Is it suitable to use WaitAll with so many Tasks?
3. The Calculate method of the IElementCalculator grain may take a long time to finish. Should it use an external task, such as Task.Run?

Thanks

sergeybykov commented 5 years ago

1. 10K grains is nothing extraordinary. Whether it makes sense to create that many grains or not is a different question, a design question. Depending on how large (number of CPU cores) the cluster is and how expensive each computation is, it may or may not make sense to start 10K parallel tasks. For example, if the cluster can execute no more than 100 parallel computations, it might be better, especially if those computations are stateless, to dispatch work to 100-200 calculator grains instead of 10K of them.
2. All grain methods are asynchronous: they return a Task. You need to await them to ensure the operation completed successfully and to handle any failures.
3. Yes, it is a recommended practice to off-load long-running computations to the thread pool via Task.Run.
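
A minimal sketch of the awaiting pattern described in this reply: keep every Task returned by the grain calls, await them together with Task.WhenAll, then compute the summary. It uses a local stand-in class instead of a real grain so that it runs without a cluster. The Task<int> return type, the GetCalculator helper, and the toy computation are assumptions for illustration and do not come from the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Stand-in for the grain interface from the question. In a real Orleans project it
// would also extend IGrainWithIntegerKey; the Task<int> return type is an assumption.
public interface IElementCalculator
{
    Task<int> Calculate(string input);
}

// Local stand-in for the grain class so the sketch runs without a cluster.
public class ElementCalculator : IElementCalculator
{
    private readonly long _id;

    public ElementCalculator(long id) => _id = id;

    // Off-load the (here trivial) computation to the thread pool via Task.Run.
    public Task<int> Calculate(string input) =>
        Task.Run(() => input.Length + (int)(_id % 10));
}

public static class FanOutDemo
{
    // Hypothetical replacement for client.GetGrain<IElementCalculator>(id).
    private static IElementCalculator GetCalculator(long id) => new ElementCalculator(id);

    public static async Task<long> RunAsync(int count)
    {
        var pending = new List<Task<int>>(count);
        for (int ii = 0; ii < count; ii++)
        {
            // Keep the returned Task instead of discarding it.
            pending.Add(GetCalculator(ii).Calculate("Good morning, Hello Grain!"));
        }

        // Completes once every call has finished; awaiting it rethrows a failure, if any.
        int[] results = await Task.WhenAll(pending);

        // Summary value computed from the sub-task results.
        return results.Sum(r => (long)r);
    }
}
```

With a real cluster, GetCalculator would be replaced by client.GetGrain<IElementCalculator>(ii), and the rest of RunAsync would stay the same.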

ryanzll commented 5 years ago

Thanks for your explanation, @sergeybykov.

For question 1, the client may know nothing about CPU cores or other cluster resources that a silo would know about. Is it possible for Orleans to store all the grain requests, perhaps in a queue, and then intelligently dispatch grains to silos according to cluster resources and load balance?

For question 2, some grains may execute fast while others run slowly, and the client must wait for all grains to finish. This may leave some silos idle while others are busy. Is it possible to make this more efficient? Or should we combine Orleans with other technology, such as streaming or a message queue?

For question 3, the grain may be deactivated if the external task runs for a long time, and it will be reactivated when the task is done. Right?

Could you share documents on the architecture, design, and implementation of Orleans to help with digging into its source code? Complete examples or real projects with source code that use Orleans would be appreciated; Orleans has a steep learning curve.

sergeybykov commented 5 years ago

> For question 1, the client may know nothing about CPU cores or other cluster resources that a silo would know about. Is it possible for Orleans to store all the grain requests, perhaps in a queue, and then intelligently dispatch grains to silos according to cluster resources and load balance?

If grains are just stateless processors of incoming requests, the most straightforward way I think is to leverage [StatelessWorker] grains for that. They will automatically scale with the number of CPU cores in the cluster, and will process requests on the gateway silo without a second network hop.
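
As a rough illustration of the [StatelessWorker] suggestion, the declaration below marks the calculator grain as a stateless worker. It needs the Orleans packages to compile. The Task<int> return type and the body of Calculate are assumptions made for the sketch.

```csharp
using System.Threading.Tasks;
using Orleans;
using Orleans.Concurrency;

public interface IElementCalculator : IGrainWithIntegerKey
{
    // Return type is an assumption; the original question does not show it.
    Task<int> Calculate(string input);
}

// The runtime may create several activations of this grain on each silo
// and route each request to a local activation.
[StatelessWorker]
public class ElementCalculator : Grain, IElementCalculator
{
    // Placeholder computation, off-loaded to the thread pool.
    public Task<int> Calculate(string input) => Task.Run(() => input.Length);
}
```

With this attribute the client can send every request to the same grain key, for example client.GetGrain<IElementCalculator>(0), rather than creating one grain per sub-task.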

> For question 2, some grains may execute fast while others run slowly, and the client must wait for all grains to finish. This may leave some silos idle while others are busy. Is it possible to make this more efficient? Or should we combine Orleans with other technology, such as streaming or a message queue?

If you have a steady stream of requests, they should get distributed across a large enough cluster pretty evenly. You can leverage the AsyncPipeline utility class to constrain the number of in-flight requests that each client sends to the cluster.
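
The idea of constraining in-flight requests can be sketched with plain .NET primitives. The example below does not use AsyncPipeline itself; it swaps in SemaphoreSlim to show the same throttling idea. The ThrottledFanOut class and its members are hypothetical names introduced for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: issues calls while keeping at most maxInFlight of them running.
public sealed class ThrottledFanOut
{
    private readonly SemaphoreSlim _gate;
    private readonly object _sync = new object();
    private int _inFlight;
    private int _peak;

    public ThrottledFanOut(int maxInFlight) => _gate = new SemaphoreSlim(maxInFlight);

    // Highest number of calls observed running at the same time.
    public int PeakInFlight
    {
        get { lock (_sync) { return _peak; } }
    }

    public async Task RunAsync(int count, Func<int, Task> call)
    {
        var pending = new List<Task>(count);
        for (int ii = 0; ii < count; ii++)
        {
            // Wait for a free slot before issuing the next call.
            await _gate.WaitAsync();
            pending.Add(CallAndRelease(ii, call));
        }

        // All calls have been issued; wait for the remaining ones to finish.
        await Task.WhenAll(pending);
    }

    private async Task CallAndRelease(int id, Func<int, Task> call)
    {
        lock (_sync)
        {
            _inFlight++;
            if (_inFlight > _peak) _peak = _inFlight;
        }

        try
        {
            await call(id);
        }
        finally
        {
            lock (_sync) { _inFlight--; }
            // Free the slot so the loop in RunAsync can issue another call.
            _gate.Release();
        }
    }
}
```

In a client, the call delegate would wrap the grain invocation, for example id => client.GetGrain<IElementCalculator>(id).Calculate("..."). A reasonable value for maxInFlight depends on the cluster size and the cost of each computation.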

If you have a fixed number of requests to execute, then yes, there will naturally be a tail end of processing with some grains and silos done with their request while others are still finishing. With a random distribution of request across a large enough number of grains/silos, I don't think you will see too large of a skew in completion times. Of course, it depends on the level of non-uniformity of the requests.

> For question 3, the grain may be deactivated if the external task runs for a long time, and it will be reactivated when the task is done. Right?

It depends on the implementation. If the external tasks were to call the grain at the end of its execution to notify about completion of the task, then the grain indeed would get reactivated.
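
The notification approach mentioned here can be sketched as a small in-memory tracker: each finished task reports its result, and a Task completes once every expected report has arrived. This is not an Orleans API. CompletionTracker and ReportCompleted are hypothetical names, and in Orleans the report would be a call to a grain method.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a grain that is notified when each external task ends.
// Assumes expected is at least 1.
public sealed class CompletionTracker
{
    private readonly TaskCompletionSource<long> _allDone =
        new TaskCompletionSource<long>(TaskCreationOptions.RunContinuationsAsynchronously);
    private int _remaining;
    private long _sum;

    public CompletionTracker(int expected) => _remaining = expected;

    // Completes with the summed values once every expected report has arrived.
    public Task<long> AllDone => _allDone.Task;

    // Called by a finished task to report its result.
    public void ReportCompleted(int value)
    {
        // Add the value first, so the last reporter sees every contribution.
        Interlocked.Add(ref _sum, value);
        if (Interlocked.Decrement(ref _remaining) == 0)
        {
            _allDone.TrySetResult(Interlocked.Read(ref _sum));
        }
    }
}
```

A caller would await AllDone instead of holding one Task per sub-task.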

> Could you share documents on the architecture, design, and implementation of Orleans to help with digging into its source code? Complete examples or real projects with source code that use Orleans would be appreciated; Orleans has a steep learning curve.

I don't think we have much more to share other than what's already published on http://dotnet.github.io/orleans/ plus some public presentations about Orleans you can find on the web.

dotnet/orleans #5439