Texera / texera

Collaborative Machine-Learning-Centric Data Analytics Using Workflows
https://texera.github.io
Apache License 2.0

Microsoft Orleans evaluation #645

Closed avinash0161 closed 5 years ago

avinash0161 commented 6 years ago

We want to evaluate the Orleans Actor system (https://dotnet.github.io/orleans/) and see if it is a good run-time engine for Texera.

avinash0161 commented 6 years ago

General Architecture

Figure: Orleans architecture, taken from gigi.nullneuron.net

Terms and Concepts

The lifecycle of a grain is shown below.

Figure: lifecycle of an Orleans grain

avinash0161 commented 6 years ago

Important links used in this evaluation

Companies/Projects using Orleans

Who is using Orleans

avinash0161 commented 6 years ago

Features of Orleans

// This stream delivers messages to a chat group identified by a particular GUID.
// Use case: a client sends messages to the group, and the grain with this group_guid subscribes to the stream.

// Get the stream provider registered under the name "SMSProvider"
// (in Orleans, SMS stands for Simple Message Stream).
var streamProvider = GetStreamProvider("SMSProvider");
// Get a reference to the stream identified by group_guid within the chatGroups namespace
var stream = streamProvider.GetStream<int>(group_guid, chatGroups);
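
To make the subscribe/publish roles concrete, here is a rough sketch of a grain that subscribes to such a stream when it activates. It assumes Orleans 2.x/3.x, a stream provider registered on the silo under the name "SMSProvider", and a stream namespace string "chatGroups". `IChatGroupGrain`, `ChatGroupGrain` and `GetReceivedCountAsync` are hypothetical names made up for illustration.

```csharp
using System;
using System.Threading.Tasks;
using Orleans;
using Orleans.Streams;

// Hypothetical grain interface; the grain's GUID key doubles as the chat group id.
public interface IChatGroupGrain : IGrainWithGuidKey
{
    Task<int> GetReceivedCountAsync();
}

public class ChatGroupGrain : Grain, IChatGroupGrain
{
    // Assumed stream namespace; must match the one used by producers.
    private const string ChatGroupsNamespace = "chatGroups";
    private int received;

    public override async Task OnActivateAsync()
    {
        var streamProvider = GetStreamProvider("SMSProvider");
        // The stream id is this grain's own GUID key.
        var stream = streamProvider.GetStream<int>(this.GetPrimaryKey(), ChatGroupsNamespace);

        // Explicit subscription: the callback runs as a turn on this grain.
        await stream.SubscribeAsync((item, token) =>
        {
            received++;
            return Task.CompletedTask;
        });

        await base.OnActivateAsync();
    }

    public Task<int> GetReceivedCountAsync() => Task.FromResult(received);
}
```

A producer that holds a reference to the same stream (same GUID and namespace) would publish with `await stream.OnNextAsync(42);`.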

See the linked page for why Orleans streams are considered superior to other stream processing engines such as Apache Storm, Spark Streaming, and Kafka. The following are use cases of Orleans streams:

avinash0161 commented 6 years ago

Open Questions


1. How does Orleans know that more actors need to be created? Suppose there are 80 actors running, but all are fully used, how will Orleans create a new actor?

Orleans doesn't create actors on its own. Suppose a client asks for an actor (grain) of type UserProfile and addresses it by an ID, for example the user's email id. The Orleans runtime then creates a new grain with that ID in one of the silos.


2. Say I have already created a grain with ID 2 and some other client also creates a grain with ID 2. A reference to the same grain will then be returned, but that is not what the other client wanted. It wanted a new grain.

We don't randomly assign IDs to grains. The IDs are something which can map to the real world. For example, for UserProfile grain, the ID is the user's email id. If the grain doesn't exist, it gets created. If it exists, it just gets activated. In both cases, the caller gets what it wanted.
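
A minimal sketch of this behaviour, assuming Orleans 2.x and an already connected `IClusterClient`. `IUserProfileGrain`, `UserProfileGrain` and `IncrementVisitsAsync` are hypothetical names used only for illustration; the email address is a placeholder.

```csharp
using System;
using System.Threading.Tasks;
using Orleans;

// Hypothetical grain interface; the string key is the user's email id.
public interface IUserProfileGrain : IGrainWithStringKey
{
    Task<int> IncrementVisitsAsync();
}

// Hypothetical implementation, hosted by a silo.
public class UserProfileGrain : Grain, IUserProfileGrain
{
    private int visits; // in-memory only; lost when the activation is collected

    public Task<int> IncrementVisitsAsync() => Task.FromResult(++visits);
}

public static class GrainIdentityExample
{
    // `client` is assumed to be an already connected IClusterClient.
    public static async Task RunAsync(IClusterClient client)
    {
        // GetGrain only builds a reference; nothing happens on the silo yet.
        var first = client.GetGrain<IUserProfileGrain>("alice@example.com");
        var second = client.GetGrain<IUserProfileGrain>("alice@example.com");

        // The first call activates the grain if needed; the second call is
        // routed to the same grain because the key is the same.
        int a = await first.IncrementVisitsAsync();
        int b = await second.IncrementVisitsAsync();
        Console.WriteLine($"{a}, {b}");
    }
}
```

Both references address the same logical grain, so the counter keeps increasing across them rather than starting over for the second caller.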


3. What happens when a grain method is invoked in Orleans? Is it a straightforward method call with the caller's thread getting blocked?

This Stack Overflow post talks about method invocation in Orleans. This blog says that when a method is invoked, a message is actually being sent. The message (the method's parameters) is deep-copied, serialized, transmitted to the correct silo, deserialized, and then queued for processing by the receiving grain. The method is then invoked on the receiving grain. The message is passed asynchronously, so the caller isn't immediately aware of the success or failure of the invocation. The invocation results in a Promise, which is implemented as a .NET Task.

The Orleans runtime schedules work as a sequence of turns. A turn is the execution of grain code until a Promise has been returned (which happens when an await statement is reached, the closure following an await is reached, or a completed or uncompleted Task is returned). So one request (method invocation) can result in several turns, since the method can await in many places. As the grain is single-threaded, only one turn executes at a time. Turns of different requests aren't interleaved: to maintain consistency, the runtime by default schedules all turns of a request before processing any other request.

Now consider the case where a method (request) in grain-1 calls a method (request) of grain-2. The grain-2 method returns a Task. grain-1 can then do one of two things: await the Task, or continue without awaiting it. If grain-1 awaits the Task, the current turn on grain-1 ends, but the method itself isn't over. grain-1 is therefore effectively blocked: it can't process turns of any other request until it has processed this method completely. Any sub-Tasks spawned from grain code (for example, via await, ContinueWith, or Task.Factory.StartNew) run in the grain context by default, so they also occupy the grain's single thread. If, in rare cases, you want to escape this, you have to start the new task with Task.Run() or the endMethod delegate of Task.Factory.FromAsync, which run on the .NET ThreadPool task scheduler and not on the Orleans task scheduler. More on this here.
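
The Task.Run escape hatch can be sketched as follows, assuming Orleans 2.x. `IWorkerGrain`, `WorkerGrain` and `SumTo` are hypothetical names; the loop only stands in for CPU-heavy work.

```csharp
using System.Threading.Tasks;
using Orleans;

// Hypothetical grain interface used only for this illustration.
public interface IWorkerGrain : IGrainWithIntegerKey
{
    Task<long> SumToAsync(int n);
}

public class WorkerGrain : Grain, IWorkerGrain
{
    // Plain CPU-bound helper: 1 + 2 + ... + n.
    public static long SumTo(int n)
    {
        long total = 0;
        for (int i = 1; i <= n; i++) total += i;
        return total;
    }

    public async Task<long> SumToAsync(int n)
    {
        // Task.Run schedules the delegate on TaskScheduler.Default (the .NET
        // thread pool) instead of the grain's single-threaded scheduler.
        long result = await Task.Run(() => SumTo(n));

        // The code after the await runs back in the grain context.
        return result;
    }
}
```

This keeps the heavy work off the grain's scheduler. A non-reentrant grain still will not start another request until `SumToAsync` has completed.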


4. What happens when a client or a grain invokes a method on a grain? Does the caller block until the callee returns a Task?

We did a few experiments for this. In the first experiment, we had a client call the grain every second, as shown below:

    while (true)
    {
        Console.WriteLine("Client giving another request");
        double temperature = random.NextDouble() * 40;
        // Always call the same grain (ID 500).
        var sensor = client.GetGrain<ITemperatureSensorGrain>(500);
        // Not awaited: the call hands back a Task right away.
        Task t = sensor.SubmitTemperatureAsync((float)temperature);
        Console.WriteLine("Task Status - " + t.Status);
        Thread.Sleep(1000);
    }

The grain waited for 10 seconds before completing the Task. If this were a blocking call for the client, the client would make its next request only after 10 seconds. That's not the case, as the console output shows:

Client giving another request
Task Status - WaitingForActivation
500 outer received temperature: 32.29987
Client giving another request     <--------------------- client continues
Task Status - WaitingForActivation
Client giving another request
Task Status - WaitingForActivation
Client giving another request
Task Status - WaitingForActivation
Client giving another request
Task Status - WaitingForActivation
Client giving another request
Task Status - WaitingForActivation
Client giving another request
Task Status - WaitingForActivation
Client giving another request
Task Status - WaitingForActivation
Client giving another request
Task Status - WaitingForActivation
Client giving another request
Task Status - WaitingForActivation
Client giving another request
Task Status - WaitingForActivation
500 outer complete
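
The grain side of this experiment is not shown in the thread. A sketch of what it might look like, assuming Orleans 2.x and an integer-keyed grain, is given here. The interface name `ITemperatureSensorGrain` and the method `SubmitTemperatureAsync` come from the client code; the interface definition, the class `TemperatureSensorGrain`, and the method body are assumptions reconstructed from the console output.

```csharp
using System;
using System.Threading.Tasks;
using Orleans;

// Assumed definition of the interface used by the client.
public interface ITemperatureSensorGrain : IGrainWithIntegerKey
{
    Task SubmitTemperatureAsync(float temperature);
}

// Hypothetical implementation that takes about 10 seconds per request.
public class TemperatureSensorGrain : Grain, ITemperatureSensorGrain
{
    public async Task SubmitTemperatureAsync(float temperature)
    {
        long grainId = this.GetPrimaryKeyLong();
        Console.WriteLine($"{grainId} outer received temperature: {temperature}");

        // Asynchronous wait: ends the current turn without holding a thread.
        await Task.Delay(TimeSpan.FromSeconds(10));

        Console.WriteLine($"{grainId} outer complete");
    }
}
```

With a grain like this, the client's un-awaited call hands back a Task that has not completed yet, which matches the WaitingForActivation status printed once per second.
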
avinash0161 commented 6 years ago

Design Questions

avinash0161 commented 6 years ago

Important coding snippets/practices (Things to be kept in mind)

public class MyGrainState
{
  public int Field1 { get; set; }
  public string Field2 { get; set; }
}

[StorageProvider(ProviderName="store1")]
public class MyPersistenceGrain : Grain<MyGrainState>, IMyPersistenceGrain
{
  ...
}
// Grain state write
public Task DoWrite(int val)
{
  State.Field1 = val;
  return base.WriteStateAsync();
}
// Grain state refresh
public async Task<int> DoRead()
{
  await base.ReadStateAsync();
  return State.Field1;
}
avinash0161 commented 6 years ago

Hello World implementation on two machines

We will be using two Ubuntu servers to run a basic HelloWorld program. One machine will run the silo and the other the client. We will provide MySQL to Orleans to maintain cluster membership. The code is a simple modification of the Gigilabs code from the link mentioned above. The client sends fake temperature readings to different grains, which just print them to the terminal.

1. Install .NET Core on both machines

  1. Use the linked instructions to install the prerequisite libraries.
  2. Use the linked instructions to install both the .NET SDK and the runtime.

2. Start Silo on one machine

  1. Create a MySQL database called 'orleanstest'. Then run the scripts MySQL-Main.sql and MySQL-Clustering.sql to create the necessary tables and insert entries into the database.
  2. Use the following code snippet (the entire code is in the sandbox folder). Note that you have to set the username and password in the connection string. Also, we are currently not using SSL connections to the MySQL server. The code below starts the silo and waits until some input is given to it:

            const string connectionString = "server=<hostname>;uid=<username>;pwd=<password>;database=orleanstest;SslMode=none";
    
            var silo = new SiloHostBuilder()
            .Configure<ClusterOptions>(options =>
            {
                options.ClusterId = "dev";
                options.ServiceId = "Orleans2GettingStarted2";
            })
            .UseAdoNetClustering(options =>
            {
                options.ConnectionString = connectionString;
                options.Invariant = "MySql.Data.MySqlClient";
            })
            .ConfigureEndpoints(siloPort: 11111, gatewayPort: 30000)
            .ConfigureLogging(builder => builder.SetMinimumLevel(LogLevel.Warning).AddConsole())
            .Build();
    
            await silo.StartAsync();
            // Wait for user's input, otherwise it will immediately exit.
            Console.ReadLine();

3. Start the client on a different machine

  1. The client uses the same MySQL database as the silo, so the connection string should be the same.
  2. Use the following code to start the client:

            const string connectionString = "server=<hostname>;uid=<username>;pwd=<password>;database=orleanstest;SslMode=none";
            var clientBuilder = new ClientBuilder()
                .Configure<ClusterOptions>(options =>
                {
                    options.ClusterId = "dev";
                    options.ServiceId = "Orleans2GettingStarted2";
                })
                .UseAdoNetClustering(options =>
                {
                    options.ConnectionString = connectionString;
                    options.Invariant = "MySql.Data.MySqlClient";
                })
                .ConfigureLogging(builder => builder.SetMinimumLevel(LogLevel.Warning).AddConsole());
    
            using (var client = clientBuilder.Build())
            {
                await client.Connect();
    
                var random = new Random();
    
                while (true)
                {
                    int grainId = random.Next(0, 500);
                    double temperature = random.NextDouble() * 40;
                    var sensor = client.GetGrain<ITemperatureSensorGrain>(grainId);
                    await sensor.SubmitTemperatureAsync((float)temperature);
                }
            }
zuozhiw commented 5 years ago

Orleans evaluation is successful.