dotnet / orleans

Cloud Native application framework for .NET
https://docs.microsoft.com/dotnet/orleans
MIT License
10.07k stars 2.03k forks

Does this make sense for Orleans or SF and if so guidance please #4308

Closed MattWorkWeb closed 6 years ago

MattWorkWeb commented 6 years ago

We’re working to take our software to Azure cloud and looking at Orleans and Service Fabric (SF) as potential frameworks. We need to:

1) Populate our analysis engines with lots of data (e.g., 100 MB to 2 GB) per engine instance.
2) Maintain that state, and if an engine instance goes idle for, say, 20 minutes or more, unload it (i.e., stop paying for the engine instance resource).
3) Each engine instance will support one to several end users with a specific data set.
4) Each engine instance can be highly interactive, generating lots of plot data in near real time. We’re maintaining state because we don’t want to pay the price of repopulating an engine instance for each interaction.
5) An engine instance action can take a few seconds, a few minutes, or even tens of minutes. We’ll want some feedback along the way.
6) Users may access an engine instance every few seconds (e.g., to steer the engine toward a result based on feedback) and will want live plot data.
7) Each user will want to talk to a specific engine instance.
8) When a user expresses interest in running a simulation (i.e., standing up an engine instance), ideally we want him to choose a small/medium/large computing resource to run his engine instance (i.e., based on the problem he’s trying to solve, he may want more or less computing/memory power).

We’re considering Orleans and SF but we’re having difficulty specifying architecture based on above requirements. We’ve considered:

1) Treating an SF node, or an Orleans silo, as an ‘engine instance’ as described above.
2) Leveraging both Orleans' and SF's notions of fault tolerance through replication.
3) Leveraging local (i.e., node- or silo-local) storage to store results and maintain state (i.e., for long periods, or until idle for 20 minutes).

We’ve not understood how to:

1) Limit a silo or a node to a single engine instance so that we can control resourcing of the engine instance.
2) Keep a user’s engine instance data separate from another user’s engine instance data.
3) Direct a request from a user (e.g., through a web API) to a particular engine instance.

Does this make sense for Orleans, or does it make more sense for SF? Any pointers on how to implement the above would be helpful.

ReubenBond commented 6 years ago

Hi @MattWorkWeb,

Could one host hold multiple analysis engines? It might be that you could associate each engine instance with a grain. Is memory the only constraint, or does each engine also require significant CPU resources?

If you know the size of an engine ahead of time, you could potentially use a custom placement director to select the silo with the most available memory to place each new instance on.
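To make the placement idea concrete, here is a minimal sketch of a custom placement director against the Orleans 2.x placement API. The class names (`EngineInstancePlacement`, `EngineInstancePlacementDirector`) are illustrative, and a real director would consult silo statistics to pick the silo with the most free memory rather than just taking the first compatible silo:

```csharp
using System.Linq;
using System.Threading.Tasks;
using Orleans.Runtime;
using Orleans.Runtime.Placement;

// Marker strategy class; grains opt in to this placement via an attribute
// or registration (names here are assumptions, not Orleans built-ins).
public class EngineInstancePlacement : PlacementStrategy { }

public class EngineInstancePlacementDirector : IPlacementDirector
{
    public Task<SiloAddress> OnAddActivation(
        PlacementStrategy strategy, PlacementTarget target, IPlacementContext context)
    {
        // A real implementation would rank silos by available memory here;
        // picking the first compatible silo keeps the sketch short.
        var silos = context.GetCompatibleSilos(target);
        return Task.FromResult(silos.First());
    }
}
```

The director then needs to be registered with the silo's service container and associated with the strategy type; the exact registration call varies between Orleans versions, so check the placement docs for the release you're on.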

  1. Limit a silo or a node to a single engine instance so that we can control resourcing of the engine instance.

Silos aren't really designed for hosting one thing at a time, but you can use custom placement directors to control what gets placed on each silo.

  2. Keep a user’s engine instance data separate from another user’s engine instance data.

You could include the user's id in the grain id so that a separate grain is used for each user. Alternatively, you could share one engine instance between users and include the user id in the calls to the grain (or in the RequestContext, which is passed with the call). I think I'd need to understand more about the scenario to give a recommendation, but either way is likely fine. How much data does each user have?
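Both options above can be sketched in a few lines. `IEngineGrain` and the compound key scheme are assumptions for illustration (a string key like this requires the grain to implement `IGrainWithStringKey`); `RequestContext.Set` is the Orleans mechanism for flowing ambient data with a call:

```csharp
using Orleans.Runtime;

// Option 1: per-user grain, by folding the user id into the grain key.
var grainKey = $"{userId}/{engineId}";
var perUserEngine = client.GetGrain<IEngineGrain>(grainKey);

// Option 2: one shared engine grain; pass the user alongside the call,
// either as a method argument or via RequestContext, which flows
// implicitly with every grain call made on this thread of execution.
RequestContext.Set("UserId", userId);
var sharedEngine = client.GetGrain<IEngineGrain>(engineId);
```

Inside the grain, option 2 reads the value back with `RequestContext.Get("UserId")`.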

  3. Direct a request from a user (e.g., through a web API) to a particular engine instance.

If you're using grains to handle the request, the code on the API side might look like this:

// Get a reference to the specified engine.
// This doesn't perform any IO, that only happens when you use the reference.
var engineInstance = this.client.GetGrain<IEngineGrain>(engineId);

// This will call that grain, no matter where it is in the cluster.
// If the grain has not been activated yet, then it will first activate that grain.
// If you are using a custom placement director to determine where that grain should
// live, then that placement director will be asked where to activate the grain.
return engineInstance.SomeQueryOrCommand(user, args);

How you determine engineId and create the authenticated user object is up to you. The IEngineGrain can perform authorization of the user. Alternatively you could perform AuthN & AuthZ in the Web API and not pass any representation of the user to the grain at all.
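One more point from the original requirements: the "unload after 20 idle minutes" behavior maps directly onto Orleans' activation collection. A sketch, assuming the `IEngineGrain` interface from above (the grain body and parameter types are illustrative; `CollectionAgeLimit` is the real Orleans attribute):

```csharp
using System.Threading.Tasks;
using Orleans;

// Deactivate this grain after 20 minutes without calls; Orleans will
// re-activate it (and re-run placement) on the next call.
[CollectionAgeLimit(Minutes = 20)]
public class EngineGrain : Grain, IEngineGrain
{
    public Task SomeQueryOrCommand(UserInfo user, EngineArgs args)
    {
        // ... run or steer the analysis engine ...
        return Task.CompletedTask;
    }
}
```

A grain can also call `this.DeactivateOnIdle()` to ask for deactivation as soon as the current request finishes, if you want explicit unloading rather than a timer.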

ReubenBond commented 6 years ago

Closing this for now. We can continue the discussion here or via another channel.