dotnet / orleans

Cloud Native application framework for .NET
https://docs.microsoft.com/dotnet/orleans
MIT License

Geo-distributed Actor Directory (prototype) #591

Closed jameskeongchen closed 9 years ago

jameskeongchen commented 9 years ago

Hi Folks,

I'm hoping to get a bit of clarity on the current state of the Orleans project and its future plans, particularly around multi-datacenter / multi-region / geo-distribution.

Reading the current FAQ suggests that a single deployment is currently limited to a single data center.

However reading Philip Bernstein's recent presentation suggests there is a prototype for geo-distribution.

Recent media coverage also suggests:

The project introduces the concept of ‘grains’ as units of computation and data storage that can migrate between data centres.

Can anyone help provide some clarity on this?

  1. Can the cluster span multiple data centers?
  2. What is the effect of latency on this? (i.e. do the DCs need to be relatively close, or can they be geo-distributed?)
  3. Is the same eventual consistency concept applicable here?
  4. Particularly for the single-writer / multi-reader pattern (#446), does this mean that state can be geo-distributed across regions, but writes incur the latency to the location of the grain activation?
  5. Lastly, if this is available or in prototype, I'd love to start up a geo-distributed prototype to play with. Can I just follow the basics of the guide outlined in Connection Silos from Multiple Cloud services #525, but with those services in different regions such as West US, East US and West Europe to start?

Appreciate your thoughts on this.

Regards,

James.

jthelin commented 9 years ago

/cc @philbe

sergeybykov commented 9 years ago

Hi James,

There are several related things here.

First of all, it is indeed technically possible to run a cluster of geo-distributed silos today. So long as the silos have direct TCP connectivity to each other and access to the cluster membership store, they will run just like in a local cluster, oblivious to the physical distances between them. Orleans runtime wouldn't do anything different from when running on a local cluster.

Considering cross-datacenter latencies and the reliability of the network links, we think it's not a good idea to run a system with interactive traffic this way. However, there are likely scenarios that would work just fine in such a topology.

We've done some prototyping of geo-distribution in the past that made us think we can build a sensible solution. This summer we are running a special project with Research on geo-distribution. So far it's a two-pronged approach.

  1. Allow multiple Orleans clusters to join into a geo-distributed "meta-cluster" with a general propensity for local placement of grains (within the same cluster as the caller) but with full connectivity between all grains in the system and a global single activation guarantee within the "meta-cluster".
  2. Run multiple activations of a grain, up to one per cluster, with eventual or strong (application-driven) consistency of state achieved via storage.
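
To make the first approach concrete, here is a minimal toy sketch in Python. It is not Orleans code and not the prototype's design. The class `MetaClusterDirectory`, the method `get_or_activate`, and the cluster names are all hypothetical, and a single in-memory dictionary stands in for what would really have to be a distributed directory protocol between clusters. It only illustrates the two stated properties: a new grain is placed in the caller's cluster, and later calls from any cluster are routed to that one existing activation.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MetaClusterDirectory:
    """Hypothetical stand-in for a global directory: grain id -> owning cluster."""
    owners: Dict[str, str] = field(default_factory=dict)

    def get_or_activate(self, grain_id: str, caller_cluster: str) -> str:
        # If the grain is already active somewhere, return that cluster, so
        # there is never more than one activation in the meta-cluster.
        # Otherwise register it in the caller's own cluster (local placement).
        return self.owners.setdefault(grain_id, caller_cluster)


directory = MetaClusterDirectory()
first = directory.get_or_activate("user/42", "west-us")       # new grain, placed locally
second = directory.get_or_activate("user/42", "west-europe")  # routed to existing activation
print(first, second)
```

In this sketch the second caller pays the cross-region latency to reach the activation in the first caller's cluster, which is the trade-off the second approach tries to relax.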

We are right in the middle of this project. So everything is in flux, and no projection of what and when will be available can be made yet. We are very interested in bringing this kind of functionality into the Orleans codebase as soon as it is technically possible.

I hope but am not sure this answers your questions. If not, I'll be happy to discuss further.

jameskeongchen commented 9 years ago

Thanks Jorgen, Thanks Sergey.

Yes that goes a very long way to painting the picture for the plans of the Geo-distributed Actor Directory.

I do have a single follow-up question, if you don't mind. In your prototyping, did you use Azure Tables in a single selected region as persisted storage, or did you also start to introduce the notion of multiple persistence stores for the distinct regional clusters? (That would introduce the possibility of dirty local reads for reduced latency, with eventual consistency of the distributed storage. I'm just wanting to get a feel for the roadmap and potential in this area.)

Really appreciate your efforts in this - it's a great framework and I'm excited to dig deeper!

jthelin commented 9 years ago

Yes, both those approaches are being explored.

Too early to give a definitive conclusion just yet.

sergeybykov commented 9 years ago

@sebastianburckhardt is leading the state replication effort.

jameskeongchen commented 9 years ago

Sounds great! Looking forward to future updates - will close this out for now. Much appreciated.