eclipse / omr

Eclipse OMR™ Cross platform components for building reliable, high performance language runtimes
http://www.eclipse.org/omr

Multi Node Runtimes | Virtualized Runtimes for Cloud Deployments | Component State API #669

Open sirinath opened 7 years ago

sirinath commented 7 years ago

Current cloud environments are distributed by nature, but libraries are not always designed with this in mind. It would be good if OMR could support transparently scaling software on the cloud, abstracting the environment to give the notion of running on one simple machine.

This should abstract the cluster memory to be viewed as one single block of memory. The cores would be shown as the total number of cores across all machines.
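To make the idea concrete, here is a minimal sketch of what such a single-machine view could look like. It is only an illustration: `Node`, `ClusterView` and `locate` are hypothetical names, not anything OMR provides, and real memory would not be laid out as simply as one flat range per node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical description of one machine in the cluster."""
    name: str
    cores: int
    memory_bytes: int


class ClusterView:
    """Presents a set of nodes as if they were one machine (illustrative only)."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def total_cores(self):
        # The cores are reported as the sum across all machines.
        return sum(n.cores for n in self.nodes)

    def total_memory_bytes(self):
        # The cluster memory is reported as one single block.
        return sum(n.memory_bytes for n in self.nodes)

    def locate(self, address):
        """Map a flat cluster-wide address to (node name, local offset)."""
        if address < 0:
            raise IndexError("address must be non-negative")
        base = 0
        for n in self.nodes:
            if address < base + n.memory_bytes:
                return n.name, address - base
            base += n.memory_bytes
        raise IndexError("address outside cluster memory")
```

The `locate` step is where the locality question shows up: every access to the flat view eventually has to be resolved to a specific machine.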

One issue to be addressed is data and code locality. The JIT optimizer could perhaps perform NUMA-aware code optimisation and memory relocation.

Also, the runtime's state as well as the memory state can be replicated for fault tolerance.

A starting point for this would be for the current components to have an API to manage their internal state and the state of the running program. A virtualization API would also be needed.

Once this is done, the next step could be a component that replicates state by communicating with its counterpart on each machine. The component state should be easily accessible and replicable. A state transfer protocol that works both within a machine and over the network would probably help as well. A high-performance consensus component and a state storage component would also be needed.
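As a rough sketch of the state API and the replicating component described above, something along these lines might work. Everything here is an assumption for illustration: `StatefulComponent`, `Replicator`, `snapshot`, `restore` and `sync` are hypothetical names, the replicas live in the same process, and there is no network transfer or consensus involved.

```python
import copy


class StatefulComponent:
    """Hypothetical component that exposes its internal state for replication."""

    def __init__(self, name):
        self.name = name
        self._state = {}
        self.version = 0  # incremented on every state change

    def update(self, key, value):
        self._state[key] = value
        self.version += 1

    def get(self, key):
        return self._state.get(key)

    def snapshot(self):
        # Deep copy so later updates do not leak into the snapshot.
        return {"version": self.version, "state": copy.deepcopy(self._state)}

    def restore(self, snap):
        self._state = copy.deepcopy(snap["state"])
        self.version = snap["version"]


class Replicator:
    """Copies the primary's state to replicas that are behind (no consensus)."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)

    def sync(self):
        snap = self.primary.snapshot()
        updated = 0
        for replica in self.replicas:
            if replica.version < snap["version"]:
                replica.restore(snap)
                updated += 1
        return updated  # number of replicas brought up to date
```

A real version would send the snapshot over the state transfer protocol and would need the consensus component to decide which node is the primary. This sketch only shows the shape of the snapshot and restore calls that each component would have to offer.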

mstoodle commented 7 years ago

Sounds like it could be an interesting research project to investigate the benefits and identify the pain points using some moderately real workload. I'm not aware of anyone in the current community whose interests lie specifically in this area (feel free to chime in if you're out there :) ) . I could imagine OMR providing primitives and APIs to help a distributed VM move work around on a cluster, but I'm not really convinced that transparently distributing a runtime's components across a cluster will be a very broadly useful design point (and I'm pretty sure it will, at the very least, complicate the much more common single machine VM design point... I don't really have much zeal to increase OMR's complexity, to be honest).

Would you be interested in adjusting your concept to the problem of connecting and coordinating a set of VMs across a cluster?

sirinath commented 7 years ago

If I am to break it down:

Would you be interested in adjusting your concept to the problem of connecting and coordinating a set of VMs across a cluster?

This is fine.

Basically, give the user of the project something simple for building cloud runtimes that can scale by adding nodes, are resilient to node failures, and behave with near-zero difference whether they run on a single machine or on a cluster.

An end user of the language, whether it is used as a VM or as a compiled language, would be able to code as if targeting a single machine. For advanced usage the clustering API could be exposed, but for general use it need not be used.