chetmurthy / ensemble

The Ensemble distributed communications toolkit
13 stars 2 forks source link

Plans to continue the development? #1

Open LdBeth opened 3 years ago

LdBeth commented 3 years ago

I start try to work on bring back the distributed refiner for metaprl (it is annoying a single OCaml program can not run on multiple processors). The latest version here seems has the building dependency broken and I don't have enough experience with make to fix them in place so I started reconstruct the directory organization with OMake https://github.com/LdBeth/ensemble . Don't know if these provides helps to you, but I made some attempts to fix the byte string conversion issue and cleaned up some hacks (for example sets of operators like =|| >|| >=|| are defined to specialize operation on int type) and aims to have a minimal version that can be used by metaprl.

chetmurthy commented 3 years ago

I don't know anything about MetaPrl, nor about its distributed refiner. Last time I worked with Nuprl was in 1990, so .....

I'm glad you're trying to use Ensemble! It's a lovely piece of code, and the reason I found all the old versions and put them online, was so that others could use them.

Are you saying that it doesn't build? That's a little surprising -- I know I built it back in 2013.

LdBeth commented 3 years ago

as what I got with OCaml 4.11, when it reach the ML interface files under server/type, make says don't know how to build these file. What I did is porting some of the building process OMake.

I get the mtalk demo successfully built, however when running a second mtalk, it gives IPMC:error:binding to port 6793:Unix(Address already in use,bind), possible due to the same port has been bind by the first program. it this how mtalk supposed to work? I do not know much about the network programming, but I guess it should like the unix talk program. Is this possibly related to

For OS X, the OCaml compiler must be slightly modified in order to support IP multicast, and two files in the Ensemble distribution must be changed in order to set a socket option. If you need to make these changes, contact Alon for instructions.

since I do run it on OS X.

chetmurthy commented 3 years ago

I don't know anything about the OS X issue, but your error-message sounds like the typical "two processes binding to the same port" problem. Ensemble had two ways for processes to intially learn of each others' existence: (1) explicit listing of address/ports (I forget exactly how this worked) and (2) running "gossip" servers.

I'm still curious why you need Ensemble? If MetaPrl uses Ensemble for RPC-based distributed proof-checking, might it not be easier to just use some other existing RPC system (like Thrift)?

LdBeth commented 3 years ago

Ok, seem I figure that out, after fall back to the usocket library the gossip server and mtalk do work as expected, at least gossip do aware of these clients.

Metaprl's distributed refiner features arbitrary process can join/leave during refining, I don't think this is possible with any other RPC system? do they?

chetmurthy commented 3 years ago

Tyically RPC systems don't support this. But you could always write a simple "directory server" using Thrift (that would keep track of the currently-alive set of processes, and their addresses) and a client that wanted to distribute some proof-obligations, could look up the current set of servers ....

Sure, it's not quite as nice as Ensemble, but then, Ensemble is a -lot- of code.

LdBeth commented 3 years ago

That's ok, I think with just core Ensemble it is the scale that I can handle.

From what currently I know about metaprl's refiner:

  1. it use mmap to communicate
  2. it uses an old version of Ensemble, probably older than 0.7 available in this repo, some efforts are needed to port the code, but I can see the idea is still very same.
  3. the clients needs to acquire lock to shared global resources, i think it is very hard to be done right if uses a simple rpc system