Open Thomasdezeeuw opened 5 years ago
Needed:
I want to be able to link multiple actors together over a network, where the (Unix) processes aren't running the same binary, so that different versions of the users' code and of Heph (post v1) can communicate. For this a protocol needs to be written, including a way to share definitions of the messages and to determine whether both sides of the communication see them as the same. It should be possible to extend the data structures in newer versions of the code, e.g. by adding fields. I intend to support and encourage rolling updates of the users' code (including new versions of Heph).
We'll also likely need some kind of crash detection, even though we provide no guarantees for message delivery.
As for the protocol, I think TCP might have too much overhead and UDP should be preferred, as again message delivery is not guaranteed. Alternatively, we can let the user decide, providing both TCP and UDP based connections and maybe even something based on QUIC.
Consider using compression when sending messages: zstd at level 3 seems appropriate.
Messages need some kind of id so that we can reply to them, to support ActorRef::rpc.
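One way to picture the message id requirement is a table of outstanding requests keyed by id. The sketch below is only an illustration: `MessageId`, `Pending` and the use of a caller label in place of whatever would wake the waiting actor are all assumptions, not existing Heph types.

```rust
use std::collections::HashMap;

/// Hypothetical identifier attached to every request.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct MessageId(u64);

/// Tracks requests that are still waiting for a reply.
#[derive(Debug, Default)]
pub struct Pending {
    next_id: u64,
    // Maps the id of an outstanding request to a label for the caller
    // (a stand-in for whatever would wake the waiting actor).
    waiting: HashMap<MessageId, String>,
}

impl Pending {
    /// Registers an outgoing request and returns the id to send with it.
    pub fn register(&mut self, caller: &str) -> MessageId {
        let id = MessageId(self.next_id);
        self.next_id = self.next_id.wrapping_add(1);
        self.waiting.insert(id, caller.to_owned());
        id
    }

    /// Matches an incoming reply to its request. Returns `None` for unknown
    /// or duplicate ids, which can happen as delivery is not guaranteed.
    pub fn complete(&mut self, id: MessageId) -> Option<String> {
        self.waiting.remove(&id)
    }
}
```

A reply that arrives twice, or for a request that was never sent, simply yields `None`, which fits the "no delivery guarantees" model.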
Commit 05d0ff1f8fff356535077788573ffbc43b864745 removed the old, unused RemoteRegistry. Replace it with a proper version (currently in the net-relay branch).
Related #307.
Idea for determining the topology:
Create a UUID for each runtime and for each actor spawned (also see ActorURL). Using the UUID for the runtime we can create a topology of all connected peers.
type Peer = (SocketAddr, UUID);
This way, even if peers are connected using different IPs/ports, we can still match them using the UUID. E.g. if peer A knows peer B at (123, ABC) and peer C knows a peer D at (456, ABC), we can see that the UUIDs match up and thus that B and D are the same peer.
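The matching described above could be sketched as grouping known addresses by UUID. This is a rough illustration only: `Uuid` is a plain `u128` stand-in for a real UUID type and `group_by_uuid` is a hypothetical helper.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

/// Stand-in for a real UUID type; a 128-bit value is assumed here.
pub type Uuid = u128;

pub type Peer = (SocketAddr, Uuid);

/// Groups all known addresses by runtime UUID, so the same peer seen at
/// different addresses ends up in a single entry.
pub fn group_by_uuid(peers: &[Peer]) -> HashMap<Uuid, Vec<SocketAddr>> {
    let mut grouped: HashMap<Uuid, Vec<SocketAddr>> = HashMap::new();
    for (addr, uuid) in peers {
        let addrs = grouped.entry(*uuid).or_default();
        // Skip addresses we already know for this runtime.
        if !addrs.contains(addr) {
            addrs.push(*addr);
        }
    }
    grouped
}
```

With the example from the text, B and D would end up as two addresses under the single key ABC.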
To create a topology we can then send pings (messages) to all peers known to a certain runtime. From that we can create the following connections.
struct Connection {
    src: Peer,
    dst: Peer,
    latency: Duration,
}
And collecting all of these will create a topology for us.
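Collecting the connections could look roughly like the sketch below, which builds an adjacency map keyed by runtime UUID. The `Topology` alias, `build_topology` and the choice to keep the lowest latency per pair are assumptions made for illustration; `Uuid` is again a `u128` stand-in.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;
use std::time::Duration;

/// Stand-in for a real UUID type.
pub type Uuid = u128;
pub type Peer = (SocketAddr, Uuid);

#[derive(Clone, Debug)]
pub struct Connection {
    pub src: Peer,
    pub dst: Peer,
    pub latency: Duration,
}

/// For each source runtime, the lowest latency seen to each destination.
pub type Topology = HashMap<Uuid, HashMap<Uuid, Duration>>;

/// Builds the topology from ping results. Peers are identified by UUID only,
/// so the same peer reached via different addresses collapses into one edge.
pub fn build_topology(connections: &[Connection]) -> Topology {
    let mut topology = Topology::new();
    for conn in connections {
        let edges = topology.entry(conn.src.1).or_default();
        edges
            .entry(conn.dst.1)
            .and_modify(|latency| *latency = (*latency).min(conn.latency))
            .or_insert(conn.latency);
    }
    topology
}
```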
For encryption we could maybe use something like https://github.com/RustCrypto/traits, e.g. https://docs.rs/aead.
Current idea is to make a quick (and bad) implementation of remote message passing based on UDP and JSON (not encrypted). I know this is bad, but it gives me a starting point to get an idea of how the API should look.
For this the following would be needed:
Port to listen for incoming messages on each node
To start let's make this a fixed port. Later this can be dynamic, maybe per process.
A process that handles incoming messages
A shared process (to ensure progress) that handles these incoming messages. This would likely be an actor with which other actors can register themselves using their ActorRef. This actor would then receive a message, attempt to deserialise it for the correct registered actor and send the message using the ActorRef.
This process/actor can run on the main thread, but that is not a great solution. What would be better is a work stealing scheduler (#254).
Mechanism to deliver messages to the correct actor.
Mechanism to serialise/de-serialise messages
Need a way to describe the format of a message to create bytes from it and to reverse that operation. Serde can handle this for us, but that would make it a public dependency.
Before sending a message we can agree with the peer on a data definition, i.e. what wire format to use for the message. To start we'll use JSON as that is self-describing.
Mechanism to create an ActorRef to a remote actor.
Expose ProcessId/WakerData? -- Don't like this as it is not user facing and I don't care to expose it.
New concept: ActorUrl? Could look something like "$node/$actor", e.g. "node_01/value_store".
Cons:
ActorRef to actors on a remote machine so they can communicate. But that won't be possible with actors that are not explicitly registered first.
Do we want versions? E.g. "node_01/value_store/v1". This would require user defined versioning.
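To get a feel for the ActorUrl idea, here is a minimal parsing sketch for the "$node/$actor" form with an optional version segment. The struct layout, field names and error strings are all hypothetical; nothing like this exists in Heph yet.

```rust
/// Hypothetical parsed form of "$node/$actor" with an optional "/$version".
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ActorUrl {
    pub node: String,
    pub actor: String,
    pub version: Option<String>,
}

impl std::str::FromStr for ActorUrl {
    type Err = &'static str;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let mut parts = s.split('/');
        // Both the node and the actor segment are required and non-empty.
        let node = parts.next().filter(|p| !p.is_empty()).ok_or("missing node")?;
        let actor = parts.next().filter(|p| !p.is_empty()).ok_or("missing actor")?;
        // The version segment is optional, but may not be empty if present.
        let version = match parts.next() {
            Some("") => return Err("empty version"),
            Some(v) => Some(v.to_owned()),
            None => None,
        };
        if parts.next().is_some() {
            return Err("too many segments");
        }
        Ok(ActorUrl {
            node: node.to_owned(),
            actor: actor.to_owned(),
            version,
        })
    }
}
```

Both "node_01/value_store" and "node_01/value_store/v1" parse with this sketch, so the versioning question can stay open while experimenting with the API.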
We would send something like:
Where data is the actual message sent by the actor.
Paper about general, fast RPC: https://blog.acolyer.org/2019/03/18/datacenter-rpcs-can-be-general-and-fast/, could be of interest.
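Coming back to the shared process that handles incoming messages: a very rough sketch of the registration and dispatch step, using only the standard library, might look as follows. Here `std::sync::mpsc::Sender<M>` stands in for `ActorRef<M>`, the `decode` closure stands in for real deserialisation (e.g. via Serde), and `Registry` is a hypothetical name rather than the old RemoteRegistry.

```rust
use std::collections::HashMap;
use std::sync::mpsc::Sender;

/// Type-erased delivery function: decodes the raw bytes and forwards the
/// message to the registered actor. Returns `false` if either step fails.
type Deliver = Box<dyn Fn(&[u8]) -> bool + Send>;

#[derive(Default)]
pub struct Registry {
    actors: HashMap<String, Deliver>,
}

impl Registry {
    /// Registers an actor under `name`. `Sender<M>` stands in for
    /// `ActorRef<M>` and `decode` for the real deserialisation step.
    pub fn register<M, D>(&mut self, name: &str, actor_ref: Sender<M>, decode: D)
    where
        M: Send + 'static,
        D: Fn(&[u8]) -> Option<M> + Send + 'static,
    {
        let deliver = move |bytes: &[u8]| match decode(bytes) {
            Some(msg) => actor_ref.send(msg).is_ok(),
            None => false,
        };
        self.actors.insert(name.to_owned(), Box::new(deliver));
    }

    /// Routes an incoming message; returns whether it was delivered.
    pub fn dispatch(&self, name: &str, bytes: &[u8]) -> bool {
        match self.actors.get(name) {
            Some(deliver) => deliver(bytes),
            None => false,
        }
    }
}
```

Erasing the message type behind a boxed closure lets a single registry hold actors with different message types, at the cost of one allocation per registration.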