Open question about going to cluster mode

DavidBM commented 4 years ago

So, the idea is to have a cluster where each node hosts a chunk of the actors. If, for example, there are 100,000 actors and 5 instances in the cluster, then there should be about 20,000 actors on each server.

Things I need to think about (and where help is very welcome):

What happens if a server crashes? Should we replicate the enqueued messages in order to not lose messages?
Raft? I'm checking it and it seems good.
Should the consensus algorithm sync messages or just where each actor is?
To what extent should the actor inboxes be persistent?
Is it worth having a cluster where messages are lost if any instance dies?

Many decisions!

Cloud33 commented 3 years ago

This project is great. I think we can borrow some design choices from Orleans:

What happens if a server crashes? Should we replicate the enqueued messages in order to not lose messages?

I believe Raft can solve this problem, and I don't think it's necessary to replicate messages. If the message is a receive (fire and forget), the sender accepts that it may be lost or fail to be processed. If it expects a response, the client can receive the error and handle it itself.

Raft? I'm checking it and it seems good.

Yes

Should the consensus algorithm sync messages or just where each actor is?

I think it should be the location of each actor. We could use the Redis Cluster algorithm to calculate a slot from the actor ID, and use Raft to synchronize only the slot distribution across the nodes of the cluster. That way the size of the Raft log stays under control, because it is independent of the number of actors.
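To make the slot idea concrete, here is a minimal sketch in Rust. It assumes the Redis Cluster scheme (CRC16, XMODEM variant, modulo 16384 slots) and ignores Redis hash tags. `SlotTable`, `slot_for_actor` and the node names are hypothetical and not part of acteur; the table is the only thing Raft would have to replicate under this proposal.

```rust
/// Number of hash slots used by Redis Cluster.
const SLOT_COUNT: u16 = 16384;

/// CRC16-CCITT, XMODEM variant (polynomial 0x1021, initial value 0),
/// which is the checksum Redis Cluster uses for key hashing.
fn crc16_xmodem(data: &[u8]) -> u16 {
    let mut crc: u16 = 0;
    for &byte in data {
        crc ^= (byte as u16) << 8;
        for _ in 0..8 {
            crc = if crc & 0x8000 != 0 {
                (crc << 1) ^ 0x1021
            } else {
                crc << 1
            };
        }
    }
    crc
}

/// Slot for an actor ID (hash tags `{...}` are deliberately not handled).
fn slot_for_actor(actor_id: &str) -> u16 {
    crc16_xmodem(actor_id.as_bytes()) % SLOT_COUNT
}

/// Hypothetical slot table: (first_slot, last_slot, node_name), inclusive.
struct SlotTable {
    ranges: Vec<(u16, u16, String)>,
}

impl SlotTable {
    /// Split the slot space as evenly as possible across the given nodes.
    fn even(nodes: &[&str]) -> Self {
        let n = nodes.len() as u32;
        let total = SLOT_COUNT as u32;
        let ranges = nodes
            .iter()
            .enumerate()
            .map(|(i, name)| {
                let first = (i as u32 * total / n) as u16;
                let last = ((i as u32 + 1) * total / n - 1) as u16;
                (first, last, name.to_string())
            })
            .collect();
        SlotTable { ranges }
    }

    /// Node that currently owns the slot of this actor ID.
    fn node_for(&self, actor_id: &str) -> Option<&str> {
        let slot = slot_for_actor(actor_id);
        self.ranges
            .iter()
            .find(|(first, last, _)| (*first..=*last).contains(&slot))
            .map(|(_, _, node)| node.as_str())
    }
}
```

Any node can compute `slot_for_actor` locally, so routing a message only needs the small replicated table, not a per-actor directory.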

To what extent should the actor inboxes be persistent?

Do you mean how long an actor stays alive? I think we can keep a fixed-size cache managed with the LRU algorithm. The cache stores each actor's context and uses the actor ID as the key, which is very simple.
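A rough sketch of such a cache, using only the Rust standard library. `ActorContext` and `ActorCache` are hypothetical names for illustration, and the `VecDeque` bookkeeping is O(n) per access, so a real implementation would likely use a linked structure or an existing LRU crate.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical stand-in for whatever state an activated actor holds.
#[derive(Debug, Clone, PartialEq)]
struct ActorContext {
    state: String,
}

/// Fixed-capacity LRU cache keyed by actor ID.
/// `order` tracks recency: front = least recently used.
struct ActorCache {
    capacity: usize,
    entries: HashMap<String, ActorContext>,
    order: VecDeque<String>,
}

impl ActorCache {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be at least 1");
        ActorCache {
            capacity,
            entries: HashMap::new(),
            order: VecDeque::new(),
        }
    }

    /// Mark an actor ID as most recently used.
    fn touch(&mut self, actor_id: &str) {
        self.order.retain(|id| id.as_str() != actor_id);
        self.order.push_back(actor_id.to_string());
    }

    /// Look up an active actor, refreshing its recency on a hit.
    fn get(&mut self, actor_id: &str) -> Option<&ActorContext> {
        if self.entries.contains_key(actor_id) {
            self.touch(actor_id);
        }
        self.entries.get(actor_id)
    }

    /// Insert an actor; returns the ID of the evicted actor, if any.
    fn insert(&mut self, actor_id: &str, ctx: ActorContext) -> Option<String> {
        let mut evicted = None;
        let is_new = !self.entries.contains_key(actor_id);
        if is_new && self.entries.len() == self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.entries.remove(&oldest);
                evicted = Some(oldest);
            }
        }
        self.entries.insert(actor_id.to_string(), ctx);
        self.touch(actor_id);
        evicted
    }
}
```

The ID returned by `insert` is where the runtime could hook deactivation, for example persisting the evicted actor's state before dropping it.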

Is it worth having a cluster where messages are lost if any instance dies?

Yes, I think that is fine. If you use receive, you need to accept the risk of message loss; this is normal. However, we need to ensure that a node that leaves the cluster is forcibly shut down, because the cluster may transfer that node's actors to other nodes.
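As an illustration of the hand-over step, here is a small sketch that reassigns the slot ranges of a departing node to the remaining nodes, round-robin. `SlotRange` and `reassign_on_leave` are hypothetical, and the sketch assumes the departing node has already been shut down, so that no actor is ever active on two nodes at once.

```rust
/// One contiguous slot range and the node that owns it (names hypothetical).
#[derive(Debug, Clone, PartialEq)]
struct SlotRange {
    first: u16,
    last: u16,
    owner: String,
}

/// Reassign every range owned by `leaving` to the remaining nodes,
/// round-robin. Returns the number of ranges moved. If no other node
/// remains, the table is left untouched and 0 is returned.
fn reassign_on_leave(table: &mut [SlotRange], leaving: &str) -> usize {
    // Collect surviving owners in first-seen order, without duplicates.
    let mut survivors: Vec<String> = Vec::new();
    for range in table.iter() {
        if range.owner != leaving && !survivors.contains(&range.owner) {
            survivors.push(range.owner.clone());
        }
    }
    if survivors.is_empty() {
        return 0;
    }

    let mut moved = 0;
    for range in table.iter_mut().filter(|r| r.owner == leaving) {
        range.owner = survivors[moved % survivors.len()].clone();
        moved += 1;
    }
    moved
}
```

In a Raft-based design the updated table would be proposed as a single log entry, so all nodes switch ownership at the same point in the log.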

DavidBM commented 2 years ago

Hey @Cloud33! Thanks for the feedback and encouragement!

I don't have free time to work on this right now, so I don't know how much I can commit to, tbh.

Maybe I will be able to pick it up in the future.

Again, thanks!

DavidBM / acteur-rs