Open doits opened 9 years ago
With this, I've noticed that programs which use DCell
hang for a long time on exit after displaying
DEBUG -- : Terminating 89 actors...
I flushed the Redis DB manually and it came back to normal, but shouldn't stale nodes be cleared automatically?
ZeroMQ is "stateless" when it comes to connections: you can still send messages to a peer which is disconnected, and it will automatically send those messages again when the peer comes back online.
But if needed one could implement a ping/pong mechanism for DCell which would disconnect inactive nodes.
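A minimal sketch of the bookkeeping side of such a ping/pong mechanism, in plain Ruby. Everything here is hypothetical and not part of DCell: the `HeartbeatTracker` class, its method names, and the `timeout` and `clock` parameters are assumptions for illustration. It only records when each node last answered and reports which nodes have been silent for too long; actually sending pings and disconnecting nodes is left out.

```ruby
# Hypothetical helper, not part of DCell's API.
# Tracks the last time each node answered a ping and finds stale nodes.
class HeartbeatTracker
  # timeout: seconds of silence after which a node counts as stale.
  # clock:   callable returning the current time in seconds (injectable for tests).
  def initialize(timeout: 10.0,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @timeout = timeout
    @clock = clock
    @last_seen = {}
    @lock = Mutex.new
  end

  # Call this whenever a pong (or any other message) arrives from node_id.
  def record_pong(node_id)
    @lock.synchronize { @last_seen[node_id] = @clock.call }
  end

  # True if the node is known and has answered within the timeout.
  def alive?(node_id)
    @lock.synchronize do
      seen = @last_seen[node_id]
      !seen.nil? && (@clock.call - seen) <= @timeout
    end
  end

  # Forgets every node that has been silent for longer than the timeout
  # and returns the ids of the nodes that were removed.
  def prune_stale
    @lock.synchronize do
      now = @clock.call
      stale = @last_seen.select { |_id, seen| now - seen > @timeout }.keys
      stale.each { |id| @last_seen.delete(id) }
      stale
    end
  end
end
```

A periodic task could call `prune_stale` and drop the returned ids from the registry, which would also address the stale entries left in the DB.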
At least it should not hang (on termination or sending messages to nodes) when a lot of stale nodes are present.
One would have to set the sndtime to 0 for each ZMQ socket on shutdown so that it discards all remaining messages.
Yeah, that's a good idea: if there are remaining messages on shutdown, output a warning and discard them after waiting, for example, 10 seconds (user configurable).
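Roughly what I have in mind, as a sketch only. It uses a plain Ruby `Queue` as a stand-in for the pending outgoing messages and does not touch ZeroMQ sockets at all; the `drain_or_discard` name and the `grace`, `poll` and `logger` parameters are made up for illustration.

```ruby
# Hypothetical helper, not part of DCell or ZeroMQ.
# Waits up to `grace` seconds for `pending` (a Queue of unsent messages) to
# drain, then warns and drops whatever is left.
# Returns the number of discarded messages.
def drain_or_discard(pending, grace: 10.0, poll: 0.05, logger: $stderr)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + grace

  # Give a sender thread (if any) time to empty the queue.
  while !pending.empty? &&
        Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
    sleep(poll)
  end

  dropped = pending.size
  if dropped > 0
    logger.puts "WARN -- : discarding #{dropped} unsent message(s) after #{grace}s"
    pending.clear
  end
  dropped
end
```

With an empty queue this returns immediately, so a clean shutdown is not slowed down by the grace period.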
Also, a configurable timeout for when a node hangs would be great. For example, when I try DCell::Node['which_is_dead'].all, it hangs for a long time. It should throw an exception after a user-configurable time (or, if it already does so after too long a time, that time should be configurable :-))
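Until something like that exists, a caller-side workaround could look like the sketch below. It wraps an arbitrary block with Ruby's standard `Timeout` module; `with_node_timeout` and `NodeTimeoutError` are hypothetical names, not DCell API. One caveat: `Timeout.timeout` may not be able to interrupt a call that is blocked inside native code, so this is a stopgap rather than a replacement for a timeout inside DCell itself.

```ruby
require "timeout"

# Hypothetical error class, not part of DCell.
class NodeTimeoutError < StandardError; end

# Runs the given block and raises NodeTimeoutError if it takes longer
# than `seconds`. `node_id` is only used for the error message.
def with_node_timeout(seconds, node_id)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  raise NodeTimeoutError, "node #{node_id} did not respond within #{seconds}s"
end
```

Usage would be along the lines of `with_node_timeout(5, 'which_is_dead') { DCell::Node['which_is_dead'].all }`, with the 5 seconds coming from configuration.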
@doits it's already like this in master. Dead nodes are not taken into account (though they are still present in the DB).
At one point nodes healthchecked other nodes and marked them down if they didn't get responses. Did that get lost along the way?
@tarcieri @doits in current master there are 3 ways to bypass dead nodes:
If you access an actor by ID (without specifying the node), you get all actors with the requested ID from all alive nodes: scratchy example
I switched to master now and things go much smoother now. Didn't have enough time to test it, though, so maybe tomorrow I can say more. Thanks for the explanation!
I've played around with DCell a little bit, but now I have this:
I've only two nodes running right now, but it still lists 75 of them. Also, it lists multiple nodes with the same address (which cannot be, right?). Is there any way to clear stale/dead/removed nodes?