Closed HoneyryderChuck closed 9 years ago
The unregistering issue is a bug/absent feature of the Redis driver. It should set a TTL on the entries it writes so they auto-expire unless refreshed, and it should refresh them at a periodic interval. But that's a lot of work for a driver that's not intended for production use. Feel free to implement it, but I'd suggest just using Zookeeper.
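To make the TTL idea concrete, here is a minimal sketch of a registry whose entries expire unless refreshed. It is not the DCell Redis driver: `ExpiringRegistry`, its methods, and the injectable `clock` are hypothetical names, and a plain in-memory hash stands in for Redis so the example runs on its own.

```ruby
# Hypothetical in-memory stand-in for a registry with expiring entries.
# A real driver would rely on the store's own key expiry instead.
class ExpiringRegistry
  def initialize(ttl:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl          # seconds an entry stays valid without a refresh
    @clock = clock      # injectable so expiry can be tested without sleeping
    @entries = {}
  end

  # Register or refresh a node; every call pushes the expiry forward.
  def set(node_id, addr)
    @entries[node_id] = { addr: addr, expires_at: @clock.call + @ttl }
  end

  # Return the address, or nil if the node is unknown or its entry expired.
  def get(node_id)
    entry = @entries[node_id]
    return nil if entry.nil?

    if @clock.call >= entry[:expires_at]
      @entries.delete(node_id)
      return nil
    end
    entry[:addr]
  end
end
```

A node would call `set` on a timer with a period shorter than the TTL. If the process dies, the refreshes stop and the entry disappears on its own, so peers no longer resolve the dead address.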
In general, if a node goes down, RPCs will block. DCell implements a timeout mechanism to eventually time them out when this happens, but otherwise it's expected behavior.
Ok, I'll go with the suggestion. So Zookeeper does the deregistering out-of-the-box, right? I don't need to add a new setting, do I?
About the timeout mechanism, is that an existing feature? I don't see it documented anywhere. When accessing a node that is inaccessible, the call just blocks indefinitely; no timeout is triggered.
It was implemented at one point. Now I'm having trouble finding the code :cry:
https://github.com/celluloid/dcell/blob/master/lib/dcell/node.rb#L73-L75
If it was, it isn't anymore, it appears. Here the node is sending a message and blocking on the receive. I think the receive API supports a timeout parameter, right? I think this could be implemented either with a default timeout or parameter on DCell.start. What do you think?
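A rough sketch of what bounding the blocking receive could look like. This does not use DCell or Celluloid internals: a stdlib `Queue` stands in for the mailbox, and `receive_with_timeout` and `RequestTimeout` are hypothetical names introduced only for illustration.

```ruby
require "timeout"

# Hypothetical error raised when the remote side does not answer in time.
class RequestTimeout < StandardError; end

# Wait for a response on the mailbox, giving up after `seconds`.
# `mailbox` is a plain Queue here, standing in for the node's real mailbox.
def receive_with_timeout(mailbox, seconds)
  Timeout.timeout(seconds) { mailbox.pop }
rescue Timeout::Error
  raise RequestTimeout, "no response within #{seconds}s"
end
```

The `seconds` value could come from a default or from an option passed at startup, as suggested above. Either way the caller gets an exception it can rescue instead of a call that never returns.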
Timeouts are probably a good start
It's a bit reworked in pull request #88, I believe it will solve this
In current master it is possible to ping a remote node (and get its latency) before starting any communication; this has to be done explicitly. Also, when a local instance of a node (bound to a remote one) is created, it checks the current value of the TTL. If the remote node didn't update it within a grace period, an exception is raised and creation is aborted. If the remote node goes down and doesn't answer within the grace period, the client will raise an exception.
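The grace-period check described here can be pictured roughly as follows. This is only an illustration of the idea, not the code in master: `ensure_alive!`, `NodeUnavailable`, and the timestamp arguments are assumptions.

```ruby
# Hypothetical error raised when a node has been silent for too long.
class NodeUnavailable < StandardError; end

# last_seen and now are timestamps in seconds; grace is the longest
# silence we tolerate before treating the node as down.
def ensure_alive!(node_id, last_seen, grace, now = Time.now.to_f)
  age = now - last_seen
  raise NodeUnavailable, "#{node_id} silent for #{age.round(1)}s" if age > grace

  true
end
```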
I'm using DCell with redis registry. I've started these two scripts where both nodes were registered the first time to redis. Through it, they are sharing an access address and they communicate.
Problem is, if I stop one script, every time the running script tries to access it, it blocks.
It seems that this happens because the stopped script has registered itself in redis the first time it ran, but it never unregistered itself when it stopped. Hence, the second script still gets the reference to its address.
So the question is, is there a way for a node to unregister itself from the cluster when its process stops? Some way to prevent the block from happening?
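One way to approach the unregister-on-stop part is an `at_exit` hook. The sketch below uses a plain hash in place of the Redis registry; `register`, `unregister`, and the node id and address are hypothetical and not part of DCell.

```ruby
# Hypothetical helpers; `registry` is a plain Hash standing in for Redis.
def register(registry, node_id, addr)
  registry[node_id] = addr
end

def unregister(registry, node_id)
  registry.delete(node_id)
end

registry = {}
register(registry, "node_a", "tcp://127.0.0.1:7777")

# Runs when the process exits normally or through an unhandled exception.
# It does not run on SIGKILL or a hard crash, so stale entries remain possible.
at_exit { unregister(registry, "node_a") }
```

Because the hook cannot cover every way a process dies, it only narrows the window. A node that is killed abruptly still leaves its address behind, and the other script will still block when it tries to reach it.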