[X] I had searched in the DSIP and found no similar DSIP.
Motivation
When a master or worker disconnects from the registry, it might reconnect later.
For example, we use Curator to connect to ZooKeeper. If the session timeout is 120s, the server enters the suspended state once heartbeats have failed for 80s, and then tries to reconnect to another ZooKeeper node. If the reconnect succeeds, the server continues working. In this case, however, other servers might sometimes receive a disconnect event for the reconnecting server.
We need to make sure that once someone has failed over a node, that node shuts itself down.
Design Detail
We introduce FAILOVER_FINISH_NODES in the registry. Each server uses its address plus its startup time as its identity. Once a server has been failed over, its identity is put under FAILOVER_FINISH_NODES, so a server that finds itself under FAILOVER_FINISH_NODES should shut down.
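A minimal sketch of the idea, not the actual implementation. Only the name FAILOVER_FINISH_NODES and the "address + startup time" identity come from this proposal. The registry path, the InMemoryRegistry stand-in, and the method names are assumptions made for illustration; a real version would go through the registry (e.g. ZooKeeper) instead.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class FailoverFinishSketch {

    // Hypothetical registry path; only the name FAILOVER_FINISH_NODES is from the proposal.
    static final String FAILOVER_FINISH_NODES = "/nodes/failover-finish";

    // In-memory stand-in for the registry (assumption, for illustration only).
    static class InMemoryRegistry {
        private final Set<String> keys = ConcurrentHashMap.newKeySet();

        void put(String key) {
            keys.add(key);
        }

        boolean exists(String key) {
            return keys.contains(key);
        }
    }

    // Identity = address + startup time, so a process restarted on the
    // same address gets a different identity from the one that was failed over.
    static String identity(String address, long startupTimeMillis) {
        return address + "-" + startupTimeMillis;
    }

    static String failoverFinishPath(String identity) {
        return FAILOVER_FINISH_NODES + "/" + identity;
    }

    // Called by the server that has finished failing over the given node.
    static void markFailoverFinished(InMemoryRegistry registry, String identity) {
        registry.put(failoverFinishPath(identity));
    }

    // Called by a server after it reconnects: if its own identity is
    // recorded under FAILOVER_FINISH_NODES, it must stop.
    static boolean mustStop(InMemoryRegistry registry, String identity) {
        return registry.exists(failoverFinishPath(identity));
    }

    public static void main(String[] args) {
        InMemoryRegistry registry = new InMemoryRegistry();
        String oldInstance = identity("192.168.1.10:5678", 1000L);
        String restartedInstance = identity("192.168.1.10:5678", 2000L);

        // Another server fails over the old instance while it is suspended.
        markFailoverFinished(registry, oldInstance);

        System.out.println("old instance must stop: " + mustStop(registry, oldInstance));
        System.out.println("restarted instance must stop: " + mustStop(registry, restartedInstance));
    }
}
```

Including the startup time in the identity means a fresh process on the same address is not affected by a record left for its predecessor.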
Compatibility, Deprecation, and Migration Plan
No response
Test Plan
No response
Code of Conduct