I found some problems that can cause panics on failure of ZK servers.
The point is that zk.Connect() must not be called again to reconnect to another ZK server when the leader of the ZK cluster fails. zk.Conn internally and automatically reconnects to another ZK server, so users of zk.Conn basically have nothing to do to support failover.
I actually confirmed that, with the current implementation, zk_coordinator sometimes creates many goroutines (>100) and sockets and panics when I shut down the leader node of my ZK cluster.
This patch makes zk_coordinator simply ignore connectionEvents.
Only when the ZK session is expired is Reinitialize sent to Consumers so that they can recreate their ephemeral nodes. Watches are recreated in trySubscribeForChanges(), so I removed the SubscribeForChanges() call in listenConnectionEvents().
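A minimal, self-contained sketch of the filtering logic described above. The `State` and `Event` types here are stand-ins mirroring the names in the go-zookeeper `zk` package (the real library delivers `zk.Event` values on the channel returned by `zk.Connect`); `needsReinitialize` is a hypothetical helper, not part of the library:

```go
package main

import "fmt"

// State mimics go-zookeeper's zk.State for illustration only.
type State int

const (
	StateDisconnected State = iota
	StateConnecting
	StateHasSession
	StateExpired
)

// Event mimics the shape of zk.Event delivered on the event channel.
type Event struct {
	State State
}

// needsReinitialize reports whether a connection event should cause
// Reinitialize to be sent to Consumers. Only session expiration does:
// all other state changes are ignored, because zk.Conn reconnects to
// another ensemble member on its own and the session survives.
func needsReinitialize(e Event) bool {
	return e.State == StateExpired
}

func main() {
	events := []Event{
		{State: StateDisconnected}, // leader went down: ignore, conn retries internally
		{State: StateConnecting},   // reconnect in progress: ignore
		{State: StateHasSession},   // reconnected within the same session: ignore
		{State: StateExpired},      // session lost: ephemeral nodes are gone, reinitialize
	}
	for _, e := range events {
		fmt.Println(needsReinitialize(e))
	}
}
```

The key point is that only `StateExpired` invalidates ephemeral nodes and watches; everything else is handled transparently by the connection object, so reacting to it (e.g. by calling zk.Connect again) only leaks goroutines and sockets.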