elodina / go_kafka_client

Apache Kafka Client Library for Go
http://www.elodina.net
Apache License 2.0

Do not call Connect() again to support ZK fail over #191

Closed: yudai closed this 8 years ago

yudai commented 8 years ago

Hi,

I found some problems that can cause panics when ZK servers fail.

The point is that zk.Connect() must not be called again to reconnect to another ZK server when the leader of the ZK cluster fails. ZK.conn reconnects to another ZK server internally and automatically. Therefore, users of ZK.conn basically have nothing to do to support failover.
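To illustrate the "connect once, let the client fail over" idea, here is a minimal, self-contained sketch. The `conn` type, `connect`, `do` and the `down` map are hypothetical stand-ins invented for this example (they are not the go-zookeeper or go_kafka_client API). The point is only that the connection object owns the full server list and moves to the next server on its own, so the caller never needs a second `connect`.

```go
package main

import (
	"errors"
	"fmt"
)

// conn is a hypothetical stand-in for a ZooKeeper client connection such as
// ZK.conn: it owns the whole server list and fails over internally.
type conn struct {
	servers    []string
	current    int
	down       map[string]bool // simulated failed servers (hypothetical)
	reconnects int             // how many times the connection moved to another server
}

// connect is called exactly once by the application.
func connect(servers []string) *conn {
	return &conn{servers: servers, down: map[string]bool{}}
}

// Server returns the address the connection is currently using.
func (c *conn) Server() string { return c.servers[c.current] }

// do simulates a request: if the current server is down, it advances to the
// next server in the list by itself, without the caller reconnecting.
func (c *conn) do() error {
	for i := 0; i < len(c.servers); i++ {
		if !c.down[c.Server()] {
			return nil
		}
		c.current = (c.current + 1) % len(c.servers)
		c.reconnects++
	}
	return errors.New("no ZK server reachable")
}

func main() {
	c := connect([]string{"zk1:2181", "zk2:2181", "zk3:2181"})
	c.down["zk1:2181"] = true // the leader goes away
	if err := c.do(); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(c.Server())
}
```

Running it prints `zk2:2181`: the same `conn` value keeps working after the first server disappears, which is why calling Connect() again only adds extra goroutines and sockets.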

With the current implementation, I confirmed that when I shut down the leader node of my ZK cluster, zk_coordinator sometimes creates many goroutines (more than 100) and sockets, and then panics.

This patch makes zk_coordinator simply ignore connectionEvents. Only when the ZK session expires is Reinitialize sent to Consumers, so that they can recreate their ephemeral nodes. Because watches are recreated in trySubscribeForChanges(), I removed the call to SubscribeForChanges() from listenConnectionEvents().
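The event handling described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patch itself: the `State` constants are named after go-zookeeper's session states but are defined locally here, and the consumer channels standing in for "send Reinitialize to Consumers" are hypothetical.

```go
package main

import "fmt"

// State is a hypothetical, locally defined session state, named after the
// states a ZooKeeper client reports.
type State int

const (
	StateDisconnected State = iota
	StateConnecting
	StateHasSession
	StateExpired
)

// Event is a hypothetical connection event.
type Event struct{ State State }

// listenConnectionEvents drains events until the channel is closed.
// Disconnects and reconnects are ignored because the client fails over on
// its own; only session expiry signals every consumer to reinitialize
// (and thereby recreate its ephemeral nodes).
func listenConnectionEvents(events <-chan Event, consumers []chan<- struct{}) {
	for e := range events {
		if e.State != StateExpired {
			continue // transient connection change: nothing to do
		}
		for _, c := range consumers {
			c <- struct{}{} // hypothetical Reinitialize signal
		}
	}
}

func main() {
	events := make(chan Event, 4)
	events <- Event{State: StateDisconnected}
	events <- Event{State: StateConnecting}
	events <- Event{State: StateHasSession}
	events <- Event{State: StateExpired}
	close(events)

	a := make(chan struct{}, 4)
	b := make(chan struct{}, 4)
	listenConnectionEvents(events, []chan<- struct{}{a, b})
	fmt.Println(len(a), len(b))
}
```

Running it prints `1 1`: the three connection-state changes are ignored, and each consumer receives a single reinitialize signal for the one expiry. The sketch deliberately does not resubscribe watches here, matching the description that watches are recreated elsewhere.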

serejja commented 8 years ago

Thanks!