go-zookeeper / zk

Native ZooKeeper client for Go
BSD 3-Clause "New" or "Revised" License

The continuous logging of "authentication failed: EOF" #117

Open P-h-y opened 8 months ago

P-h-y commented 8 months ago

I ran into the following problem: when a TCP connection is disrupted and io.ReadFull() in authenticate() returns EOF, loop() never manages to re-establish a working connection and keeps logging "authentication failed: EOF".

The cause is a cycle. After the EOF, connect() inside loop() rebuilds the TCP connection. Once the reconnect succeeds, authenticate() sends its request using the retained lastZxid, which prompts the ZooKeeper server to close the new TCP connection. The next read then hits EOF again and the cycle repeats.
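To make the cycle concrete, here is a minimal, self-contained simulation. It is not the library's code: fakeServer, handshake, reconnect and the resetAfter parameter are all hypothetical names, and the server behavior (dropping a handshake whose lastZxid is newer than anything it knows) is an assumption based on the description above. The sketch also shows one possible way out, clearing lastZxid after a few consecutive EOFs, purely as an illustration.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// fakeServer is a hypothetical stand-in for a ZooKeeper server that only
// knows transactions up to maxZxid (for example after a redeployment).
type fakeServer struct{ maxZxid int64 }

// handshake models the assumed behavior: if the client reports a lastZxid
// newer than the server's, the server closes the connection and the
// client's read returns io.EOF.
func (s fakeServer) handshake(lastZxid int64) error {
	if lastZxid > s.maxZxid {
		return io.EOF
	}
	return nil
}

// reconnect retries the handshake up to maxAttempts times and returns the
// number of attempts made. If resetAfter > 0, lastZxid is cleared after
// that many consecutive EOFs (a hypothetical mitigation, not library behavior).
func reconnect(s fakeServer, lastZxid int64, maxAttempts, resetAfter int) (int, error) {
	var err error
	eofs := 0
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err = s.handshake(lastZxid)
		if err == nil {
			return attempt, nil
		}
		if errors.Is(err, io.EOF) {
			eofs++
			if resetAfter > 0 && eofs >= resetAfter {
				lastZxid = 0 // forget the stale zxid
				eofs = 0
			}
		}
	}
	return maxAttempts, err
}

func main() {
	srv := fakeServer{maxZxid: 10}

	// Stale lastZxid is resent on every attempt: every handshake ends in EOF.
	n, err := reconnect(srv, 500, 5, 0)
	fmt.Println(n, err)

	// Same situation, but lastZxid is cleared after 3 consecutive EOFs.
	n, err = reconnect(srv, 500, 5, 3)
	fmt.Println(n, err)
}
```

With resetAfter set to 0 the loop never succeeds, no matter how many attempts are allowed, which matches the behavior reported here.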

I suspect the ZooKeeper server closes the TCP connection because it has lost or expired that Zxid internally. This commonly happens when the ZooKeeper service container is redeployed (docker-compose down followed by docker-compose up -d) or when the service is under heavy load.

Note that if the client and server run on the same operating system and the system supports and enables TCP Keep-Alive, that may help avoid the endless EOF loop. If TCP Keep-Alive also fails, however, the issue persists.
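For reference, this is roughly how TCP Keep-Alive can be requested explicitly on a Go connection using only the standard library. keepAliveDial and dialLoopback are hypothetical helper names, the 15-second probe interval is an arbitrary illustrative value, and whether the client can be pointed at a custom dial function like this is an assumption, not something confirmed in this issue.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// keepAliveDial opens a connection with TCP keep-alive probes requested.
// The 15s interval is illustrative; the OS must support keep-alive for it
// to take effect.
func keepAliveDial(network, address string, timeout time.Duration) (net.Conn, error) {
	d := net.Dialer{
		Timeout:   timeout,
		KeepAlive: 15 * time.Second,
	}
	return d.Dial(network, address)
}

// dialLoopback exercises keepAliveDial against a throwaway local listener
// and reports whether the resulting connection is a TCP connection.
func dialLoopback() (bool, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, err
	}
	defer ln.Close()

	conn, err := keepAliveDial("tcp", ln.Addr().String(), time.Second)
	if err != nil {
		return false, err
	}
	defer conn.Close()

	_, isTCP := conn.(*net.TCPConn)
	return isTCP, nil
}

func main() {
	isTCP, err := dialLoopback()
	fmt.Println(isTCP, err)
}
```

Keep-alive only helps detect a dead peer sooner. It does not change what the client sends during authenticate(), so it would not by itself stop the server from closing a connection over a stale lastZxid.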