allanbank / mongodb-async-driver

The MongoDB Asynchronous Java Driver.
Apache License 2.0

Driver sometimes fails to reconnect #15

Closed notz closed 9 years ago

notz commented 9 years ago

After a restart of the primary server / mongos of our sharded setup, about 10% of the client apps fail to reconnect.

The exception raised:

com.allanbank.mongodb.error.MongoDbAuthenticationException: com.allanbank.mongodb.error.ReplyException: exception: DBClientBase::findN: transport error: tokumx:27118 ns: local.$cmd query: { getnonce: 1 }
    at com.allanbank.mongodb.client.connection.auth.AuthenticatingConnection.ensureAuthenticated(AuthenticatingConnection.java:203)
    at com.allanbank.mongodb.client.connection.auth.AuthenticatingConnection.send(AuthenticatingConnection.java:133)
    at com.allanbank.mongodb.client.connection.proxy.AbstractProxyMultipleConnection.doSend(AbstractProxyMultipleConnection.java:450)
    at com.allanbank.mongodb.client.connection.proxy.AbstractProxyMultipleConnection.trySend(AbstractProxyMultipleConnection.java:625)
    at com.allanbank.mongodb.client.connection.proxy.AbstractProxyMultipleConnection.send(AbstractProxyMultipleConnection.java:287)

The driver is never able to reconnect; the application has to be restarted.

Driver version is 2.0.1

allanbank commented 9 years ago

@notz - Is the server 'tokumx:27118' the mongos or one of the mongod / configuration servers?

The getnonce command is the first one sent when authenticating to the server, and the DBClientBase::findN: transport error comes from the MongoDB server, not the client, which makes me think the problem is on that side of the connection.

I am going to check if the authentication failure will close the connection. Closing the connection might help work around the issue if it is transient.

Rob.

allanbank commented 9 years ago

@notz - Please try this patched version of the driver:

http://www.allanbank.com/mongodb-async-driver-2.0.2-SNAPSHOT.jar

It has a one-line change to close the connection when authentication fails.

The driver had been leaving the connection open because, in theory, a user could have successfully authenticated via a different set of credentials. The real fix will be to have the authenticators report whether the authentication failed because of wrong credentials (which a retry is unlikely to fix) or because of some other error that implies we should retry the authentication.
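
The distinction described above could be modeled roughly as follows. This is only a minimal sketch and not the driver's actual code: the class `AuthFailureSketch`, the `FailureKind` enum, the `AuthFailure` exception, and `shouldRetry` are all hypothetical names made up for illustration.

```java
public class AuthFailureSketch {

    /** Hypothetical classification of why an authentication attempt failed. */
    public enum FailureKind {
        /** The server rejected the credentials; retrying is unlikely to help. */
        BAD_CREDENTIALS,
        /** Some other error (e.g. a transport error); a retry may succeed. */
        TRANSIENT
    }

    /** Hypothetical exception an authenticator could throw to report the kind of failure. */
    public static class AuthFailure extends Exception {
        private final FailureKind kind;

        public AuthFailure(FailureKind kind, String message) {
            super(message);
            this.kind = kind;
        }

        public FailureKind getKind() {
            return kind;
        }
    }

    /** Retry only when the failure was not caused by wrong credentials. */
    public static boolean shouldRetry(AuthFailure failure) {
        return failure.getKind() == FailureKind.TRANSIENT;
    }
}
```

With a report like this, the connection could stay open and retry after a transport error such as the one in the stack trace, while a credentials failure would be surfaced to the caller instead.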

notz commented 9 years ago

@allanbank 'tokumx:27118' is the mongod server.

I will try it, but don't expect results too soon, because I need to try it on a live system.

allanbank commented 9 years ago

@notz - I just updated the snapshot jar with a new version that does not close the connection on an authentication failure and instead periodically retries failed authentication attempts.

Thanks, Rob.
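
One way to picture the "periodically retry failed authentication" behavior is a small piece of state that gates when the next attempt is allowed. This is a sketch under stated assumptions, not the code in the snapshot jar: the class `AuthRetrySketch`, its method names, and the fixed retry interval are all hypothetical, and the current time is passed in by the caller to keep the example deterministic.

```java
public class AuthRetrySketch {

    /** Hypothetical fixed delay between authentication attempts, in milliseconds. */
    private final long retryIntervalMillis;

    /** True once an authentication attempt has succeeded. */
    private boolean authenticated = false;

    /** Earliest time (in milliseconds) at which another attempt is allowed. */
    private long nextAttemptAtMillis = 0L;

    public AuthRetrySketch(long retryIntervalMillis) {
        this.retryIntervalMillis = retryIntervalMillis;
    }

    /** Returns true if the caller should try to authenticate now. */
    public synchronized boolean shouldAttempt(long nowMillis) {
        return !authenticated && nowMillis >= nextAttemptAtMillis;
    }

    /** Records a failed attempt and schedules the next allowed attempt. */
    public synchronized void recordFailure(long nowMillis) {
        authenticated = false;
        nextAttemptAtMillis = nowMillis + retryIntervalMillis;
    }

    /** Records a successful attempt; no further attempts are needed. */
    public synchronized void recordSuccess() {
        authenticated = true;
    }

    public synchronized boolean isAuthenticated() {
        return authenticated;
    }
}
```

A connection wrapper could call `shouldAttempt(System.currentTimeMillis())` before each send, so the connection stays open after a failure and authentication is retried once the interval has passed.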