brianfrankcooper / YCSB

Yahoo! Cloud Serving Benchmark
Apache License 2.0

Errors accessing MongoDB sharded cluster #977

Open tdeneau opened 7 years ago

tdeneau commented 7 years ago

I am trying to run YCSB against a MongoDB database sharded across 4 servers (not replicated). At the beginning of the run I see errors such as the following:

22:39:10.986 [Thread-3] INFO c.a.m.client.state.ClusterPinger - Could not ping 'sso-sp08-ff:27017': Connection refused (Connection refused)

How do I get rid of these "could not ping" errors?

Also, when the load gets heavier, I see errors such as:

com.allanbank.mongodb.error.QueryFailedException: Operation aborted.

What does that indicate?

Note: I have never had such problems accessing a non-sharded, single MongoDB instance from YCSB.

-- Tom Deneau

allanbank commented 7 years ago

At the beginning of the run I see errors such as the following: 22:39:10.986 [Thread-3] INFO c.a.m.client.state.ClusterPinger - Could not ping 'sso-sp08-ff:27017': Connection refused (Connection refused)

Is sso-sp08-ff:27017 a valid mongos instance? Was there ever a mongos on that host port?

By default the driver will try to discover all of the mongos nodes that are available. For a sharded cluster it does that by looking in the config database's 'mongos' collection. If a listed mongos is no longer running, then the log message is expected and is benign. To get rid of the message you can remove the document from the collection, but make sure you do it via a mongos and not directly on one of the configuration servers. Also make sure you only delete documents for mongos instances that are not running.
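Before deleting anything, it may help to check which listed addresses actually refuse connections. The following is a minimal sketch, not part of YCSB or the driver. It assumes each config.mongos document stores its address in `_id` as "host:port", and the helper names (`split_host_port`, `is_listening`, `unreachable_mongos_ids`) are made up for illustration. It only tests TCP reachability from the machine where it runs, so a refused connection does not by itself prove that the mongos process is down.

```python
import socket


def split_host_port(address, default_port=27017):
    """Split 'host:port' into (host, port); use default_port if no port is given."""
    host, sep, port = address.rpartition(":")
    if not sep:
        return address, default_port
    return host, int(port)


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened from this machine."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", unresolvable host names, and timeouts.
        return False


def unreachable_mongos_ids(mongos_docs, probe=is_listening):
    """Return the _id of each document whose address fails the probe.

    mongos_docs is assumed to be a list of dicts read from config.mongos,
    each with an '_id' of the form 'host:port' (an assumption of this sketch).
    """
    unreachable = []
    for doc in mongos_docs:
        host, port = split_host_port(doc["_id"])
        if not probe(host, port):
            unreachable.append(doc["_id"])
    return unreachable
```

The returned list is only a set of candidates. Each entry should still be confirmed as a mongos that is really gone before its document is removed through a running mongos.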

Also, when the load gets heavier, I see errors such as: com.allanbank.mongodb.error.QueryFailedException: Operation aborted.

That error is coming back from the mongos process. Is there anything interesting in the mongos or mongod logs?

tdeneau commented 7 years ago

Regarding the ClusterPinger errors (non-fatal, I agree): yes, I see the entries in the config.mongos collection, one for each mongos instance I have started. But what should respond to the ping? Is it mongos itself? If so, why would it be refusing the connection?

Everything in my test cluster is running on one system whose hostname is sso-sp08-ff. All the mongos config files use 127.0.0.1 as the connection address, but the entries in config.mongos all have the hostname, rather than 127.0.0.1. Is that a problem?

Regarding the QueryFailedException, I now see this specific error reported: com.allanbank.mongodb.error.QueryFailedException: End of file

and I see things like this logged in the mongos logs:

2017-06-07T17:18:32.130-0500 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-40-0] Connecting to 127.0.0.1:27018
2017-06-07T17:18:32.133-0500 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-40-0] Failed to connect to 127.0.0.1:27018 - HostUnreachable: End of file
2017-06-07T17:18:32.133-0500 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-40-0] Failed to close stream: Transport endpoint is not connected

I am not sure what file is being referred to here. Is there a way to configure the size of the TaskExecutorPools, etc., in the mongos?

-- Tom

allanbank commented 7 years ago

Everything in my test cluster is running on one system whose hostname is sso-sp08-ff. All the mongos config files use 127.0.0.1 as the connection address, but the entries in config.mongos all have the hostname, rather than 127.0.0.1. Is that a problem?

Not a problem, but it will cause the behaviour you are seeing. The driver "discovered" the host name and is trying to connect to that address in addition to the 127.0.0.1 address. It will still use the 127.0.0.1 address provided.

not sure what file is being referred to here. Is there a way to configure the size of the TaskExecutorPools, etc? in the mongos?

Anything in the logs of the mongod running on port 27018? This looks like the connection to the mongod is getting closed before the mongos can even start using it.

There is a MongoDB ticket (SERVER-21752) that looks like it may be related. What version of MongoDB are you using? SERVER-21752 is fixed in 3.2.1 and 3.3.0.

tdeneau commented 7 years ago

I am running MongoDB 3.2.12, so SERVER-21752 should not be a problem.

When I look at the mongos logs I don't see anything that really helps me. Here is a typical entry at log level 2...

2017-06-08T15:01:06.514-0500 D ASIO [NetworkInterfaceASIO-TaskExecutorPool-61-0] Failed to execute command: RemoteCommand 1609494 -- target:127.0.0.1:27018 db:admin cmd:{ isMaster: 1 } reason: HostUnreachable: End of file

allanbank commented 7 years ago

Is the connection getting reported in the mongod server's log?

The isMaster command is the first message any app sends when connecting. The fact that it is not getting a response makes me think that either something other than a mongod is listening on port 27018, or the mongod is expecting an SSL connection and the mongos is not using SSL.
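One way to narrow this down is to see what the port does with a plain TCP connection. The sketch below is an illustration only, with a hypothetical helper name (`probe_connection`). It does not speak the MongoDB wire protocol or send isMaster. It simply connects, sends nothing, and reports whether the connection was refused, closed by the peer right away (which a client would see as "End of file"), or held open waiting for the client to speak.

```python
import socket


def probe_connection(host, port, wait=1.0):
    """Classify a TCP endpoint as 'refused', 'closed_by_peer', or 'open'.

    'closed_by_peer' means the connection was accepted and then closed
    before any data was exchanged; 'open' means the peer kept the
    connection open (or sent data) for the duration of the wait.
    """
    try:
        sock = socket.create_connection((host, port), timeout=wait)
    except ConnectionRefusedError:
        return "refused"
    with sock:
        sock.settimeout(wait)
        try:
            data = sock.recv(1)
        except socket.timeout:
            # Nothing arrived and the peer did not hang up: still open.
            return "open"
        except ConnectionResetError:
            return "closed_by_peer"
        # An empty read is how a socket reports end of file.
        return "closed_by_peer" if data == b"" else "open"
```

For example, `probe_connection("127.0.0.1", 27018)` could be run while the load is light and again while it is heavy. A change from "open" to "closed_by_peer" would suggest the listener is dropping new connections under load rather than a protocol or SSL mismatch, though this probe cannot confirm the cause.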

tdeneau commented 7 years ago

The fact that everything works fine under a lighter load makes me think the problem is not something like SSL being expected but not supplied.