Tokutek / mongo

TokuMX is a high-performance, concurrent, compressing, drop-in replacement engine for MongoDB | Issue tracker: https://tokutek.atlassian.net/browse/MX/ |
http://www.tokutek.com/products/tokumx-for-mongodb/

Keep cluster available if shard is unavailable #1120

Open leifwalsh opened 10 years ago

leifwalsh commented 10 years ago

Some discussion here: https://groups.google.com/forum/#!topic/tokumx-user/_D6LS8tZzpk

A mongos will throw errors back at a client for pretty much any operation if any shard's primary is not reachable. For example:

mongos> sh.status()
Sun Apr 27 15:06:08.798 error: { "$err" : "ReplicaSetMonitor no master found for set: rs0", "code" : 10009 } at src/mongo/shell/query.js:128
mongos> show dbs
Sun Apr 27 15:06:19.419 listDatabases failed:{
"code" : 10009,
"ok" : 0,
"errmsg" : "exception: ReplicaSetMonitor no master found for set: rs0"
} at src/mongo/shell/mongo.js:46

This can prevent users from diagnosing other problems and from attempting maintenance operations.

The current plan is to do a few separate things, though the plan may change as we investigate:

  1. Make the mongoses capable of still doing administrative things when a shard doesn't have an accessible primary.
  2. Make other shards still available even if one shard is down, possibly allow commands to return partial results.
  3. Make the shard in question still accessible to slaveOk queries.
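
As an illustration of item 2 only, here is a minimal sketch of a scatter-gather that tolerates an unreachable shard. It is not mongos code: `gatherPartial`, `listDatabasesOn`, the `fakeShards` data, and the `partial` and `unavailable` reply fields are all hypothetical names invented for this example.

```javascript
// Hypothetical sketch: run a command on every shard, keep the results from
// reachable shards, and report the unreachable ones instead of failing outright.
function gatherPartial(shards, runOnShard) {
  const results = [];
  const unavailable = [];
  for (const shard of shards) {
    try {
      results.push(...runOnShard(shard));
    } catch (err) {
      // Record the failure and keep going with the remaining shards.
      unavailable.push({ shard: shard, errmsg: err.message });
    }
  }
  // "partial" and "unavailable" are made-up fields, not an existing reply format.
  return { ok: 1, partial: unavailable.length > 0, results: results, unavailable: unavailable };
}

// Stand-in for real shards: rs0 has no primary, rs1 answers normally.
const fakeShards = { rs0: null, rs1: ["db1", "db2"] };

function listDatabasesOn(shard) {
  if (fakeShards[shard] === null) {
    throw new Error("ReplicaSetMonitor no master found for set: " + shard);
  }
  return fakeShards[shard];
}

const reply = gatherPartial(["rs0", "rs1"], listDatabasesOn);
```

With this shape, the caller still sees the databases on rs1 and can tell from `reply.unavailable` that rs0 was skipped.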

ankurcha commented 10 years ago

Wanted to share some more info on this:

I have found known issues that are likely related to what you are seeing.

First, confirm that you are using a read preference that allows secondary reads, such as primaryPreferred, secondaryPreferred, or nearest. The default read preference is primary, which by definition only allows reads from the primary member. This is by design, as some systems require reading consistent data.
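
To make the difference between the modes concrete, here is a deliberately simplified model of member selection. It is a sketch, not the actual driver or mongos algorithm: it ignores tag sets and latency, and `selectMember`, the `rs0` array, and the host names are hypothetical.

```javascript
// Simplified model of read preference modes. Real drivers also consider
// tag sets and ping latency; this sketch only looks at member state.
function selectMember(members, mode) {
  const primary = members.find((m) => m.state === "PRIMARY") || null;
  const secondary = members.find((m) => m.state === "SECONDARY") || null;
  switch (mode) {
    case "primary":
      return primary; // null when the set has no primary
    case "primaryPreferred":
      return primary || secondary;
    case "secondary":
      return secondary;
    case "secondaryPreferred":
      return secondary || primary;
    case "nearest":
      return primary || secondary; // real drivers pick by latency, not by role
    default:
      throw new Error("unknown read preference mode: " + mode);
  }
}

// Hypothetical replica set that has lost its primary.
const rs0 = [
  { host: "a.example:27017", state: "SECONDARY" },
  { host: "b.example:27017", state: "SECONDARY" },
];

const withDefault = selectMember(rs0, "primary");
const withSecondaryPreferred = selectMember(rs0, "secondaryPreferred");
```

In this model, the default mode finds no eligible member for a set without a primary, while secondaryPreferred still returns a secondary.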

Regarding the find() and sh.status() failures, there is a known issue, reported as SERVER-7246, where secondary reads fail on new connections when a shard has no primary. This affects MongoDB 2.4.8 and earlier; in 2.4.9+, mongos requires the following command-line flag to fix it:

--setParameter ignoreInitialVersionFailure=true

If you are running with authentication, the following command-line flag is required as well:

--setParameter authOnPrimaryOnly=false

No command-line flags are needed for MongoDB 2.6.x. The show dbs failure is still an open issue, tracked by SERVER-13768.

If you would like to test using the mongo shell, you can set read preference as follows:

mongos> db.getMongo().setReadPref('secondaryPreferred')

pdonon commented 10 years ago

I'm using MongoDB 2.6.3. When my three replica sets have no primary, I still get this issue: { "$err" : "ReplicaSetMonitor no master found for set: ShardA", "code" : 10009 }