Open nono opened 11 months ago
These conflicted shards are typically benign. They result from concurrent updates to _dbs, for example when the same database gets created concurrently and there is some network or temporary partition delay. After the _dbs db gets replicated (it replicates in a ring 0 -> 1 ... -> N-1 -> N -> 0), the cluster will use the winning revision of the database.
To mitigate or stop the issue you can find the conflicted dbs and delete the conflicted revision. So if 2-abc... and 2-def are conflicted and 2-def is the actively used shard map, you can delete 2-abc by getting its rev and then issuing a delete with rev = 2-abc
(sorry closed by accident)
Thanks for your response @nickva
Finding the conflicted DB seems like a hard task in a million-db cluster with a medium db creation/deletion rate.
Maybe you have some hints on how to find them?
In a remote shell on any node in the cluster, run custodian:report(). You'll get a list of database names with their conflicted shard counts.
We have used https://docs.couchdb.org/en/stable/replication/conflicts.html#finding-conflicted-documents-with-mango to find the conflicted databases. One interesting thing is that we have conflicts for _replicator and _users.
To mitigate or stop the issue you can find the conflicted dbs and delete the conflicted revision. So if 2-abc... and 2-def are conflicted and 2-def is the actively used shard map, you can delete 2-abc by getting its rev and then issuing a delete with rev = 2-abc
How can we do that?
_dbs is not a normal database and most requests fail with {"error":"not_found","reason":"Database does not exist."}. For example, curl -s "$COUCH_URL/_dbs/_replicator?meta=true&open_revs=all" fails this way. The same goes for _bulk_docs.
The "4 conflicted shards in cluster" is referring to conflicts within the meta _dbs documents that define where the shards of databases should be, it is not reporting on the conflicted documents within your regular databases.
The custodian:report(). output will tell you which databases are affected, and you can then use the /_node/_local/_dbs/dbname endpoint to examine the conflicts and decide which branches to delete and which to keep.
These conflicts are most likely caused by concurrent requests to create the same database, which is quite unusual.
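The inspection step above could be sketched roughly as follows. This assumes COUCH_URL holds an admin URL for one node, and that the node-local endpoint accepts the standard document query parameter conflicts=true, which adds a _conflicts array listing the losing revisions. The function names are hypothetical helpers, not part of CouchDB.

```shell
#!/bin/sh
# Hypothetical helpers; COUCH_URL (admin URL of one node) is assumed to be set.

# Build the node-local URL of the shard map document for database $1.
# Database names containing "/" must be URL-encoded by the caller (%2F).
dbs_doc_url() {
  printf '%s/_node/_local/_dbs/%s' "${COUCH_URL%/}" "$1"
}

# Fetch the winning revision of the shard map document, asking CouchDB
# to include a _conflicts array with the losing revisions (if any).
show_shard_map_conflicts() {
  curl -s "$(dbs_doc_url "$1")?conflicts=true"
}
```

For a database reported by custodian, `show_shard_map_conflicts mydb` would print the winning shard map; each rev under _conflicts can then be fetched with the rev query parameter to compare branches.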
The "4 conflicted shards in cluster" is referring to conflicts within the meta _dbs documents that define where the shards of databases should be, it is not reporting on the conflicted documents within your regular databases.
Yes, but curl -v -X POST $COUCH_URL/_dbs/_find -d '{"selector": {"_conflicts": { "$exists": true}}, "conflicts": true}' -H "Content-Type: application/json" has found the 4 conflicted meta documents.
you can then use the /_node/_local/_dbs/dbname endpoint to examine the conflicts and decide which branches to delete and which to keep.
Thanks, by using /_node/_local/_dbs/:dbname instead of just /_dbs/:dbname, it works! And the same goes for _bulk_docs.
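For anyone landing here later, the cleanup discussed in this thread might look like the sketch below. It assumes COUCH_URL is an admin URL for one node and that the node-local endpoint accepts an ordinary document DELETE with a rev parameter, as the thread suggests. The function names are hypothetical, and 2-abc is just the placeholder revision used earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helpers; COUCH_URL (admin URL of one node) is assumed to be set.

# Build the node-local URL addressing revision $2 of the shard map
# document for database $1.
dbs_rev_url() {
  printf '%s/_node/_local/_dbs/%s?rev=%s' "${COUCH_URL%/}" "$1" "$2"
}

# Delete one losing revision of the shard map document.
# Make sure $2 is a losing revision (listed under _conflicts),
# not the winning revision that is actively used.
delete_shard_map_rev() {
  curl -s -X DELETE "$(dbs_rev_url "$1" "$2")"
}
```

Usage would be along the lines of `delete_shard_map_rev mydb 2-abc`, repeated for each losing revision, after which custodian:report(). should no longer list that database.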
Description
We have in the CouchDB logs some messages saying:
We are using many small databases created with a single shard (q=1). We don't know which shards are in this state, nor what we can do about that.
Steps to Reproduce
We don't know how to reproduce.
Expected Behaviour
Well, if CouchDB could avoid creating conflicted shards, it would be nice. At the least, some documentation on what to do in that case is expected.