apache / couchdb

Seamless multi-master syncing database with an intuitive HTTP/JSON API, designed for reliability
https://couchdb.apache.org/
Apache License 2.0
6.26k stars 1.03k forks source link

Implement quorum limit for search endpoints #4026

Open garethbowen opened 2 years ago

garethbowen commented 2 years ago

Summary

Many endpoints respond only when a quorum of replicas have responded with what the actual response should be, however not all endpoints behave this way. As @rnewson said in Slack:

the _view, _find and _search endpoints read one copy of each shard range to form the response, there is no option to consult multiple copies, the code simply hasn't been written.

This can cause bugs in applications that make successive queries where these may be ultimately handled by different nodes.

Desired Behaviour

All endpoints to behave consistently using the default quorum which can be overridden (eg: with r and w query parameters). This makes the behaviour of endpoints consistent and therefore easier to develop against, but also means subsequent queries can be guaranteed to work given appropriate values for the quorum.

Possible Solution

Have these endpoints read all copies of the shard and respond when a quorum is achieved.

Additional context

We're upgrading from supporting single node to clustered operation and certain tests in our integration suite are failing too often.

Some workarounds we have considered include, using w=<replica count> everywhere, setting replicas to 1, and now investigating using sticky sessions so subsequent requests from the same client are handled by the same node. These have obvious drawbacks and none have been found (yet) to be 100% reliable.

rnewson commented 2 years ago

This is non-trivial. The N copies of the database are not in the same order, and the indexes are in the order of the corresponding data shard. It is not just a case of reading multiple indexes to form a quorum.

The only plausible fix is to ensure that all the N replicas of a database are in the same order, which involves adding strong consensus.

rnewson commented 2 years ago

as for "Many endpoints respond only when a quorum of replicas have responded" I think it's actually a minority. Specifically only those that read or write documents. Even that quorum is not guaranteed, couchdb will lower the quorum threshold all the way down to one if it believes the other nodes are unavailable (e.g, during a network partition that isolated a single node, that node would continue to serve read and write requests for the documents it hosts).