apache / couchdb

Seamless multi-master syncing database with an intuitive HTTP/JSON API, designed for reliability
https://couchdb.apache.org/
Apache License 2.0

_bulk_get with latest=true option is slow #3063

Closed craftzdog closed 4 years ago

craftzdog commented 4 years ago

Hello. I'm running CouchDB 3.1.0 on Ubuntu and building an app that replicates user data with PouchDB. The replication protocol calls the _bulk_get API with latest=true. On my server, a request for 100 doc IDs takes about 3x longer with this option than without it. I understand that PouchDB needs the option during replication to make sure it always gets the latest revision of each doc. If I drop the option, PouchDB seems to crash at random, because CouchDB returns an error response when a doc or its revision is not found. Would it be possible to improve the performance of the latest option?
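For anyone who wants to reproduce the comparison, here is a minimal sketch of how the two requests could be built and timed using only the Python standard library. The base URL, database name, and doc IDs are placeholders, authentication is omitted, and the request body lists doc IDs without revisions; none of this comes from my actual setup, so adjust it to your own server.

```python
import json
import time
import urllib.parse
import urllib.request


def build_bulk_get_request(base_url, db, doc_ids, latest):
    """Build a POST request for /{db}/_bulk_get, optionally adding latest=true."""
    params = {"revs": "true"}
    if latest:
        params["latest"] = "true"
    url = "{}/{}/_bulk_get?{}".format(
        base_url.rstrip("/"),
        urllib.parse.quote(db, safe=""),
        urllib.parse.urlencode(params),
    )
    # Assumption: request docs by ID only; a replicator would also send "rev".
    body = json.dumps({"docs": [{"id": doc_id} for doc_id in doc_ids]})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def time_bulk_get(base_url, db, doc_ids, latest):
    """Send the request and return the elapsed wall-clock time in seconds."""
    request = build_bulk_get_request(base_url, db, doc_ids, latest)
    start = time.perf_counter()
    with urllib.request.urlopen(request) as response:
        response.read()  # drain the body so the full response is timed
    return time.perf_counter() - start
```

Calling time_bulk_get twice with the same 100 doc IDs, once with latest=False and once with latest=True, gives a rough comparison. A single pair of calls is noisy, so it is worth repeating the measurement a few times.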

Summary

Improve the performance of _bulk_get when latest=true is specified

Possible Solution

I found that the revision tree is checked when latest is specified, here: https://github.com/apache/couchdb/blob/3fc054d86f0844bdf851e402b05df5db08b1c230/src/fabric/src/fabric_doc_open_revs.erl#L94 However, I don't know whether there is room to improve it.

Thanks in advance!

craftzdog commented 4 years ago

The bottleneck turned out to be my network on EC2. My cluster nodes were deployed across different regions to make them disaster tolerant, which created a network bottleneck between nodes: the RTT was around 166 ms. I moved the nodes into a single region, spread across different availability zones, and now it works very fast!
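As a rough back-of-the-envelope illustration of why the RTT mattered, the sketch below multiplies the measured 166 ms by a number of sequential inter-node round trips. The round-trip counts are purely hypothetical; I did not measure how many extra round trips a request with latest=true actually makes inside the cluster.

```python
def added_latency_ms(rtt_ms, sequential_round_trips):
    """Estimate the extra latency from round trips that cannot overlap."""
    return rtt_ms * sequential_round_trips


# 166 ms is the RTT I measured between regions; the counts are assumptions.
for round_trips in (1, 2, 3):
    print(round_trips, added_latency_ms(166, round_trips))
```

Even a few extra sequential round trips at that RTT add hundreds of milliseconds per request, whereas nodes in the same region pay far less for each one.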

Sorry for bothering you.