apache / couchdb

Seamless multi-master syncing database with an intuitive HTTP/JSON API, designed for reliability
https://couchdb.apache.org/
Apache License 2.0
6.17k stars 1.03k forks source link

$in and $or too slow for large database (doesn´t use index) #5189

Open zmara opened 3 weeks ago

zmara commented 3 weeks ago

My database has 4.7GB of data (3191269 docs) Attribute _id has implicit index.

{
 "type": "special",
 "def": {
  "fields": [
   {
    "_id": "asc"
   }
  ]
 }
}

This selector works well:

{
   "selector": {
      "_id": {
         "$eq": "CustomerElastic_00001280-7a18-466b-9e9c-da5a2d903bda"
      }
   }
}

This selector is pending and never finished (the same when I use $or):

{
   "selector": {
      "_id": {
         "$in": ["CustomerElastic_00000a29-329f-477f-8d4a-8244a9c738aa", "CustomerElastic_00001280-7a18-466b-9e9c-da5a2d903bda"]
      }
   }
}

Why?

Steps to Reproduce

Use fauxton and mango query.

Expected Behaviour

It should work the same as $eq

Your Environment

{"couchdb":"Welcome","version":"3.3.2","git_sha":"11a234070","uuid":"ff2fae1b3089d86fd741e84a6e532eda","features":["access-ready","partitioned","pluggable-storage-engines","reshard","scheduler"],"vendor":{"name":"The Apache Software Foundation"}}

nickva commented 3 weeks ago

Thanks for your report @zmara. Yeah it's a known deficiency that$or doesn't use an index. It would have to effectively use use the two or indexes and merge them appropriately, and that part isn't implemented currently.

zmara commented 3 weeks ago

Thanks for explanation. What about that $in operator? I am using it on one field, so what is the issue there?

rnewson commented 3 weeks ago

I recommend building a map-reduce view directly (queried via _view not _find). This is an actual index (on the emitted key), and any query you can do to _view will be done efficiently (startkey/endkey or key or keys) .

All that _find does is look at all the indexes and tries to find the most efficient one for the selector. If it doesn't find one it just reads the whole database and filters out non-matches.

In your case you could do the much more efficient /dbname/_all_docs?keys=["CustomerElastic_00000a29-329f-477f-8d4a-8244a9c738aa", "CustomerElastic_00001280-7a18-466b-9e9c-da5a2d903bda"] instead.