apache / couchdb

Seamless multi-master syncing database with an intuitive HTTP/JSON API, designed for reliability
https://couchdb.apache.org/
Apache License 2.0
6.26k stars 1.03k forks source link

$in and $or too slow for large database (doesn´t use index) #5189

Open zmara opened 2 months ago

zmara commented 2 months ago

My database has 4.7GB of data (3191269 docs) Attribute _id has implicit index.

{
 "type": "special",
 "def": {
  "fields": [
   {
    "_id": "asc"
   }
  ]
 }
}

This selector works well:

{
   "selector": {
      "_id": {
         "$eq": "CustomerElastic_00001280-7a18-466b-9e9c-da5a2d903bda"
      }
   }
}

This selector is pending and never finished (the same when I use $or):

{
   "selector": {
      "_id": {
         "$in": ["CustomerElastic_00000a29-329f-477f-8d4a-8244a9c738aa", "CustomerElastic_00001280-7a18-466b-9e9c-da5a2d903bda"]
      }
   }
}

Why?

Steps to Reproduce

Use fauxton and mango query.

Expected Behaviour

It should work the same as $eq

Your Environment

{"couchdb":"Welcome","version":"3.3.2","git_sha":"11a234070","uuid":"ff2fae1b3089d86fd741e84a6e532eda","features":["access-ready","partitioned","pluggable-storage-engines","reshard","scheduler"],"vendor":{"name":"The Apache Software Foundation"}}

nickva commented 2 months ago

Thanks for your report @zmara. Yeah it's a known deficiency that$or doesn't use an index. It would have to effectively use use the two or indexes and merge them appropriately, and that part isn't implemented currently.

zmara commented 2 months ago

Thanks for explanation. What about that $in operator? I am using it on one field, so what is the issue there?

rnewson commented 2 months ago

I recommend building a map-reduce view directly (queried via _view not _find). This is an actual index (on the emitted key), and any query you can do to _view will be done efficiently (startkey/endkey or key or keys) .

All that _find does is look at all the indexes and tries to find the most efficient one for the selector. If it doesn't find one it just reads the whole database and filters out non-matches.

In your case you could do the much more efficient /dbname/_all_docs?keys=["CustomerElastic_00000a29-329f-477f-8d4a-8244a9c738aa", "CustomerElastic_00001280-7a18-466b-9e9c-da5a2d903bda"] instead.