apache / couchdb-nano

Nano: The official Apache CouchDB library for Node.js
https://www.npmjs.com/package/nano
Apache License 2.0

[RFC] Implement Partitions api #126

Open garrensmith opened 5 years ago

garrensmith commented 5 years ago

We are currently adding partition support to CouchDB (https://github.com/apache/couchdb/pull/1605). This document details how I think the partitions API should work in nano.

Partitions allow a user to store related documents together in a partition within CouchDB. The new partition endpoints then allow a user to query only the documents in a specific partition. This leads to much faster query times, as CouchDB only needs to fetch documents from a subset of the shards for a db.

To store a document in a partition, a user prefixes the id with the partition name, e.g. {_id: "partition1:my-doc", "field": "one"} and {_id: "partition2:my-doc", "field": "one"}.
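To make the id convention concrete, here is a minimal sketch of two helpers that build and split a partitioned id. Both `partitionedId` and `splitPartitionedId` are hypothetical names, not part of nano, and the sketch assumes that the first colon separates the partition name from the rest of the id.

```javascript
// Hypothetical helpers (not part of nano) for the "partition:key" id convention.
// Assumption: the first ":" in an id separates the partition name from the key.
function partitionedId(partition, key) {
  if (!partition || partition.includes(':')) {
    throw new Error('partition name must be non-empty and must not contain ":"');
  }
  return `${partition}:${key}`;
}

function splitPartitionedId(id) {
  const idx = id.indexOf(':');
  if (idx <= 0) {
    throw new Error(`"${id}" has no partition prefix`);
  }
  return { partition: id.slice(0, idx), key: id.slice(idx + 1) };
}

console.log(partitionedId('partition1', 'my-doc'));
console.log(splitPartitionedId('partition2:my-doc'));
```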

To query the documents you would then use these endpoints. For a traditional Map/Reduce view: /my-db/_partition/partition1/_design/my-view

And for Mango: /my-db/_partition/partition1/_find

The idea around partitions, which I've hopefully conveyed quickly above, is that the data in each partition is quite separate, and when a database is partitioned a user would work with each partition separately. I would like to reflect that kind of thinking in the API. So I propose that we add a new function called partition, which accepts a partition name and returns an object for querying that specific partition. Hopefully the example below explains it.

await nano.db.create('db1', {partition: true});
const db = nano.use('db1');
await db.insert({
  views: {
    aview: {
      map: "function(doc) {\n  if (doc.group) {\n    emit([doc.some, doc.group], 1);\n }\n}",
      reduce: "_count"
    }
  }
}, '_design/example-query');

// These go in partition 1
await db.insert({some: "field"}, 'partition1:doc1');
await db.insert({some: "field2"}, 'partition1:doc2');

// This goes in partition 2
await db.insert({some: "field2"}, 'partition2:doc1');

const partition1 = db.partition('partition1');

// This will only return doc with id `partition1:doc1`
const docs = await partition1.find({
   selector: {
     some: "field"
   }
});

const docs2 = await partition1.view("example-query", "aview", {include_docs: true});

const partition2 = db.partition('partition2');
// A view can be used for each partition
const docsFromPartition2 = await partition2.view("example-query", "aview", {include_docs: true});

The new partition object would support all the .find and .view options to query with and internally would remember the name of the partition to use when querying.
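As a rough illustration of how such an object could remember its partition name, here is a minimal sketch. `makePartition` and the injected `request` function are hypothetical stand-ins, not nano internals, and the view path assumes the usual `_design/<ddoc>/_view/<view>` layout under the `_partition` prefix.

```javascript
// Sketch only: `request` is a hypothetical function (opts) => Promise that
// stands in for whatever nano uses internally to talk to CouchDB.
function makePartition(request, dbName, partitionName) {
  // The partition name is captured once and reused for every query.
  const base = `/${encodeURIComponent(dbName)}/_partition/${encodeURIComponent(partitionName)}`;
  return {
    // Mango query scoped to this partition.
    find(query) {
      return request({ method: 'POST', path: `${base}/_find`, body: query });
    },
    // Map/Reduce view query scoped to this partition.
    view(ddoc, viewName, params = {}) {
      return request({
        method: 'GET',
        path: `${base}/_design/${encodeURIComponent(ddoc)}/_view/${encodeURIComponent(viewName)}`,
        qs: params
      });
    }
  };
}

// Usage with a fake request function that only records what would be sent.
const calls = [];
const fakeRequest = (opts) => {
  calls.push(opts);
  return Promise.resolve({ docs: [] });
};
const partition1 = makePartition(fakeRequest, 'db1', 'partition1');
partition1.find({ selector: { some: 'field' } });
partition1.view('example-query', 'aview', { include_docs: true });
console.log(calls.map((c) => c.path));
```

Injecting `request` keeps the sketch independent of any HTTP client, so the path-building logic can be checked without a running CouchDB.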

Currently we don't support _all_docs or _changes.

glynnbird commented 5 years ago

If a user has a "partition object" e.g.

const partition1 = db.partition('partition1');

then it might make sense for them to be able to do all CRUD operations (insert, get, destroy, bulk) through it.

This mechanism allows the partition to be expanded in future to support _all_docs and _changes endpoints if they were to be implemented on the partition level.

glynnbird commented 5 years ago

It's also worth noting that the Nano library includes the search endpoint which models the Cloudant-specific Lucene search API. It might be worth allowing partition1.search(...) too.

garrensmith commented 5 years ago

@glynnbird good point. I think we should add search, and I like the idea of using insert, get, destroy and bulk. I'm guessing that when we do that, users would not supply the partition and we would automatically insert it?

glynnbird commented 5 years ago

I think so. If someone does partition1.insert({ _id: 'bob', x: 45 }), the document _id would be manipulated to add the partition prefix. Same story for the other operations. The "partition" object in Nano "knows" the partition you are working with so it knows what prefix to add to each document id.
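A minimal sketch of that id manipulation, assuming a `db` object with nano-style `insert`, `get`, `destroy` and `bulk` methods. `withPartitionPrefix` is a hypothetical name, and always adding the prefix (even when the id already contains a colon) is a simplifying assumption rather than a settled design.

```javascript
// Sketch only: wraps a nano-style db object so that every document id is
// prefixed with the partition name. `withPartitionPrefix` is hypothetical.
function withPartitionPrefix(db, partitionName) {
  const addPrefix = (id) => {
    if (typeof id !== 'string' || id.length === 0) {
      throw new Error('a document id is required to place a document in a partition');
    }
    // Assumption: the prefix is always added, so callers pass un-prefixed ids.
    return `${partitionName}:${id}`;
  };
  return {
    insert(doc, id) {
      const rawId = id !== undefined ? id : doc._id;
      return db.insert({ ...doc, _id: addPrefix(rawId) });
    },
    get: (id) => db.get(addPrefix(id)),
    destroy: (id, rev) => db.destroy(addPrefix(id), rev),
    bulk: ({ docs }) =>
      db.bulk({ docs: docs.map((d) => ({ ...d, _id: addPrefix(d._id) })) })
  };
}

// Usage with a fake db that simply echoes its arguments back.
const fakeDb = {
  insert: (doc) => doc,
  get: (id) => id,
  destroy: (id, rev) => [id, rev],
  bulk: (body) => body
};
const partition1 = withPartitionPrefix(fakeDb, 'partition1');
console.log(partition1.insert({ _id: 'bob', x: 45 }));
console.log(partition1.get('bob'));
```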