couchbaselabs / beersample-node

Sample web application using node.js, couchnode and libcouchbase to access couchbase server.

Can't find documentation on key functionality #5

Closed: neatcode closed this issue 9 years ago

neatcode commented 9 years ago

The use of "\u0FFF" on line 210 of beer_app.js doesn't appear to be explained or documented anywhere in the API reference for Couchbase Node.js 2.0 SDK. (Located here: http://docs.couchbase.com/sdk-api/couchbase-node-client-2.0.0/)

It is mentioned in the documentation for beersample-node, like so:

The code passes the user's input for the startkey and for the endkey passes the user's input appended with a Unicode \u0FFF value, which for the view engine means “end here.” You need to get used to it a bit, but it’s actually very neat and efficient.

That's hardly an explanation. Where do I go for further documentation on the "view engine"? In the meantime, could someone offer a better explanation of what's going on here?

snej commented 9 years ago

It's pretty simple. CouchDB-style map/reduce doesn't have any direct support for prefix matching, just key ranges.

So if you wanted to get all keys that start with "foo", you might naively set a key range from "foo" to "fooz", since that will return "foo", "foobar", "food", "foot" and even "fooz". Unfortunately it won't return "foozz". You could add a couple more "z"s to the endKey but that obviously has the same problem with keys like "foozzzzzzz…".

Worse, "z" sorts before a lot of non-ASCII non-alphabetic Unicode characters. So no matter how many "z"s you add, you still won't match "foo™".
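A minimal sketch of the key-range behavior described above. It assumes plain JavaScript string comparison (UTF-16 code-unit order) as a stand-in for the view engine's key ordering; the real engine applies its own collation, so results on a server may differ. The helper names `inKeyRange` and `matchRange` are hypothetical, not part of any Couchbase API.

```javascript
// Assumption: JavaScript's built-in string comparison approximates the
// ordering of view keys. This is for illustration only.
function inKeyRange(key, startKey, endKey) {
  // A key range is inclusive on both ends in this sketch.
  return key >= startKey && key <= endKey;
}

const keys = ["foo", "foobar", "food", "foot", "fooz", "foozz", "foo™"];

// Hypothetical helper: return every key that falls inside the range.
function matchRange(startKey, endKey) {
  return keys.filter((key) => inKeyRange(key, startKey, endKey));
}

// Naive prefix match: "foo" .. "fooz" misses "foozz" and "foo™".
const naive = matchRange("foo", "fooz");

// Adding more "z"s picks up "foozz" but still misses "foo™",
// because "™" (U+2122) compares greater than "z" (U+007A).
const moreZs = matchRange("foo", "foozzz");

console.log(naive);
console.log(moreZs);
```

Under this comparison, no number of trailing "z"s brings "foo™" into the range.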

There isn't a perfect solution to this, but the real-world answer is to append some Unicode character with a very large numeric value (codepoint). "\u0FFF" seems too low to me; I generally use "\uFFFE", but that's in Objective-C code, not JavaScript. One of the annoyances is that using high Unicode characters means you're subject to the quirks of how your chosen programming language's string library handles Unicode. It's gotten even worse lately because Unicode codepoints go up to 0x10FFFF, but most implementations store strings in 16-bit units (as UTF-16), which means codepoints above 0xFFFF (such as the suddenly-popular emoji!) get encoded as two 16-bit units (a surrogate pair).
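The sentinel trick can be sketched like this, again assuming JavaScript's code-unit string comparison in place of the view engine's real collation. `prefixRange` and `matchesPrefixRange` are hypothetical helpers; the `startkey`/`endkey` property names simply mirror the terms used in this thread.

```javascript
// Hypothetical helper: build a key range that approximates "starts with prefix"
// by appending a high-valued sentinel character to the end key.
function prefixRange(prefix, sentinel) {
  return { startkey: prefix, endkey: prefix + sentinel };
}

// Assumption: plain JS comparison stands in for the view engine's ordering.
function matchesPrefixRange(key, range) {
  return key >= range.startkey && key <= range.endkey;
}

const low = prefixRange("foo", "\u0FFF");  // sentinel used in beer_app.js
const high = prefixRange("foo", "\uFFFE"); // higher sentinel suggested above

const tm = "foo\u2122";       // "foo™"
const emoji = "foo\u{1F600}"; // "foo" followed by an emoji above U+FFFF

// U+2122 is greater than U+0FFF, so the lower sentinel misses "foo™".
console.log(matchesPrefixRange(tm, low));
console.log(matchesPrefixRange(tm, high));

// The emoji is stored as two UTF-16 code units (a surrogate pair).
console.log("\u{1F600}".length);

// Its first code unit (0xD83D) is below 0xFFFE, so it still falls in range
// under code-unit comparison.
console.log(matchesPrefixRange(emoji, high));
```

Whether a given sentinel is high enough ultimately depends on how the server orders keys, so it is worth checking against real data.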

Anyway, I hope this helps. In the future it's probably best to ask questions on the Couchbase forums; we tend to treat the GitHub issue trackers as bug repositories, not Q&A.

neatcode commented 9 years ago

Thank you for the thoughtful explanation. I wasn't aware of the Couchbase forums until recently, so I'll try to make use of them for these types of questions in the future.