Open GoogleCodeExporter opened 9 years ago
I was thinking about this, and there fundamentally are two ways to implement it:
1) Provide a ValueComparator to be used *instead* of the key comparator for
range selection: by doing so, we'd need to broadcast the range operation to
*all* nodes, rather than a subset of nodes as we do now, because no node have
knowledge of all data in the cluster.
2) Provide a ValueComparator to be used *after* the key comparator: that is,
the new comparator would sort only those documents retrieved by the key-based
range query, and as a consequence, the comparator would only affect sorting,
not range querying.
That said, I think #1 would cause too much network traffic and probably
unnecessary CPU work by all those nodes queried without actually having any
document of interest. Indeed, #1 should be properly implemented with
distributed indexes, but that's a completely different issue and right now we
have ElasticSearch integration for that.
So I think #2 should be the way to go: would that work for you? Do you need any
clarifications?
Other thoughts from the committers crew?
Original comment by sergio.b...@gmail.com
on 21 Feb 2011 at 6:20
My use case is to sort a list of documents by an essentially arbitrary
attribute and then limit the results by an offset and count (user interface
sorting and "paging"). If I understand your proposal correctly, I suspect that
option #2 does not address my need because I would first need to constrain the
space of documents to a key range.
That said, I'm interested in understanding option #2 better. Can you explain
it with a potential use case?
Original comment by teonanac...@gmail.com
on 21 Feb 2011 at 6:31
> I suspect that option #2 does not address my need because I would first need
to constrain the space of documents to a key range.
Correct.
> That said, I'm interested in understanding option #2 better. Can you explain
it with a potential use case?
I was thinking your use case were to query for documents with a given id/key
range (and optionally a predicate) and then sort by a given attribute.
In other words, the fundamental difference between the two approaches is what
you query for: if you can query for the id, filter by predicate, and then do
the sorting, option #2 fits it; otherwise, if you cannot constrain your key
range, #2 doesn't fit.
In my experience, many use cases can actually put boundaries to key ranges by
just arranging the algorithm as follows:
1) Query from startKey=1 with limit=10 and predicate=X.
2) Optionally sort (how doesn't really matter here).
3) Take the last returned key and repeat by using it as new startKey.
The trick is to advance the start key and use the predicate to select the
attribute you're looking for (and possibly the one you want to sort).
Does that help?
Original comment by sergio.b...@gmail.com
on 21 Feb 2011 at 6:45
I'd just like to share the notion that this sounds like a completely different
operation than range queries, which I assume that this is a proposed extension
of?
That holds true especially in the absence of a key comparator in a query.
As you already hinted at, Sergio, this is more of a search operation and for it
to make sense it would have to use an existing index.
Original comment by johansso...@gmail.com
on 21 Feb 2011 at 9:00
> I'd just like to share the notion that this sounds like a completely
different operation than range queries
It depends on how the user actually sees it, that's why I asked: is it a search
by attribute, or a filtered range query sorted on the attribute?
Surely, a distributed index would probably be the best solution to the problem,
but that's a whole different story :)
Original comment by sergio.b...@gmail.com
on 21 Feb 2011 at 9:08
Original issue reported on code.google.com by
teonanac...@gmail.com
on 21 Feb 2011 at 1:25