Stratio / cassandra-lucene-index

Lucene based secondary indexes for Cassandra
Apache License 2.0

Can one sort using a given custom comparator / predicate? #266

Closed victorbadila closed 7 years ago

victorbadila commented 7 years ago

Not sure if this is supported; I haven't found it documented. Suppose I have a collection indexed by a field of type map<text, text>. I use this map to store all sorts of values (strings, numbers, ...), and the values are of type string because this is how I "encode" them.

If I use sorting using a field of that map, like so:

SELECT ... FROM ... WHERE ... AND lucene_index_field = '
{
  sort: {
    fields:[
      {
        field:"mymap$some-field",
        reverse: true | false
      }
    ]
  },
  refresh:true
}';

then it will return the results sorted by the some-field property of the mymap field of the collection. The sorting will be done with some-field treated as a string, since the map values are strings.

Would it be possible to have the sorting done with some-field treated as a number? In that case 43 would be smaller than 123, which does not hold for strings. Or, even better, could we provide a custom comparator or predicate to be used for comparisons, such as casting to a number or whatever one may have in mind?
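To illustrate the difference in plain Python (a minimal sketch that does not touch Cassandra or the index; the sample values are made up for the example):

```python
# Sample map values, stored as strings (made up for illustration).
values = ["43", "123", "9"]

# String order compares character by character, so "123" sorts before "43".
as_strings = sorted(values)

# Numeric order needs each value converted before comparing.
as_numbers = sorted(values, key=float)

print(as_strings)
print(as_numbers)
```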

EDIT:

In case anyone wants to do something similar, I ended up encoding numbers as zero-padded strings in Cassandra, with a fixed number of digits on both sides of the decimal point. With a precision of 4: 43 -> "0043.0000" and 123 -> "0123.0000", so "0123.0000" > "0043.0000" just as 123 > 43.
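A rough sketch of this padding scheme in Python, assuming non-negative values only. The function name `encode_number` and the error handling are my own illustration, not part of cassandra-lucene-index; negative numbers would need a different encoding to sort correctly.

```python
def encode_number(value, precision=4):
    """Encode a non-negative number as a fixed-width, zero-padded string.

    `precision` is the number of digits kept on each side of the
    decimal point, so precision=4 yields strings like "0043.0000".
    """
    if value < 0:
        raise ValueError("negative values are not handled by this sketch")
    # Total width: integer digits + the decimal point + fractional digits.
    width = 2 * precision + 1
    encoded = f"{value:0{width}.{precision}f}"
    # A longer result means the integer part overflowed the chosen width,
    # which would break the string ordering.
    if len(encoded) != width:
        raise ValueError("value does not fit in the chosen precision")
    return encoded


print(encode_number(43))
print(encode_number(123))
```

Because every encoded string has the same length, comparing the strings character by character gives the same order as comparing the original numbers, which is what the string-typed sort in the index needs.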

ealonsodb commented 7 years ago

Hi @victorbadila,

The order depends on the type of the mapping, because sorting relies on how the data is indexed. The indexed values are already partially sorted on disk, so introducing a custom comparator could have a negative impact on performance.

There are several alternatives:

Hope this helps.

victorbadila commented 7 years ago

@ealonsodb thanks for answering. I don't think those solutions would work for my case (maybe the first one would) since I am using that map to store dynamic keys/values, so you can't know at any given moment how many pairs you have and what (intended) type of values the keys would point to. Guess I'll have to pick one of the more hackish workarounds.

Apart from this, I think your answer mostly sums up the conclusion to my question: "The order depends on the type of the mapping, because sorting relies on how the data is indexed."

victorbadila commented 7 years ago

I think this issue should be closed. It is not a real issue with cassandra-lucene, and workarounds have been provided in the comments.