Stratio / cassandra-lucene-index

Lucene based secondary indexes for Cassandra
Apache License 2.0
600 stars 171 forks source link

sort expression: applied only first field condition #273

Closed imhy closed 7 years ago

imhy commented 7 years ago

Hi! I`m trying to sort result by two fileds.

CREATE TABLE chain ( id uuid, chain_type int, task_id uuid, dep_branch text, arr_branch text, items text, speed_sort int, dep_branch_sort map<text,int>, arr_branch_sort map<text,int>, PRIMARY KEY (id) );

CREATE CUSTOM INDEX chain_index ON chain () USING 'com.stratio.cassandra.lucene.Index' WITH OPTIONS = { 'refresh_seconds': '1', 'schema': '{ fields: { id: {type: "uuid"}, chain_type: {type: "string"}, task_id: {type: "uuid"}, dep_branch : {type: "string"}, arr_branch : {type: "string"}, items: {type: "string"}, speed_sort: {type: "integer"}, dep_branch_sort: {type: "integer"}, arr_branch_sort: {type: "integer"} } }' };

SELECT id, speed_sort, dep_branch_sort FROM chain WHERE expr(chain_index, '{ filter: { type: "match", field: "dep_branch_sort$zho", value: 5 }, sort: [{field: "dep_branch_sort$rus", reverse: true},{field: "speed_sort"}] }');

But result sorting only by first condition field: "dep_branch_sort$rus". If I move {field: "speed_sort"} to first position, results will be sorted only by field: "speed_sort".

I`m trying on 3.0.10.3 and 3.10.0.

Regards, Sergei.

adelapena commented 7 years ago

The logic of sorting in collection values can be quite tricky. We have tried the following dataset and queries obtaining the expected results:

INSERT INTO k.chain(id, dep_branch_sort, speed_sort) VALUES (now(), {'a':0, 'b':1}, 0);
INSERT INTO k.chain(id, dep_branch_sort, speed_sort) VALUES (now(), {'a':1, 'b':0}, 1);
INSERT INTO k.chain(id, dep_branch_sort, speed_sort) VALUES (now(), {'a':0, 'b':1}, 2);
INSERT INTO k.chain(id, dep_branch_sort, speed_sort) VALUES (now(), {'a':1, 'b':0}, 3);
INSERT INTO k.chain(id, dep_branch_sort, speed_sort) VALUES (now(), {'a':0, 'b':1}, 4);
INSERT INTO k.chain(id, dep_branch_sort, speed_sort) VALUES (now(), {'a':1, 'b':0}, 5);
INSERT INTO k.chain(id, dep_branch_sort, speed_sort) VALUES (now(), {'a':0, 'b':1}, 6);
INSERT INTO k.chain(id, dep_branch_sort, speed_sort) VALUES (now(), {'a':1, 'b':0}, 7);
INSERT INTO k.chain(id, dep_branch_sort, speed_sort) VALUES (now(), {'a':0, 'b':1}, 8);
INSERT INTO k.chain(id, dep_branch_sort, speed_sort) VALUES (now(), {'a':1, 'b':0}, 9);

SELECT id, dep_branch_sort, speed_sort FROM k.chain WHERE expr(chain_index, '{
sort: [{field: "dep_branch_sort$a"},{field: "speed_sort"}]
}');

 id                                   | dep_branch_sort  | speed_sort
--------------------------------------+------------------+------------
 e1150460-f1e3-11e6-ae22-c108003b2d1a | {'a': 0, 'b': 1} |          0
 e11a0d70-f1e3-11e6-ae22-c108003b2d1a | {'a': 0, 'b': 1} |          2
 e1706ad0-f1e3-11e6-ae22-c108003b2d1a | {'a': 0, 'b': 1} |          4
 e4f68450-f1e3-11e6-ae22-c108003b2d1a | {'a': 0, 'b': 1} |          6
 e4f7bcd0-f1e3-11e6-ae22-c108003b2d1a | {'a': 0, 'b': 1} |          8
 e1179c70-f1e3-11e6-ae22-c108003b2d1a | {'a': 1, 'b': 0} |          1
 e11af7d0-f1e3-11e6-ae22-c108003b2d1a | {'a': 1, 'b': 0} |          3
 e4f599f0-f1e3-11e6-ae22-c108003b2d1a | {'a': 1, 'b': 0} |          5
 e4f76eb0-f1e3-11e6-ae22-c108003b2d1a | {'a': 1, 'b': 0} |          7
 e4f83200-f1e3-11e6-ae22-c108003b2d1a | {'a': 1, 'b': 0} |          9

SELECT id, dep_branch_sort, speed_sort FROM k.chain WHERE expr(chain_index, '{
sort: [{field: "dep_branch_sort$b"},{field: "speed_sort"}]
}');

 id                                   | dep_branch_sort  | speed_sort
--------------------------------------+------------------+------------
 e1179c70-f1e3-11e6-ae22-c108003b2d1a | {'a': 1, 'b': 0} |          1
 e11af7d0-f1e3-11e6-ae22-c108003b2d1a | {'a': 1, 'b': 0} |          3
 e4f599f0-f1e3-11e6-ae22-c108003b2d1a | {'a': 1, 'b': 0} |          5
 e4f76eb0-f1e3-11e6-ae22-c108003b2d1a | {'a': 1, 'b': 0} |          7
 e4f83200-f1e3-11e6-ae22-c108003b2d1a | {'a': 1, 'b': 0} |          9
 e1150460-f1e3-11e6-ae22-c108003b2d1a | {'a': 0, 'b': 1} |          0
 e11a0d70-f1e3-11e6-ae22-c108003b2d1a | {'a': 0, 'b': 1} |          2
 e1706ad0-f1e3-11e6-ae22-c108003b2d1a | {'a': 0, 'b': 1} |          4
 e4f68450-f1e3-11e6-ae22-c108003b2d1a | {'a': 0, 'b': 1} |          6
 e4f7bcd0-f1e3-11e6-ae22-c108003b2d1a | {'a': 0, 'b': 1} |          8

Please take a look to the provided example to check if this was the behaviour you were expecting. If this was the expected behaviour, and you are getting wrong results, please provide some example row insertions to reproduce the problem.

imhy commented 7 years ago

Hi! In Your example result is as expected for me. But on my test , i get wrong result.

SELECT id, speed_sort, dep_branch_sort FROM chain WHERE expr(chain_index, '{ filter: { type: "match", field: "dep_branch_sort$zho", value: 4 }, sort: [{field: "dep_branch_sort$rus", reverse: true},{field: "speed_sort"}] }');

id | speed_sort | dep_branch_sort --------------------------------------------------------+----------------+--------------------------------- 3bc07df5-f346-11e6-8d2c-dd3726f91cd2 | 6 | {'eng': 2, 'rus': 10, 'zho': 4} 3bc07df6-f346-11e6-8d2c-dd3726f91cd2 | 7 | {'eng': 6, 'rus': 8, 'zho': 4} 3bc07df7-f346-11e6-8d2c-dd3726f91cd2 | 6 | {'eng': 9, 'rus': 3, 'zho': 4} 3bc07df2-f346-11e6-8d2c-dd3726f91cd2 | 9 | {'eng': 5, 'rus': 2, 'zho': 4} 3bc07df0-f346-11e6-8d2c-dd3726f91cd2 | 2 | {'eng': 5, 'rus': 1, 'zho': 4}

I attached chain.csv in google drive with my data. https://drive.google.com/open?id=0B2j8ixtPgxmqUVJCREo1TC1TYnM

adelapena commented 7 years ago

Hi,

I don't see what is wrong in your result:

  1. Only the rows with dep_branch_sort$zho=4 are returned.
  2. The filtered rows are ordered first by dep_branch_sort$rus in reverse order: 10, 8, 3, 2, 1
  3. The rows with the same dep_branch_sort$rus value should be ordered by speed_sort. There are not rows with the same dep_branch_sort$rus value to be sorted, so the speed sort doesn't have any effect.

What in these steps is wrong for you?

The provided CSV doesn't contain column names, I've inferred the following relevant insertions:

INSERT INTO chain (id, dep_branch_sort, speed_sort) VALUES (3bc07df8-f346-11e6-8d2c-dd3726f91cd2, {'eng': 4, 'rus': 9, 'zho': 2}, 6);
INSERT INTO chain (id, dep_branch_sort, speed_sort) VALUES (3bc07df3-f346-11e6-8d2c-dd3726f91cd2, {'eng': 1, 'rus': 3, 'zho': 9}, 3);
INSERT INTO chain (id, dep_branch_sort, speed_sort) VALUES (3bc07df2-f346-11e6-8d2c-dd3726f91cd2, {'eng': 5, 'rus': 2, 'zho': 4}, 9);
INSERT INTO chain (id, dep_branch_sort, speed_sort) VALUES (3bc07df6-f346-11e6-8d2c-dd3726f91cd2, {'eng': 6, 'rus': 8, 'zho': 4}, 7);
INSERT INTO chain (id, dep_branch_sort, speed_sort) VALUES (3bc07df0-f346-11e6-8d2c-dd3726f91cd2, {'eng': 5, 'rus': 1, 'zho': 4}, 2);
INSERT INTO chain (id, dep_branch_sort, speed_sort) VALUES (3bc07df1-f346-11e6-8d2c-dd3726f91cd2, {'eng': 7, 'rus': 1, 'zho': 2}, 2);
INSERT INTO chain (id, dep_branch_sort, speed_sort) VALUES (3bc07df7-f346-11e6-8d2c-dd3726f91cd2, {'eng': 9, 'rus': 3, 'zho': 4}, 6);
INSERT INTO chain (id, dep_branch_sort, speed_sort) VALUES (3bc07df5-f346-11e6-8d2c-dd3726f91cd2, {'eng': 2, 'rus': 10, 'zho': 4}, 6);
INSERT INTO chain (id, dep_branch_sort, speed_sort) VALUES (3bc07df4-f346-11e6-8d2c-dd3726f91cd2, {'eng': 8, 'rus': 1, 'zho': 5}, 4);
INSERT INTO chain (id, dep_branch_sort, speed_sort) VALUES (3bb5cf91-f346-11e6-8d2c-dd3726f91cd2, {'eng': 1, 'rus': 4, 'zho': 8}, 9);
imhy commented 7 years ago

Hi! Sorry, my mistake and my example also is ok. It working as expected.

adelapena commented 7 years ago

No problem then :)

imhy commented 7 years ago

Thank You for help))