Open zengzh opened 7 years ago
Hi @zengzh:
As stated in the documentation, SCLI supports CQL paging.
In your use case, the match query acts as a 'boolean' relevance query (a row either matches or it does not), so it does not make sense to sort the results by relevance. The documentation on searching may help you understand this.
Hope this helps
Thanks @ealonsodb for quick reply.
Sorry for the inappropriate example. Maybe a better one is the following:
SELECT * FROM tweets WHERE expr(tweets_index, '{
query: {type: "phrase", field: "body", value: "big data gives organizations"},
**limit:{offset:"20", pagesize:"80"}**
}');
According to the CQL paging documentation, PAGING ON displays query results in 100-line chunks followed by the "more" prompt. This functionality is limited in 2 aspects:
Any ways to break the above limitations?
Execute PAGING 50 in cqlsh and see what happens!
Indeed, the page size of 100 is a cqlsh.py variable that you can change.
The query you are executing is a relevance query, so results from different Cassandra nodes must be sorted in the coordinator node. What I mean is that, even with an offset, there is no way to know the starting point within each node's data subset, so it is still necessary to execute that first page query and discard those results.
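The coordinator-side sorting described above can be illustrated with a small simulation. This is only a sketch and not the plugin's actual code: the function `coordinator_page`, the `(score, id)` tuples and the two node lists are invented for illustration, and it assumes each node returns its hits already sorted by descending score.

```python
import heapq
from itertools import islice


def coordinator_page(node_results, offset, page_size):
    """Merge per-node hits and return one page of globally ranked results.

    Assumption: every list in node_results holds (score, row_id) tuples
    sorted by descending score. A node cannot skip rows locally, because
    it does not know how many of the globally skipped rows it owns.
    """
    needed = offset + page_size
    # Each node may only truncate to the first `needed` hits.
    per_node = [hits[:needed] for hits in node_results]
    # Merge the sorted lists by descending score (ascending negated score).
    merged = heapq.merge(*per_node, key=lambda hit: -hit[0])
    # Only now can the coordinator discard the first `offset` hits.
    return list(islice(merged, offset, needed))


# Hypothetical hits held by two nodes.
node_a = [(0.9, "t1"), (0.5, "t4"), (0.1, "t6")]
node_b = [(0.8, "t2"), (0.7, "t3"), (0.3, "t5")]

page = coordinator_page([node_a, node_b], offset=2, page_size=2)
print(page)
```

In this example the two skipped rows come from different nodes (one from each), which is exactly what a single node cannot know ahead of time.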
Paging functionality is covered by CQL paging, and you can very easily skip whatever results you want in the client.
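Client-side skipping can be sketched as follows. This is a minimal illustration under stated assumptions: `fetch_pages` is a hypothetical stand-in for a driver result set that fetches rows lazily page by page, and `skip_then_take` is an invented helper name, not part of any driver API.

```python
from itertools import islice


def fetch_pages(rows, fetch_size):
    """Hypothetical stand-in for a paged result set.

    Yields rows one at a time, reading them in chunks of fetch_size,
    the way a driver would transparently fetch the next page.
    """
    for start in range(0, len(rows), fetch_size):
        yield from rows[start:start + fetch_size]


def skip_then_take(row_iter, skip, take):
    """Discard the first `skip` rows, then return the next `take` rows."""
    return list(islice(row_iter, skip, skip + take))


# Made-up data standing in for query results.
rows = [f"tweet-{i}" for i in range(10)]
window = skip_then_take(fetch_pages(rows, fetch_size=4), skip=3, take=4)
print(window)
```

Note that the skipped rows are still fetched from the server and then thrown away in the client, which is the cost discussed in the following comments.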
Hope this helps
Thanks @ealonsodb. It surprises me that the official documentation does not mention that the page size can be customized.
Cassandra supports paging but does not encourage offset queries.
I understand that even with an offset in SCLI, it still needs to compute the first page and discard those results (keys). But this avoids retrieving the whole set of tuples from Cassandra and then discarding them. From this point of view, computing and discarding results in SCLI instead of computing and discarding tuples from Cassandra is helpful, right?
Hi @zengzh: You are totally right. Thank you for changing our mind about this feature. We have coded it in #342. Could you please take a look?
Thanks @ealonsodb. I see that you mentioned skip "is not compatible with paging or top-K queries". Can you explain why that is? Did you add any validation check? If so, what is it?
Hi @zengzh: The main problem with paging and top-K queries is that Cassandra resolves inconsistencies between different nodes' data in the coordinator after any 2i-related functions. If the 2i skips some rows, deterministic behaviour (seeing the same results in the same order across different executions of the same query) may be lost.
Hope this helps
Hi @ealonsodb: Sorry that I do not fully understand. What are 2i related functions? Can you give an example of paging or top-k queries that return non-deterministic results because of skip? If I specify the sorting field, will the results still be non-deterministic? Thanks very much!
Hi @zengzh:
When querying our product you can use query or filter.
There is plenty of information on the internet if you search for "lucene query versus filter".
The main problem is that data consistency is enforced after the 2i-related sorting post-processing. The second point is that this case is unusual and does not happen in a stable cluster. What I mean here is that skip would work well if every node is up and the data is consistent between nodes, but it will start to fail if there are some data inconsistencies.
Hope this helps
Thanks @ealonsodb. So it is better to resolve inconsistencies before using skip.
Hi @zengzh: Wow, I had never thought about it that way. Give me some time to test it thoroughly and maybe, with a big experimental warning about it, I will merge.
Thank you for changing my mind.
Thanks @ealonsodb May I know when this skip feature will be merged into the release version?
Hi @ealonsodb @adelapena ,
It has been a while since this feature was developed, but it remains unreleased. May I know the latest status and when it will be available?
Look forward to your reply. Many thanks.
Hi @ealonsodb @adelapena, do you have plans to merge this feature soon? We are excited and impatient about this. Thanks a lot! 😬
Hi @ealonsodb
ElasticSearch accepts "from" and "size" parameters so that users can retrieve a certain number of results starting from a particular position. https://www.elastic.co/guide/en/elasticsearch/guide/current/pagination.html
Does SCLI have this feature? For example, can I issue a query as follows:
Which retrieves the tweets about FIFA, returned 100 tweets per page, and skips the first 100 tweets? If not, do the Stratio folks plan to support this? Thanks.