Open matcho opened 2 months ago
Hi @matcho!
This behavior is expected since your index doesn't contain the _key field, so it has to be materialized directly from storage. As a result, it is much slower, which is expected.
Could you please try adding the _key field to primarySort as well? Here is the documentation on how to create a view with 2 fields in primarySort: https://docs.arangodb.com/3.11/index-and-search/arangosearch/performance/#primary-sort-order
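To illustrate the suggestion, here is a minimal sketch of a query whose SORT lists the same two fields as such a primarySort, so that the sort might be served from the view's stored order. It assumes the view was re-created with primarySort on date_obs (descending) followed by _key (ascending); the ascending direction for _key and the @france bind parameter (the polygon coordinates) are assumptions for illustration, not taken from the actual setup.

```aql
// Assumption: obs_geo_view has primarySort [date_obs DESC, _key ASC].
// @france is a hypothetical bind parameter holding the polygon coordinates.
FOR o IN obs_geo_view
  SEARCH ANALYZER(GEO_CONTAINS(GEO_POLYGON(@france), o.geoloc), "geopoint_pn")
  // Same fields, same order and directions as the assumed primarySort
  SORT o.date_obs DESC, o._key ASC
  LIMIT 20000, 1000
  RETURN { date_obs: o.date_obs, _key: o._key }
```

Whether the optimizer actually drops the sort step can be checked by running explain on the query.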
Hi @alexbakharew
Thank you for your answer.
I understand that using a 2-field primarySort would certainly work, and we'll try that as soon as possible, although it seems that primarySort cannot be changed on an existing view (const error: /primarySort must be equal to constant. Schema: {"allowedValue":[{"field":"date_obs","asc":false}]}), so the view has to be re-created from scratch (no big deal).
What still seems strange is that returning a projection that includes _key is very fast, although _key is not an indexed field in the view: the following query runs in 165 ms!
FOR o IN obs_geo_view
  SEARCH ANALYZER(GEO_CONTAINS(GEO_POLYGON(france), o.geoloc), "geopoint_pn")
  SORT o.date_obs DESC
  LIMIT 20000, 1000
  RETURN {
    date_obs: o.date_obs,
    _key: o._key
  }
Since the 1000 resulting items, including the date_obs and _key fields, are returned so fast, what keeps AQL from re-sorting them in memory? Sorting 1000 items should take almost no time. In the queries from the original post, it seems that the optimizer forces the second SORT statement to be executed before pagination, which is not what the AQL query expresses, especially query 3 with the sub-query.
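For readers without the original queries at hand, the "sort after pagination" idea can be sketched as follows. This is an illustrative reconstruction, not necessarily query 3 verbatim: the variable names page and p and the @france bind parameter (the polygon coordinates) are assumptions.

```aql
// Inner query: paginate using only the view's primary sort field.
// @france is a hypothetical bind parameter holding the polygon coordinates.
LET page = (
  FOR o IN obs_geo_view
    SEARCH ANALYZER(GEO_CONTAINS(GEO_POLYGON(@france), o.geoloc), "geopoint_pn")
    SORT o.date_obs DESC
    LIMIT 20000, 1000
    RETURN { date_obs: o.date_obs, _key: o._key }
)
// Outer query: re-sort only the 1000 items of the page, using _key as a tie-breaker.
FOR p IN page
  SORT p.date_obs DESC, p._key ASC
  RETURN p
```

As written, the outer SORT only sees the items returned by the sub-query, which is the behavior expected here.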
So it still looks like an optimizer / AQL issue to me. We should be able to apply a secondary sort after pagination without modifying the view's primarySort. Besides, what if we want to sort after pagination on different fields in different cases? We would then have to build multiple views, each with a different primarySort, which doesn't seem reasonable.
Thank you
Hi @matcho!
This indeed looks a little bit strange. I will investigate it and then get back to you.
My Environment
Component, Query & Data
Affected feature: AQL query using ArangoSearch view.
We are in a case where we have to paginate the results of an ArangoSearch view according to its primary sort field/order, which is very efficient, but we also have to apply a secondary sort on _key to guarantee the result order, as the primary sort field may have identical values in the output. Trying to apply this secondary sort after pagination, to keep the excellent performance given by the view's primary sort, does not work.
AQL query (if applicable):
common part of all queries: a polygon roughly representing France
query 1 (quick)
query 2 (slow)
query 3 (still slow)
query 4 (even way slower but this is expected)
AQL explain and/or profile (if applicable):
profile query 1
profile query 2
profile query 3
Dataset: ~22M in collection observations feeding the view obs_geo_view
ArangoSearch view description:
Size of your Dataset on disk: 15GB for collection observations feeding the view
Steps to reproduce
EnumerateViewNode has a lot more items than it should, and the remove-redundant-calculations rule is applied (maybe it removes the first view-optimized sort statement?)
Problem: Extra sort after pagination should not break the performance of the steps before it
Expected result: Pagination is executed quickly, as expected thanks to the view's primary sort, and the extra sort is applied only to the returned elements
Thank you