honzakral opened 6 years ago
All the `Search` methods have been implemented in #806, with one small exception: there is no `step` parameter to `get_(next|previous)_page`, since I didn't realize that `_search` in elasticsearch doesn't support `from` when using `search_after` (I opened a ticket to address that: https://github.com/elastic/elasticsearch/issues/28068).
My question is whether it is OK like this, or whether we should implement the skip functionality in Python: when jumping with `skip=3`, request `3*SIZE` documents and discard the first `2*SIZE`. It is not the most efficient approach, of course, but it avoids deep pagination, which is even worse.
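The skip-in-Python idea can be sketched as a pair of small helpers (names are hypothetical, not part of the PR; the hit list stands in for what the cluster would return):

```python
def skip_window(skip, page_size):
    """Window to request when jumping `skip` pages forward with search_after,
    which supports no `from` offset: fetch skip * page_size hits and keep
    only the last page_size of them."""
    request_size = skip * page_size
    discard = (skip - 1) * page_size  # e.g. skip=3 -> request 3*SIZE, discard 2*SIZE
    return request_size, discard


def skip_pages(hits, skip, page_size):
    """Apply the window to an already-fetched, sorted hit list."""
    request_size, discard = skip_window(skip, page_size)
    return hits[:request_size][discard:]
```
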
What do people think?
While discarding unused docs is OK when `skip=3`, it wouldn't be OK with `skip=500` or `skip=last-1`, for example. Limiting the number of available pages (@jimczi's suggestion on https://github.com/elastic/elasticsearch/issues/28068) is not necessarily an option when access to the last page is a legitimate requirement.
Is there a way to resolve this efficiently? I thought that `search_after` with `from` would get us at least part of the way there. I'd be interested in an explanation of why this isn't a good idea.
@drpump thank you for the reaction!

There is currently no other way to do it efficiently, according to Jim. Feel free to ask any questions on the elasticsearch ticket, but from what I understand the elasticsearch team is not entirely happy with `search_after` performance and usability and might want to update its internals. That is why they don't want to commit to additional functionality in that feature right now.
The last page is not hard, because we can invert the sort order (as the PR is doing) to jump directly to it. We could also enforce a maximum number of pages that can be skipped, capping it at 10 or some other small arbitrary number...
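The inverted-sort trick can be modeled on a plain list (reversing the list stands in for asking elasticsearch for the opposite sort order; the function name is illustrative):

```python
def last_page(hits_ascending, page_size):
    """Fetch the last page without deep pagination: query with the sort
    order inverted (modeled here by reversing an in-memory list), take the
    first page of that result, then reverse the page so hits read in the
    original ascending order."""
    inverted = list(reversed(hits_ascending))  # what the inverted-sort query would return
    page = inverted[:page_size]
    return list(reversed(page))
```
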
Thanks, so I have 3 solutions I could implement:

1. Use `search_after` and do a lazy fetch of the records in between (i.e. get first N, get last N, get next N if required, get next-to-last N, etc.). Some accuracy issues, but not significant for a large number of records.
2. Use the `scroll` API and paginate on the array.
3. In Rails, due to integration with ActiveRecord, retrieve each match from my DB rather than going back to ES. This has memory and latency implications for my app, although a background fetch would probably make it perform OK. Again, some accuracy issues due to the currency of the scroll, but not significant.

All are client-side solutions. I'd need to implement a new searcher class in Rails or monkey-patch the elasticsearch gems. Doable, if perhaps a bit messy.
When using `search_after`, we need to choose a unique sort key, and there was some difficulty in choosing one. The `_id` field is not recommended, since it is not a `doc_values` field, and when a shard is large (for example, close to 50GB), sorting on `_id` leads to poor performance (compared to the default `_doc` sort). `_doc` is not suitable for sorting either, since it is not unique for each doc.
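A common workaround is to sort on an application-level field that is backed by `doc_values` and guaranteed unique, or to append such a field as a tiebreaker after the primary sort, instead of sorting on `_id` or `_doc`. A sketch of building that sort spec (the field names `timestamp` and `serial` are hypothetical):

```python
def search_after_sort(primary_field, tiebreaker_field):
    """Build a sort spec whose last entry is a unique, doc_values-backed
    tiebreaker, so search_after cursors identify one document unambiguously."""
    return [
        {primary_field: {"order": "asc"}},
        {tiebreaker_field: {"order": "asc"}},  # must be unique per document
    ]
```
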
Currently any pagination needs to be done manually, either via slicing (which can be inefficient for deep pagination) or using `search_after` (0), which can be complex. What I propose is to introduce several new methods on `Search` objects, plus helper methods on `Response` to retrieve `last_hit` and `first_hit` (`self.hits[0/-1].meta.sort`), and to use those directly to call `get_next/previous_page`.

Or do people think this should be a separate object/module altogether? Is there anything I am missing? (Number of pages? Direct jump to last/first page?)

0 - https://www.elastic.co/guide/en/elasticsearch/reference/6.1/search-request-search-after.html
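What `get_next_page` would do can be modeled in memory: each page records the sort value of its last hit (what `hits[-1].meta.sort` would hold), and the next page resumes strictly after it. A sketch of the semantics, not the proposed implementation:

```python
def next_page(docs, sort_key, size, after=None):
    """In-memory model of search_after paging: `after` is the sort value of
    the last hit of the previous page; the next page starts strictly after
    it in the sorted order."""
    ordered = sorted(docs, key=sort_key)
    if after is not None:
        ordered = [d for d in ordered if sort_key(d) > after]
    return ordered[:size]
```

`get_previous_page` would be the same idea with the sort order inverted and the cursor taken from the first hit of the current page.
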