cakephp / elastic-search

Elastic search datasource for CakePHP

Allow fetching large datasets using ScrollAPI #96

Closed Antoniossss closed 4 years ago

Antoniossss commented 8 years ago

I have a particular use case: I need to fetch all the IDs from Elasticsearch and store them in a file. I cannot do that (over 200k results) using the normal result window, aka "result paging", so I need the scroll API. I have found a workaround, but it bypasses the ORM's hydration, so I cannot, for example, use map/reduce in "Cake fashion". Here is what I did:

        $filtered = $this->createFilteredQuery($data);
        $search = $myidxType->connection()->getIndex("myidx")->createSearch($filtered);
        $search->setOption("size", "10000");
        $search->getQuery()->setFields(["_id"]);
        $scroll = $search->addType("myidx")->scroll('10s');
        $ids = [];
        // Elastica's Scroll is an Iterator that yields one ResultSet per scroll page.
        foreach ($scroll as $results) {
            // These are raw Elastica results, not hydrated documents, so
            // Cake collection methods such as reduce() are not available here.
            foreach ($results as $result) {
                $ids[] = $result->getId();
            }
        }

It would be nice to expose some sort of iterator that uses the scroll API under the hood.
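
One shape such an iterator could take is a generator that keeps requesting pages until one comes back empty. This is only a minimal sketch, not the plugin's API: `scrollAll()` and the `$fetchPage` callback are hypothetical names, and the in-memory `$fakeBackend` stands in for the real scroll requests so the example runs without a cluster.

```php
<?php
declare(strict_types=1);

/**
 * Lazily yields every hit by following scroll IDs until a page is empty.
 *
 * $fetchPage is a hypothetical callback: it receives the previous scroll ID
 * (null for the first request) and returns
 * ['scroll_id' => string, 'hits' => array].
 */
function scrollAll(callable $fetchPage): Generator
{
    $scrollId = null;
    while (true) {
        $page = $fetchPage($scrollId);
        if (count($page['hits']) === 0) {
            return; // an empty page means the scroll is exhausted
        }
        foreach ($page['hits'] as $hit) {
            yield $hit;
        }
        $scrollId = $page['scroll_id'];
    }
}

// Stand-in for Elasticsearch: serves five fake documents in pages of two.
// The "scroll ID" here is just the next offset, purely for illustration.
$docs = [['_id' => 'a'], ['_id' => 'b'], ['_id' => 'c'], ['_id' => 'd'], ['_id' => 'e']];
$fakeBackend = function (?string $scrollId) use ($docs): array {
    $offset = $scrollId === null ? 0 : (int)$scrollId;
    return [
        'scroll_id' => (string)($offset + 2),
        'hits' => array_slice($docs, $offset, 2),
    ];
};

$ids = [];
foreach (scrollAll($fakeBackend) as $hit) {
    $ids[] = $hit['_id'];
}
echo implode(',', $ids), PHP_EOL;
```

Because the generator yields one hit at a time, the caller never holds more than one page in memory, which is the property that matters when writing hundreds of thousands of IDs to a file.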

PhantomWatson commented 5 years ago

The above example appears to use methods that are no longer supported.

I'm also in the position where I need to be able to paginate an unlimited number of results, and that doesn't appear to be possible with the current plugin without increasing the index.max_result_window Elasticsearch setting (which still can't technically achieve an unlimited number of results, just a really big number).

It would be really advantageous for this plugin to support the Elasticsearch scroll API or the search_after parameter to address this issue.
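
For reference, search_after works by sorting on a deterministic key and passing the sort values of the last hit of one page into the next request. A rough sketch of building those request bodies as plain arrays follows. `nextPageBody()` is a hypothetical helper, not part of this plugin or Elastica, and the `created` and `id` sort fields and the sample hit are made up for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Builds the request body for the next page of a search_after pagination.
 * $lastHit is the final hit of the previous page, or null for the first page.
 */
function nextPageBody(array $query, array $sort, int $size, ?array $lastHit): array
{
    $body = ['query' => $query, 'sort' => $sort, 'size' => $size];
    if ($lastHit !== null) {
        // Elasticsearch returns each hit's sort values under the "sort" key.
        $body['search_after'] = $lastHit['sort'];
    }
    return $body;
}

$query = ['match_all' => new stdClass()];
// The second sort field acts as a tiebreaker so the ordering is deterministic.
$sort = [['created' => 'asc'], ['id' => 'asc']];

$first = nextPageBody($query, $sort, 1000, null);

// Hypothetical last hit of the first page.
$lastHit = ['_id' => '42', 'sort' => [1500000000000, '42']];
$second = nextPageBody($query, $sort, 1000, $lastHit);

echo json_encode($second), PHP_EOL;
```

Unlike scroll, this approach keeps no server-side context between requests, so each page is an ordinary search that stays under index.max_result_window.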

github-actions[bot] commented 4 years ago

This issue is stale because it has been open for 120 days with no activity. Remove the stale label or comment or this will be closed in 15 days

andrii-pukhalevych commented 3 years ago

It can be implemented in a way similar to disableBufferedResults.

If buffering is disabled, use the Elasticsearch scroll API. It differs from a regular Elasticsearch query only by an additional parameter.
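
To make the "additional parameter" concrete: the initial search adds a `scroll` keep-alive to the query string, and each later page is then requested from `_search/scroll` with the scroll ID returned by the previous response. The sketch below only builds request descriptions. `searchRequest()`, `nextScrollRequest()`, the `$buffered` flag, and the `articles` index name are all hypothetical and do not reflect how the plugin would actually wire this in.

```php
<?php
declare(strict_types=1);

/**
 * Describes the initial search request. When buffering is disabled,
 * the only difference is the extra "scroll" query-string parameter.
 */
function searchRequest(string $index, array $body, bool $buffered, string $keepAlive = '1m'): array
{
    $path = '/' . $index . '/_search';
    if (!$buffered) {
        $path .= '?scroll=' . rawurlencode($keepAlive);
    }
    return ['method' => 'POST', 'path' => $path, 'body' => $body];
}

/**
 * Describes the follow-up request that fetches the next scroll page.
 */
function nextScrollRequest(string $scrollId, string $keepAlive = '1m'): array
{
    return [
        'method' => 'POST',
        'path' => '/_search/scroll',
        'body' => ['scroll' => $keepAlive, 'scroll_id' => $scrollId],
    ];
}

$body = ['query' => ['match_all' => new stdClass()], 'size' => 1000];

$buffered = searchRequest('articles', $body, true);
$unbuffered = searchRequest('articles', $body, false);

echo $buffered['path'], PHP_EOL;
echo $unbuffered['path'], PHP_EOL;
echo json_encode(nextScrollRequest('some-scroll-id')), PHP_EOL;
```

So the first request really is a regular query plus one parameter, but the result iterator would still need to issue the follow-up requests until a page comes back empty.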