Open honzakral opened 7 years ago
To be clear - this is not about using a scrolled query to fetch the aggregations bit by bit. This is about having a query with both aggregations and hits, and you want to use a scrolled query for the hits while still seeing the aggregations.
Currently, there's no way to do this: `Search.execute()` doesn't do scrolled queries, while `Search.scan()` only returns an iterator over the hits, with no way to access the aggregation results.
My proposal is to add a new method, `Search.execute_scan()`, which returns a `Response` object like `Search.execute()`, but whose `Response.hits` property, instead of being a static list, is a `Search.scan()`-style iterator.
I'd also like to see this happen -- if this isn't a priority right now, do you have a suggested workaround for the time being, @HonzaKral? Perhaps even one using the underlying `elasticsearch-py` package?
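One possible workaround at the `elasticsearch-py` level (a hedged sketch, not an official recipe): issue the scrolled search yourself and read the aggregations off the first page before iterating the hits. The `run_search`/`run_scroll` callables below stand in for real client calls such as `es.search(body=..., scroll="2m")` and `es.scroll(scroll_id=..., scroll="2m")`, so the sketch stays self-contained; the fake pages at the bottom are purely illustrative.

```python
def scroll_with_aggs(run_search, run_scroll):
    """Return (aggregations, hit_iterator) using the raw scroll API.

    The first page of a scrolled search already carries the aggregation
    results; subsequent pages carry only hits.
    """
    first = run_search()  # e.g. es.search(body=..., scroll="2m")
    aggs = first.get("aggregations", {})

    def hits():
        page = first
        while page["hits"]["hits"]:
            for hit in page["hits"]["hits"]:
                yield hit
            # e.g. es.scroll(scroll_id=..., scroll="2m")
            page = run_scroll(page["_scroll_id"])

    return aggs, hits()


# Fake two-page scroll backend, just to demonstrate the control flow:
_first = {
    "aggregations": {"tags": {"buckets": []}},
    "hits": {"hits": [{"_id": "1"}, {"_id": "2"}]},
    "_scroll_id": "s1",
}
_pages = {
    "s1": {"hits": {"hits": [{"_id": "3"}]}, "_scroll_id": "s2"},
    "s2": {"hits": {"hits": []}, "_scroll_id": "s3"},
}

aggs, hits = scroll_with_aggs(lambda: _first, lambda sid: _pages[sid])
```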
I think the proper solution is to create a custom `Response` class that will hide this - it will provide standard access to the aggregations, but iterating over its `.hits` attribute will iterate over all the documents (just like iterating over `scan()` currently works). Exactly as @macdjord said! This will make it compatible with the standard response.
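A rough sketch of what such a wrapper could look like (illustrative only: `ScanResponse` is a hypothetical name, not part of elasticsearch-dsl, and the stub below stands in for a real `Search` object so the example runs without a cluster):

```python
class ScanResponse:
    """Response-like object: aggregations come from an initial size-0
    request, while .hits is a scan()-style iterator over all documents."""

    def __init__(self, search):
        self._search = search
        # size=0 request: aggregation results only, no hits transferred
        self.aggregations = search[0:0].execute().aggregations

    @property
    def hits(self):
        # a fresh scrolled query each time the hits are iterated
        return self._search.scan()

    def __iter__(self):
        return iter(self.hits)


# Minimal stand-in for a real elasticsearch-dsl Search, for demonstration:
class _FakeSearch:
    def __getitem__(self, _slice):
        return self

    def execute(self):
        class R:
            aggregations = {"per_tag": {"buckets": [{"key": "python"}]}}
        return R()

    def scan(self):
        return iter([{"_id": "1"}, {"_id": "2"}])


resp = ScanResponse(_FakeSearch())
```

The two-request approach means the aggregations and the scanned hits come from separate executions of the query, so they may disagree if the index changes in between.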
I'm not sure if this belongs here, but the problem I'm facing is that the DSL library won't give us more than 10 aggregation results back. Please correct me if I'm wrong, but slicing seems to apply only to hits, not to aggregations.
If the "scan" for aggregations can be implemented, I'm sure it would be extremely helpful. Meanwhile, for anyone else facing the 10-aggregation-result limit in the DSL library, hopefully the workaround here proves useful.
Perhaps we could also look at allowing pagination for aggregations?
@qiujunda `scan` with aggregations still doesn't scan through the aggregations; it just runs the aggregations first and then proceeds to scan through the documents.
For most aggregations you can already set the `size` parameter to get back more than 10 buckets; to paginate through all possible buckets you need to use the composite aggregation, which is unfortunately still not supported in `elasticsearch-dsl` - https://www.elastic.co/guide/en/elasticsearch/reference/master/search-aggregations-bucket-composite-aggregation.html
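For the `size` parameter, a minimal sketch of the raw request body (the field name `tags` and the size of 500 are illustrative; in elasticsearch-dsl this roughly corresponds to `s.aggs.bucket("tags", "terms", field="tags", size=500)`):

```python
# Terms aggregation returning up to 500 buckets instead of the default 10.
# "size": 0 at the top level suppresses hits, since only the buckets matter.
body = {
    "size": 0,
    "aggs": {
        "tags": {
            "terms": {"field": "tags", "size": 500}
        }
    },
}
```

Note that a large terms `size` still computes everything in one request; it raises the bucket cap rather than paginating.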
@HonzaKral is there any update on being able to return aggregates with the scan response?
This is a pretty big deal tbh. I would like to see this implemented.
Any update on this functionality to scan through aggregate results?
Same here!
Just to clarify: this is not scanning through the results of an aggregation, just returning the aggregates first and then scanning through the documents.
To "scan" over aggregations you can use the `composite` aggregation as shown here - https://github.com/elastic/elasticsearch-dsl-py/blob/master/examples/composite_agg.py
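The pagination loop behind a composite aggregation can be sketched like this (a hedged illustration, not the linked example verbatim): each response carries an `after_key`, which is fed back as `after` to fetch the next page of buckets. `run_search` stands in for a real client call such as `es.search(index=..., body=body)`, and the two-page fake backend exists only so the sketch runs on its own.

```python
def iter_composite_buckets(run_search, sources, page_size=100):
    """Yield every bucket of a composite aggregation, page by page."""
    after = None
    while True:
        comp = {"sources": sources, "size": page_size}
        if after is not None:
            comp["after"] = after  # resume where the previous page ended
        resp = run_search({"size": 0, "aggs": {"comp": {"composite": comp}}})
        agg = resp["aggregations"]["comp"]
        for bucket in agg["buckets"]:
            yield bucket
        after = agg.get("after_key")
        if after is None:  # no after_key: nothing left to page through
            break


# Fake two-page backend for demonstration:
_pages = [
    {"buckets": [{"key": {"tag": "a"}, "doc_count": 3}],
     "after_key": {"tag": "a"}},
    {"buckets": [{"key": {"tag": "b"}, "doc_count": 1}]},
]

def _fake_search(body):
    has_after = "after" in body["aggs"]["comp"]["composite"]
    return {"aggregations": {"comp": _pages[1] if has_after else _pages[0]}}

buckets = list(
    iter_composite_buckets(_fake_search, [{"tag": {"terms": {"field": "tag"}}}])
)
```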
I'm not sure if the problem I'm facing is related to this, but I'm unable to get the inner_hits from a scan response. Any suggestions would be appreciated.
Regards!
Any suggestions on how to get the aggregations from the response returned by `scan()`?
In 5.0 elasticsearch allows a search request with aggregations when using scan/scroll, which we should expose. This has been moved here from `elasticsearch-py` - https://github.com/elastic/elasticsearch-py/issues/530