elastic / elasticsearch-dsl-py

High level Python client for Elasticsearch
http://elasticsearch-dsl.readthedocs.org
Apache License 2.0
3.82k stars 801 forks source link

.count() can return a different result when search has already been executed #1900

Open knoebber opened 1 week ago

knoebber commented 1 week ago

Reproduction:

>>> from elasticsearch_dsl import Search
>>> search_with_post_filter = Search(index='product').post_filter('terms', categories=['B'])
>>> search_with_post_filter.count()
2818
>>> search_with_post_filter.execute()
<Response: [...]>
>>> search_with_post_filter.count()
1

The first count() call does a normal count request, which does not respect post filters. The second .count() call uses the cached response's total hits property, which does respect post filters.

I see that this is somewhat intentional based on the comments from #518 . But, In my opinion, this behavior is a bug because the value of a method call changes depending on hidden internal state, which doesn't align with this libraries immutable API design.

I'm not sure what the ideal behavior would be - maybe never used the cached result for .count() if the search has a post filter?

miguelgrinberg commented 1 week ago

Thanks, I'll think about what can be done here. I actually think the best solution was the way it was before #518, which is to not allow count() to work if there is a post filter. That is the most consistent behavior.

Why don't you just separate the two queries? That solves your problem I think?

>>> from elasticsearch_dsl import Search
>>> search = Search(index='product')
>>> search_with_post_filter = search.post_filter('terms', categories=['B'])
>>> search_with_post_filter.execute()
<Response: [...]>
>>> search.count()
2818

This is the best solution. Both queries are cached, and both return accurate results no matter how many times you ask and no matter the order of execution.

knoebber commented 1 week ago

Yeah, I agree it's easy to work around, and it's not causing any problems on the code base that I'm working on. That snippet of code was just meant to highlight the issue.

Thanks for the response!