Open micpalmia opened 9 years ago
Nice demonstration. I confirm that this is a bug.
@clintongormley Hi, could you please give an update on the status of this issue? We've run into it as well, running the query on multiple shards with dfs_query_then_fetch. Also, while trying to understand the cause, I came across this post on Elastic Discuss: question
I also ran into this bug. It would be great if it could be solved.
I have also run into this issue.
cc @elastic/es-search-aggs
Facing the same issue. Any updates on this?
I can confirm that the implementation of this feature (BlendedTermQuery) does indeed not take distributed stats into account.
Note that the cross_fields type blends field statistics in a way that does not always produce well-formed scores (for example scores can become negative). As an alternative, you can consider the combined_fields query, which is also term-centric but combines field statistics in a more robust way.
While combined_fields did not use to work with dfs_query_then_fetch either, this has been fixed (to be released in an upcoming ES version).
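For anyone comparing the two options mentioned above, here is a minimal sketch of the request bodies. The index name `people` and the fields `first_name` and `last_name` are hypothetical, chosen only for illustration; the helper functions are not part of any client library, they just build the JSON and the `_search` path with an explicit `search_type`.

```python
import json
from urllib.parse import urlencode

# Hypothetical index and field names, used only for illustration.
INDEX = "people"
FIELDS = ["first_name", "last_name"]


def cross_fields_body(text):
    """Body for a multi_match query of type cross_fields (the query this issue is about)."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "cross_fields",
                "fields": FIELDS,
                "operator": "and",
            }
        }
    }


def combined_fields_body(text):
    """Body for the combined_fields query suggested above as an alternative."""
    return {
        "query": {
            "combined_fields": {
                "query": text,
                "fields": FIELDS,
                "operator": "and",
            }
        }
    }


def search_path(index, search_type):
    """Path and query string for the _search endpoint with an explicit search type."""
    return f"/{index}/_search?" + urlencode({"search_type": search_type})


if __name__ == "__main__":
    print("GET", search_path(INDEX, "dfs_query_then_fetch"))
    print(json.dumps(combined_fields_body("will smith"), indent=2))
```

The two bodies differ only in the query clause, so switching between them for a side-by-side scoring comparison should be straightforward. Whether combined_fields fits a given mapping (field types, analyzers) is something to check against the documentation for your ES version.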
Pinging @elastic/es-search (Team:Search)
Pinging @elastic/es-search-relevance (Team:Search Relevance)
I tested this on Elasticsearch 1.5.0 (and also on 1.4.2 and 1.3.0).
The documentation for the multi_match query of type cross_fields states that
This holds true when executing a query of this type with search type query_then_fetch: in this case, the same (approximated, shard-level) idf is used for all fields. When using search type dfs_query_then_fetch, on the other hand, the field-specific idf is used.
I would expect the scatter phase to provide the multi_match query with the correctly merged idf, rather than completely overwriting the (approximated but still more correct) shard-level idf with global field-specific idfs.
An obvious test is provided in the following gist https://gist.github.com/micpalmia/c812200617307d78d495
A series of documents is inserted into a single shard, and when a cross_fields query is executed with dfs_query_then_fetch, unmerged idfs are used for the two fields.
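To make the difference concrete, here is a rough numeric sketch. It rests on two assumptions that are simplifications, not a description of the actual implementation: idf follows the classic Lucene TF-IDF formula `1 + ln(numDocs / (docFreq + 1))`, and blending is modelled as giving every field the maximum document frequency seen across the fields. The document counts and field names are made up.

```python
import math


def classic_idf(doc_freq, num_docs):
    """Classic TF-IDF style idf (assumed formula): 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def per_field_idf(doc_freqs, num_docs):
    """Unmerged case: each field keeps the idf derived from its own document frequency."""
    return {field: classic_idf(df, num_docs) for field, df in doc_freqs.items()}


def blended_idf(doc_freqs, num_docs):
    """Simplified blending (assumption): all fields share the idf of the largest document frequency."""
    shared = classic_idf(max(doc_freqs.values()), num_docs)
    return {field: shared for field in doc_freqs}


if __name__ == "__main__":
    # Made-up statistics: the term is rare in first_name and common in last_name.
    doc_freqs = {"first_name": 1, "last_name": 8}
    num_docs = 10
    print("per-field:", per_field_idf(doc_freqs, num_docs))
    print("blended:  ", blended_idf(doc_freqs, num_docs))
```

With these numbers, the per-field variant scores a match in first_name much higher than one in last_name, while the blended variant treats both fields the same. That is the behaviour I would expect cross_fields to keep under dfs_query_then_fetch.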