Elasticsearch Version
8.10.2
Installed Plugins
No response
Java Version
bundled
OS Version
Debian 6.1
Problem Description
Hi everyone,
I've recently started using the significant_terms aggregation with a nested field in my index, and I've noticed that the results are very similar to those of a standard terms aggregation. This leads me to believe that the background calculations for significance might not be working as expected with nested fields. The bg_count is 0 as shown in this bucket list.
Here's a simplified version of my index mapping:
To provide more clarity, I'm using the significant_terms aggregation as follows, where I'm filtering based on the pos field before performing the aggregation:
My primary questions are:
When using significant_terms on a nested field, especially after filtering by a nested field's value (like pos in my case), do I need to specify to Elasticsearch which field to use for the background search? I'd expect the background scan to consider the entire index without any filters applied. If so, how do I ensure this?
Is it mandatory for a field to be mapped as text for the significant_terms aggregation to work properly? Or is it sufficient if a field is only mapped as a keyword?
Initially, I mapped the field to .txt with only the keyword type. After conducting the significant_terms aggregation, I noticed that the terms returned were not as "significant" as I had expected. I began to wonder if this inconsistency was due to not mapping the field as text in addition to keyword. Hoping to get more relevant results, I made the change to include the text mapping. However, to my disappointment, this alteration didn't bring about any notable difference in the aggregation results.
Also applying this has no effect, bg_count is still 0:
"background_filter": {
  "match_all": {}
}
Any insights or guidance on this would be greatly appreciated. Thanks in advance!
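To make the setup concrete, here is a minimal sketch of the request shape described above, built as a Python dict: a nested aggregation, then a filter on pos, then significant_terms with an explicit background_filter. The sub-field names my_field.token and my_field.pos, the value NOUN, and the aggregation names are assumptions for illustration only; they are not the real mapping or query from this issue.

```python
import json


def build_request(nested_path, term_field, pos_field, pos_value):
    """Build a search body: nested -> filter on pos -> significant_terms."""
    return {
        "size": 0,  # only aggregation results are needed, no hits
        "aggs": {
            "nested_scope": {
                # Step into the nested documents under the given path.
                "nested": {"path": nested_path},
                "aggs": {
                    "pos_filter": {
                        # Foreground set: nested docs with the given pos value.
                        "filter": {"term": {pos_field: pos_value}},
                        "aggs": {
                            "significant": {
                                "significant_terms": {
                                    "field": term_field,
                                    # The background filter tried in the issue.
                                    "background_filter": {"match_all": {}},
                                }
                            }
                        },
                    }
                },
            }
        },
    }


# Hypothetical field names and filter value, chosen only for the example.
body = build_request("my_field", "my_field.token", "my_field.pos", "NOUN")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request; whether the background statistics are computed against the nested documents in this arrangement is exactly the open question here.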
Steps to Reproduce
my_field field structured as shown above.
significant_terms aggregation using nested and filtered queries on the my_field field.
bg_count in the aggregation results.
Logs (if relevant)
No response
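For the last reproduction step, a small helper like the one below can list the buckets whose bg_count is 0. This is only a sketch: the sample bucket list is made up to mirror the usual significant_terms bucket fields (key, doc_count, score, bg_count) and is not output from my cluster.

```python
def zero_background_buckets(buckets):
    """Return the keys of buckets whose bg_count is 0 or missing."""
    return [b["key"] for b in buckets if b.get("bg_count", 0) == 0]


# Hypothetical buckets, shaped like a significant_terms response.
sample_buckets = [
    {"key": "alpha", "doc_count": 12, "score": 0.8, "bg_count": 0},
    {"key": "beta", "doc_count": 7, "score": 0.3, "bg_count": 40},
]

print(zero_background_buckets(sample_buckets))
```

If every key comes back from this check, the background counts are missing for the whole aggregation rather than for a few rare terms.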