elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
68.58k stars 24.36k forks source link

No default value used for `weight` in weighted average aggregator #81309

Open salvatore-campagna opened 2 years ago

salvatore-campagna commented 2 years ago

Elasticsearch version: 8.0, 7. 16, 7.15 and older versions.

Description of the problem including expected versus actual behaviour:

The documentation about missing values for the weighted average aggregation (https://www.elastic.co/guide/en/elasticsearch/reference/7.15/search-aggregations-metrics-weight-avg-aggregation.html#_missing_values_4) says: "By default, if the value field is missing the document is ignored and the aggregation moves on to the next document. If the weight field is missing, it is assumed to have a weight of 1 (like a normal average)."

Anyway, if no missing configuration is provided for the weight field, documents without a weight field are ignored. As a result, it looks like that missing weights are treated the same way as missing values (ignoring documents) while the expected behaviour is to compute the weighted average using a default weight of 1.

Steps to reproduce:

This is a YAML REST test which could be used to reproduce the issue:

setup:
  - do:
      indices.create:
        index: test
        body:
          settings:
            number_of_shards: 1
            number_of_replicas: 0
          mappings:
            properties:
              long_field:
                type: long

  - do:
      bulk:
        refresh: true
        index: test
        body:
          - '{ "index": {} }'
          - '{ "long_field": 6}'
          - '{ "index": {} }'
          - '{ "long_field": 2 }'
          - '{ "index": {} }'
          - '{ "long_field": 1 }'
          - '{ "index": {} }'
          - '{ "long_field": 3 }'

---
"Missing weight with default value":
  - skip:
      features: close_to
  - do:
      search:
        body:
          aggs:
            weighted_long_avg:
              weighted_avg:
                value:
                  field: long_field
                weight:
                  field: weight

  - match: { hits.total.value: 4 }
  - length: { hits.hits: 4 }
  - close_to: { aggregations.weighted_long_avg.value: { value: 3.000000, error: 0.000001 } }

The expected result is ((6 1) + (2 1) + (1 1) + (3 1) / (1 + 1 + 1 + 1)) = 12 / 4 = 3. Instead it returns null since all documents are skipped due to the missing weight field.

elasticmachine commented 2 years ago

Pinging @elastic/es-analytics-geo (Team:Analytics)

salvatore-campagna commented 2 years ago

According to my investigation this is happening because the second condition (on docWeights) in the if statement below is returning false (WeightedAvgAggregator::collect):

if (docValues.advanceExact(doc) && docWeights.advanceExact(doc)) {

If I understand correctly the issue is that the document has no weight field. As a result, it has no corresponding doc_values for that field and a specific doc. For this reason the whole computation is skipped, actually resulting in the document being ignored.