elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Nested terms and cardinality aggregation causes scoring exception #112975

Closed wiggzz closed 1 month ago

wiggzz commented 1 month ago

Elasticsearch Version

8.15

Installed Plugins

No response

Java Version

bundled

OS Version

Linux 3acef6594579 6.5.0-15-generic #15-Ubuntu SMP PREEMPT_DYNAMIC Tue Jan 9 22:39:36 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux

Problem Description

I am hitting a runtime_exception when a terms aggregation and a cardinality aggregation are both placed inside a terms aggregation that is itself inside a nested aggregation. See the reproduction steps below.

The error is:

{
  "error": {
    "root_cause": [
      {
        "type": "runtime_exception",
        "reason": "score for different docid, nesting an aggregation under a children aggregation and terms aggregation with collect mode breadth_first isn't possible"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "my_index",
        "node": "i6xLcTXyT1SrJ5LhtKyUjg",
        "reason": {
          "type": "runtime_exception",
          "reason": "score for different docid, nesting an aggregation under a children aggregation and terms aggregation with collect mode breadth_first isn't possible"
        }
      }
    ],
    "caused_by": {
      "type": "runtime_exception",
      "reason": "score for different docid, nesting an aggregation under a children aggregation and terms aggregation with collect mode breadth_first isn't possible",
      "caused_by": {
        "type": "runtime_exception",
        "reason": "score for different docid, nesting an aggregation under a children aggregation and terms aggregation with collect mode breadth_first isn't possible"
      }
    }
  },
  "status": 500
}

Steps to Reproduce

PUT http://localhost:9200/my_index
Content-Type: application/json

{
    "mappings": {
        "properties": {
            "tags": {
                "type": "nested",
                "properties": {
                    "key": {
                        "type": "keyword"
                    },
                    "value": {
                        "type": "keyword"
                    }
                }
            }
        }
    }
}

### create doc
POST http://localhost:9200/my_index/_doc
Content-Type: application/json

{
    "tags": [
        {
            "key": "state",
            "value": "texas"
        }
    ]
}

### create another doc
POST http://localhost:9200/my_index/_doc
Content-Type: application/json

{
    "tags": [
        {
            "key": "state",
            "value": "utah"
        }
    ]
}

### create third doc
POST http://localhost:9200/my_index/_doc
Content-Type: application/json

{
    "tags": [
        {
            "key": "state",
            "value": "texas"
        }
    ]
}

### query (which produces an error)
POST http://localhost:9200/my_index/_search
Content-Type: application/json

{
    "aggregations": {
        "tags": {
            "nested": {
                "path": "tags"
            },
            "aggregations": {
                "keys": {
                    "terms": {
                        "field": "tags.key",
                        "execution_hint": "map"
                    },
                    "aggregations": {
                        "values": {
                            "terms": {
                                "field": "tags.value"
                            }
                        },
                        "value_count": {
                            "cardinality": {
                                "field": "tags.value"
                            }
                        }
                    }
                }
            }
        }
    },
    "size": 0
}

### delete index
DELETE http://localhost:9200/my_index

I can work around this by adding execution_hint: direct to the cardinality aggregation. From reading the code, though, this appears to be a bug: the MultiBucketCollector defaults to ScoreMode.COMPLETE even though none of the sub-aggregations need scoring.
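
As a sketch, the workaround applied to the reproduction query would look like the request below. The only change from the failing query is the added "execution_hint": "direct" on the value_count cardinality aggregation; how that hint affects memory use or speed on larger indices is not covered here.

```http
### query with the workaround (execution_hint: direct on the cardinality aggregation)
POST http://localhost:9200/my_index/_search
Content-Type: application/json

{
    "aggregations": {
        "tags": {
            "nested": {
                "path": "tags"
            },
            "aggregations": {
                "keys": {
                    "terms": {
                        "field": "tags.key",
                        "execution_hint": "map"
                    },
                    "aggregations": {
                        "values": {
                            "terms": {
                                "field": "tags.value"
                            }
                        },
                        "value_count": {
                            "cardinality": {
                                "field": "tags.value",
                                "execution_hint": "direct"
                            }
                        }
                    }
                }
            }
        }
    },
    "size": 0
}
```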

Logs (if relevant)

No response

elasticsearchmachine commented 1 month ago

Pinging @elastic/es-analytical-engine (Team:Analytics)

iverase commented 1 month ago

Thank you for reporting this and providing a clear reproduction. It made it very easy to find the issue and open a fix.