elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

getting sort field via script_fields results in array_index_out_of_bounds_exception for some values #99620

Open mad-pf opened 1 year ago

mad-pf commented 1 year ago

Elasticsearch Version

8.9.2

Installed Plugins

analysis-icu

Java Version

bundled

OS Version

as shipped in docker.elastic.co/elasticsearch/elasticsearch:8.9.2; Linux kernel 6.4

Problem Description

Reading the sort value of certain simple text values (such as "Q" or "W") in a Painless script results in an array_index_out_of_bounds_exception, while the same request works without issues for other values such as "A" or "QQ".

It seems to work fine on ES 7.17, but is also broken on ES 8.5. The analysis-icu plugin is required to reproduce the problem.

Steps to Reproduce

The following script creates a simple index, adds two objects, and searches for each of them with script_fields. The search works fine for the first object; for the second object, an error is returned:

#!/bin/sh

ES=http://localhost:9202
ESIDX=debug_69726

curl -s -XDELETE $ES/$ESIDX >/dev/null

curl -s -XPUT $ES/$ESIDX?pretty=true -H 'Content-Type: application/json' -d '{
    "settings": {}
}'

curl -s -XPOST $ES/$ESIDX/_mapping?pretty=true -H 'Content-Type: application/json' -d '{
    "properties": {
        "name": {
            "type": "keyword",
            "fields": {
                "sort": {
                    "type": "icu_collation_keyword"
                },
                "text": {
                    "type": "text"
                }
            }
        }
    }
}'

curl -s -XPOST $ES/$ESIDX/_bulk?pretty=true\&refresh=true -H 'Content-Type: application/json' -d \
'{"index": {"_id": "foo:1"}}
{"name": "A"}
{"index": {"_id": "foo:2"}}
{"name": "Q"}
'

echo "foo:1, no problem"
curl -s -XPOST $ES/$ESIDX/_search?pretty=true -H 'Content-Type: application/json' -d '{
    "query": {"term": {"_id": "foo:1"}},
    "script_fields": {
        "_sort": {
            "script": {
                "lang": "painless",
                "source": "doc['\''name.sort'\''].value"
            }
        }
    }
}'

echo "foo:2, explodes"
curl -s -XPOST $ES/$ESIDX/_search?pretty=true -H 'Content-Type: application/json' -d '{
    "query": {"term": {"_id": "foo:2"}},
    "script_fields": {
        "_sort": {
            "script": {
                "lang": "painless",
                "source": "doc['\''name.sort'\''].value"
            }
        }
    }
}'

Result for the first object, no problem:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "debug_69726",
        "_id" : "foo:1",
        "_score" : 1.0,
        "fields" : {
          "_sort" : [
            "*\u0001\u0005\u0001܀"
          ]
        }
      }
    ]
  }
}

Result for the second object, the error:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "script_exception",
        "reason" : "runtime error",
        "script_stack" : [
          "org.apache.lucene.core@9.7.0/org.apache.lucene.util.UnicodeUtil.UTF8toUTF16(UnicodeUtil.java:649)",
          "org.apache.lucene.core@9.7.0/org.apache.lucene.util.BytesRef.utf8ToString(BytesRef.java:136)",
          "org.elasticsearch.server@8.9.2/org.elasticsearch.index.fielddata.ScriptDocValues$StringsSupplier.bytesToString(ScriptDocValues.java:461)",
          "org.elasticsearch.server@8.9.2/org.elasticsearch.index.fielddata.ScriptDocValues$StringsSupplier.getInternal(ScriptDocValues.java:466)",
          "org.elasticsearch.server@8.9.2/org.elasticsearch.index.fielddata.ScriptDocValues$StringsSupplier.getInternal(ScriptDocValues.java:420)",
          "org.elasticsearch.server@8.9.2/org.elasticsearch.index.fielddata.ScriptDocValues$Strings.get(ScriptDocValues.java:493)",
          "org.elasticsearch.server@8.9.2/org.elasticsearch.index.fielddata.ScriptDocValues$Strings.getValue(ScriptDocValues.java:482)",
          "doc['name.sort'].value",
          "                ^---- HERE"
        ],
        "script" : "doc['name.sort'].value",
        "lang" : "painless",
        "position" : {
          "offset" : 16,
          "start" : 0,
          "end" : 22
        }
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "debug_69726",
        "node" : "PZ2dTLbxTWayGvJjxds3cg",
        "reason" : {
          "type" : "script_exception",
          "reason" : "runtime error",
          "script_stack" : [
            "org.apache.lucene.core@9.7.0/org.apache.lucene.util.UnicodeUtil.UTF8toUTF16(UnicodeUtil.java:649)",
            "org.apache.lucene.core@9.7.0/org.apache.lucene.util.BytesRef.utf8ToString(BytesRef.java:136)",
            "org.elasticsearch.server@8.9.2/org.elasticsearch.index.fielddata.ScriptDocValues$StringsSupplier.bytesToString(ScriptDocValues.java:461)",
            "org.elasticsearch.server@8.9.2/org.elasticsearch.index.fielddata.ScriptDocValues$StringsSupplier.getInternal(ScriptDocValues.java:466)",
            "org.elasticsearch.server@8.9.2/org.elasticsearch.index.fielddata.ScriptDocValues$StringsSupplier.getInternal(ScriptDocValues.java:420)",
            "org.elasticsearch.server@8.9.2/org.elasticsearch.index.fielddata.ScriptDocValues$Strings.get(ScriptDocValues.java:493)",
            "org.elasticsearch.server@8.9.2/org.elasticsearch.index.fielddata.ScriptDocValues$Strings.getValue(ScriptDocValues.java:482)",
            "doc['name.sort'].value",
            "                ^---- HERE"
          ],
          "script" : "doc['name.sort'].value",
          "lang" : "painless",
          "position" : {
            "offset" : 16,
            "start" : 0,
            "end" : 22
          },
          "caused_by" : {
            "type" : "array_index_out_of_bounds_exception",
            "reason" : "Index 6 out of bounds for length 6"
          }
        }
      }
    ]
  },
  "status" : 400
}
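
To check the other values named in the problem description ("W", "QQ") without copying the search request once per document, the requests can be folded into a loop. The following is a minimal sketch, not part of the original report: it assumes the index and mapping created by the script above already exist, and the helper names (search_body, bulk_body, run_check) and the document ids val:1, val:2, ... are hypothetical.

```sh
#!/bin/sh
# Sketch only. ES and ESIDX default to the values from the original script.
ES=${ES:-http://localhost:9202}
ESIDX=${ESIDX:-debug_69726}

# Print the search body that reads name.sort for one document id.
# \047 is the octal escape for a single quote in a printf format string.
search_body() {
    printf '{"query": {"term": {"_id": "%s"}}, "script_fields": {"_sort": {"script": {"lang": "painless", "source": "doc[\047name.sort\047].value"}}}}' "$1"
}

# Print bulk (NDJSON) lines that index each argument as a "name" value,
# using the hypothetical ids val:1, val:2, ...
bulk_body() {
    i=0
    for value in "$@"; do
        i=$((i + 1))
        printf '{"index": {"_id": "val:%d"}}\n{"name": "%s"}\n' "$i" "$value"
    done
}

# Index the given values, then run the script_fields search for each one
# and report whether the response contains the exception from this issue.
run_check() {
    bulk_body "$@" | curl -s -XPOST "$ES/$ESIDX/_bulk?refresh=true" \
        -H 'Content-Type: application/json' --data-binary @- >/dev/null
    i=0
    for value in "$@"; do
        i=$((i + 1))
        response=$(search_body "val:$i" | curl -s -XPOST "$ES/$ESIDX/_search" \
            -H 'Content-Type: application/json' --data-binary @-)
        case $response in
            *array_index_out_of_bounds_exception*) echo "$value: error" ;;
            *'"hits"'*) echo "$value: ok" ;;
            *) echo "$value: unexpected response" ;;
        esac
    done
}

# Example call, to be run once the index exists:
#   run_check A Q W QQ
```

Going by the description above, "A" and "QQ" would be expected to report ok and "Q" and "W" to report error on 8.9.2, but this sketch has not been run against a cluster. The bulk request uses --data-binary rather than -d because curl's -d strips newlines when reading from a file or stdin, which would break the NDJSON body.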

Logs (if relevant)

No response

elasticsearchmachine commented 1 year ago

Pinging @elastic/es-search (Team:Search)

juntezhang commented 1 year ago

This also affects sorting when the field is multi-valued (an array). Because the default sort mode is max, the exception is triggered once the field type is set to icu_collation_keyword.

elasticsearchmachine commented 3 months ago

Pinging @elastic/es-search-foundations (Team:Search Foundations)