elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Inconsistent behaviour for numeric term/terms query when values are out of range #112711

Open ioanatia opened 1 week ago

ioanatia commented 1 week ago

Elasticsearch Version

main

Installed Plugins

No response

Java Version

bundled

OS Version

-

Problem Description

NumberFieldMapper has inconsistent validations for out of range values for term queries.

I propose we refactor NumberFieldMapper such that we remove these validations for out of range values for term and terms queries. When an out of range value is sent for a term query we can return a MatchNoDocsQuery. We can figure out a similar optimization for Terms queries.
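The proposal can be sketched as a small Python model. This is not the NumberFieldMapper code: `can_match`, `term_query`, `terms_query` and the tuple-based query representation are hypothetical names used only for illustration, and dropping unmatched values from a terms query is one possible reading of the "similar optimization".

```python
import math

# Inclusive bounds of the whole-number field types (signed two's complement).
INT_RANGES = {
    "byte": (-2**7, 2**7 - 1),
    "short": (-2**15, 2**15 - 1),
    "integer": (-2**31, 2**31 - 1),
    "long": (-2**63, 2**63 - 1),
}

# Hypothetical stand-in for Lucene's MatchNoDocsQuery.
MATCH_NO_DOCS = ("match_none",)


def can_match(field_type, value):
    """Return True if a value stored in this field type could equal `value`."""
    # Infinity and NaN can never equal a stored whole number.
    if isinstance(value, float) and not math.isfinite(value):
        return False
    low, high = INT_RANGES[field_type]
    # Fractional values and values outside the type's range cannot match either.
    return value == int(value) and low <= value <= high


def term_query(field_type, value):
    """Build a term query, or a match-nothing query instead of raising an error."""
    if not can_match(field_type, value):
        return MATCH_NO_DOCS
    return ("term", int(value))


def terms_query(field_type, values):
    """Keep only the values that could match; match nothing if none remain."""
    kept = [int(v) for v in values if can_match(field_type, v)]
    if not kept:
        return MATCH_NO_DOCS
    return ("terms", kept)
```

Under this model both `term_query("integer", 1e300)` and `term_query("integer", float("1E3000"))` return the match-nothing query, so the two requests in the reproduction steps would behave the same way.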

Steps to Reproduce

These steps are just for double and integer fields - but the issue has to do with all numeric field types. Either we consistently return an error for all out of range values for all numeric types or we remove the validation.

  1. Create an index with the following mapping:
PUT /testidx
{
  "mappings": {
    "properties": {
      "integer": {
        "type": "integer"
      },
      "double": {
        "type": "double"
      }
    }
  }
}
  2. Test term queries for the integer field:
POST testidx/_search
{
  "query": {
    "term": {
      "integer": {
        "value": 1E300
      }
    }
  }
}

response:

{
  "error": {
    "root_cause": [
      {
        "type": "query_shard_exception",
        "reason": "failed to create query: Value [1.0E300] is out of range for an integer",
        "index_uuid": "WTo8jWTvQcOek6avtN_jyg",
        "index": "testidx"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "testidx",
        "node": "z20YOLPXR9yKINaIyHC-3g",
        "reason": {
          "type": "query_shard_exception",
          "reason": "failed to create query: Value [1.0E300] is out of range for an integer",
          "index_uuid": "WTo8jWTvQcOek6avtN_jyg",
          "index": "testidx",
          "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "Value [1.0E300] is out of range for an integer"
          }
        }
      }
    ]
  },
  "status": 400
}

Test with another out of range value

POST testidx/_search
{
  "query": {
    "term": {
      "integer": {
        "value": 1E3000
      }
    }
  }
}

response:

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  }
}
  3. Test a term query for the double field:
POST testidx/_search
{
  "query": {
    "term": {
      "double": {
        "value": 1E3000
      }
    }
  }
}

response:

{
  "error": {
    "root_cause": [
      {
        "type": "query_shard_exception",
        "reason": "failed to create query: [double] supports only finite values, but got [Infinity]",
        "index_uuid": "WTo8jWTvQcOek6avtN_jyg",
        "index": "testidx"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "testidx",
        "node": "z20YOLPXR9yKINaIyHC-3g",
        "reason": {
          "type": "query_shard_exception",
          "reason": "failed to create query: [double] supports only finite values, but got [Infinity]",
          "index_uuid": "WTo8jWTvQcOek6avtN_jyg",
          "index": "testidx",
          "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "[double] supports only finite values, but got [Infinity]"
          }
        }
      }
    ]
  },
  "status": 400
}
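The two literals used in these steps may differ in how they parse as doubles. A minimal sketch in Python, whose `float` is an IEEE 754 double like Java's `double`; it assumes the request parser reads both literals as doubles, which the `[Infinity]` in the error message above suggests:

```python
import math
import sys

in_range_for_double = float("1E300")  # below the largest finite double
overflowed = float("1E3000")          # above it, so parsing yields infinity

print(sys.float_info.max)                  # largest finite double, about 1.797e308
print(math.isfinite(in_range_for_double))  # True
print(math.isinf(overflowed))              # True
```

If that assumption holds, 1E300 reaches the integer range check as a finite number, while 1E3000 arrives as Infinity and is presumably handled by a different branch, which would account for the differing responses.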

Impact on ES|QL

https://github.com/elastic/elasticsearch/issues/105079 reported a similar issue but for range queries.

These ES|QL queries produce no errors since we fixed https://github.com/elastic/elasticsearch/issues/105079:

  FROM testidx |
  WHERE half_float_field <= 1E300

  FROM testidx |
  WHERE half_float_field >= 1E300

However, FROM testidx | WHERE half_float_field == 1E300 returns an error, because this ES|QL query is translated to a term query. This is an additional argument for removing the out of range validations for term and terms queries: it would make their behaviour consistent with range queries.
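To illustrate why the equality case has a well-defined answer, the sketch below evaluates all three comparisons against a few values a half_float field can hold (its largest finite value is 65504). It is a plain Python model with hypothetical names, not how ES|QL or Lucene evaluate these predicates.

```python
import operator

HALF_FLOAT_MAX = 65504.0  # largest finite IEEE 754 half-precision value
LITERAL = 1e300           # the out of range literal from the queries above

# Hypothetical sample of values a half_float field could store.
stored_values = [-HALF_FLOAT_MAX, -1.5, 0.0, 1.5, HALF_FLOAT_MAX]


def matching(op):
    """Return the stored values for which `value op LITERAL` holds."""
    return [v for v in stored_values if op(v, LITERAL)]


le_hits = matching(operator.le)  # every stored value is <= 1E300
ge_hits = matching(operator.ge)  # no stored value is >= 1E300
eq_hits = matching(operator.eq)  # no stored value is == 1E300
```

The equality case is no less defined than the two range cases: it simply matches nothing, which is what a MatchNoDocsQuery would express.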

Logs (if relevant)

No response

elasticsearchmachine commented 1 week ago

Pinging @elastic/es-search-relevance (Team:Search Relevance)

elasticsearchmachine commented 1 week ago

Pinging @elastic/es-search (Team:Search)

ioanatia commented 6 days ago

It looks like we added the validations for finite values with https://github.com/elastic/elasticsearch/issues/25534. That issue had to do with disallowing out of range values at indexing time, which makes perfect sense, but the fix also added the restriction to range, term and terms queries. I am not sure whether this was the intention or a side effect; the tests don't seem to cover the effect on queries.