elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Fix sort on date fields with missing values (#73763) #116534

Open hanjongho opened 2 weeks ago

hanjongho commented 2 weeks ago

Elasticsearch Version

7.15.2

Installed Plugins

No response

Java Version

bundled

OS Version

centos 7

Problem Description

I'd like to report what appears to be an error in the sort logic when missing and order are used together on a date field. When using .missing("_last").sort(SortOrder.DESC) or .missing("_first").sort(SortOrder.ASC) on a date field, a doc that does not have the date field gets the sort value -9223372036854775808L (Long.MIN_VALUE) from the following logic: (image)
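
To make the two combinations above concrete, here is a minimal, self-contained sketch of how a missing-value sentinel can be chosen from the missing and order settings. It is not the Elasticsearch source: the class name MissingSortValue and the method resolve are hypothetical, and the sketch only assumes that a document without the field is given the smallest or largest long so that it sorts to the requested end.

```java
// Hypothetical illustration; names are not taken from the Elasticsearch code base.
public class MissingSortValue {

    /**
     * Picks the sentinel sort value for a document that lacks the sort field.
     *
     * @param missing    "_first" or "_last"
     * @param descending true when the sort order is "desc"
     */
    static long resolve(String missing, boolean descending) {
        boolean missingFirst = "_first".equals(missing);
        // "first" in ascending order and "last" in descending order both
        // require the smallest possible value; the other two cases need the largest.
        boolean useMin = missingFirst ^ descending;
        return useMin ? Long.MIN_VALUE : Long.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(resolve("_last", true));   // -9223372036854775808
        System.out.println(resolve("_first", false)); // -9223372036854775808
        System.out.println(resolve("_last", false));  // 9223372036854775807
    }
}
```

Under this assumption, both combinations named in the description resolve to Long.MIN_VALUE, which matches the -9223372036854775808 reported for the date field.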

Steps to Reproduce

The case can be reproduced with the following steps:

1. Put Index Settings

curl -X PUT "localhost:9200/my-index-000001" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "date_nanos_test": {
        "type": "date_nanos"
      },
      "date_test": {
        "type": "date"
      }
    }
  }
}
'

2. Bulk index 2 documents, one with only the date field and one with only the date_nanos field

curl -X PUT "localhost:9200/my-index-000001/_bulk?refresh" -H 'Content-Type: application/json' -d'
{"index":{"_id":"1"}}
{"date_test":"2024-11-01"}
{"index":{"_id":"2"}}
{"date_nanos_test":"2024-11-01T23:50:00.000000001"}
'

3. Query with .missing("_last").sort(SortOrder.DESC) on the date_nanos field

curl -X GET "localhost:9200/my-index-000001/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 5,
  "sort": [
    {
      "date_nanos_test": {
        "order": "desc",
        "missing": "_last"
      }
    }
  ]
}
'

3.1 Actual Result

{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "my-index-000001",
        "_id" : "2",
        "_score" : null,
        "_source" : {
          "date_nanos_test" : "2024-11-01T23:50:00.000000001"
        },
        "sort" : [
          1730505000000000001
        ]
      },
      {
        "_index" : "my-index-000001",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "date_test" : "2024-11-01"
        },
        "sort" : [
          0
        ]
      }
    ]
  }
}

This works as expected (the missing doc gets sort value 0); it was fixed in https://github.com/elastic/elasticsearch/pull/74760

4. Query with .missing("_last").sort(SortOrder.DESC) on the date field

curl -X GET "localhost:9200/my-index-000001/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 5,
  "sort": [
    {
      "date_test": {
        "order": "desc",
        "missing": "_last"
      }
    }
  ]
}
'

4.1 Actual Result

{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "my-index-000001",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "date_test" : "2024-11-01"
        },
        "sort" : [
          1730419200000
        ]
      },
      {
        "_index" : "my-index-000001",
        "_id" : "2",
        "_score" : null,
        "_source" : {
          "date_nanos_test" : "2024-11-01T23:50:00.000000001"
        },
        "sort" : [
          -9223372036854775808
        ]
      }
    ]
  }
}
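
The sort values in the two results above can be cross-checked with plain java.time arithmetic. The sketch below is only a verification aid with hypothetical names (SortValueCheck, epochMillis, epochNanos); it assumes the indexed values are interpreted as UTC, which is why a trailing Z is appended to the date_nanos value.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical helper for checking the sort values shown in the results.
public class SortValueCheck {

    // Milliseconds since the epoch for a date-only value, assuming UTC midnight.
    static long epochMillis(String isoDate) {
        return LocalDate.parse(isoDate)
                .atStartOfDay(ZoneOffset.UTC)
                .toInstant()
                .toEpochMilli();
    }

    // Nanoseconds since the epoch for an ISO-8601 instant (must end in Z).
    static long epochNanos(String isoInstant) {
        Instant t = Instant.parse(isoInstant);
        return Math.addExact(
                Math.multiplyExact(t.getEpochSecond(), 1_000_000_000L),
                t.getNano());
    }

    public static void main(String[] args) {
        // date field of doc 1, as sorted in step 4
        System.out.println(epochMillis("2024-11-01"));                     // 1730419200000
        // date_nanos field of doc 2, as sorted in step 3 (Z assumed)
        System.out.println(epochNanos("2024-11-01T23:50:00.000000001Z")); // 1730505000000000001
        // sort value reported for the doc missing the date field in step 4
        System.out.println(Long.MIN_VALUE);                                // -9223372036854775808
    }
}
```

The first two values match the sort entries of the documents that have the field, so the only value in question is the sentinel assigned to the document that lacks it: 0 for date_nanos, Long.MIN_VALUE for date.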

Logs (if relevant)

No response

Implementation PR (fixed)

https://github.com/elastic/elasticsearch/pull/116099

abhiHIS commented 2 weeks ago

Hello! I would like to work on this issue!

henningandersen commented 2 weeks ago

I believe the fix for date_nanos was #74760. In that, it is mentioned that:

For missing values on date fields we use Long.MIN_VALUE by default. This is okay when the resolution of the field is milliseconds

I wonder if you can elaborate on the problem you are facing when Long.MIN_VALUE is used as the sort value for millisecond resolution dates?

elasticsearchmachine commented 2 weeks ago

Pinging @elastic/es-search (Team:Search)

hanjongho commented 2 weeks ago

> I believe the fix for date_nanos was #74760. In that, it is mentioned that:
>
> > For missing values on date fields we use Long.MIN_VALUE by default. This is okay when the resolution of the field is milliseconds
>
> I wonder if you can elaborate on the problem you are facing when Long.MIN_VALUE is used as the sort value for millisecond resolution dates?

Yes, the comment is correct. It says the fix also covers the date field, but in the code it only covers NumericType.DATE_NANOSECONDS. This is the PR in which I changed it to also include NumericType.DATE: https://github.com/elastic/elasticsearch/pull/116099