elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Ascending sort with `missing _first` fails on date fields with missing values #81960

Open stu-elastic opened 2 years ago

stu-elastic commented 2 years ago

Elasticsearch version (bin/elasticsearch --version): v8.1.0, v7.16.2 and at least v7.15.1

Description of the problem including expected versus actual behavior:

Index a document that has no value for a date field, then sort ascending on that field with "missing": "_first". The search fails with Field Year cannot be printed as the value -292275055 exceeds the maximum print width of 4 when the document with the missing value would be the only one returned, i.e. size: 1.

The formatter is trying to format the sentinel value of -9223372036854775808.
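To see where the year in the error message comes from, here is a minimal Java sketch. It assumes the sentinel is interpreted as epoch milliseconds, which is what a date sort value would normally be; the class and method names are made up for illustration and are not Elasticsearch code.

```java
import java.time.Instant;
import java.time.ZoneOffset;

class SentinelInstant {
    // Long.MIN_VALUE is the sentinel quoted above (-9223372036854775808).
    // Assumption: it is read as milliseconds since the epoch.
    static Instant sentinelAsInstant() {
        return Instant.ofEpochMilli(Long.MIN_VALUE);
    }

    // Calendar year of the sentinel instant in UTC.
    static int sentinelYear() {
        return sentinelAsInstant().atZone(ZoneOffset.UTC).getYear();
    }

    public static void main(String[] args) {
        System.out.println(Long.MIN_VALUE);      // -9223372036854775808
        System.out.println(sentinelAsInstant()); // -292275055-05-16T16:47:04.192Z
        System.out.println(sentinelYear());      // -292275055
    }
}
```

The year -292275055 is the value named in the error message, so the sentinel never represents a date that a four-digit-year format could print.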

Steps to reproduce:

PUT test
{
  "mappings" : {
    "properties" : {
      "field1" : {
        "type" : "integer"
      },
      "dt" : {
        "type" : "date",
        "format" : "strict_date_time||strict_date_time_no_millis"
      }
    }
  }
}

POST _bulk
{"index":{"_index":"test","_id":"1"}}
{"field1": 1243, "dt": "2021-12-20T23:14:20+00:00"}
{"index":{"_index":"test","_id":"2"}}
{"field2": 4567}

GET test/_search
{
  "size": 1,
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "dt": {
        "missing": "_first",
        "order": "asc"
      }
    }
  ]
}

This is https://github.com/elastic/elasticsearch/issues/73763 with targetNumericType == NumericType.DATE

Using "missing": 0 works around the issue.
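A rough Java sketch of why a missing value of 0 is harmless, assuming the substituted value is treated as epoch millisecond 0. The pattern below is only a stand-in for strict_date_time, not the formatter Elasticsearch builds, and the class name is hypothetical.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class MissingZeroDemo {
    // Stand-in pattern (an assumption, not the Elasticsearch formatter).
    static final DateTimeFormatter ISO_LIKE =
        DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss.SSSXXX", Locale.ROOT);

    // Formats an epoch-millisecond value as a UTC date-time string.
    static String format(long epochMillis) {
        return ISO_LIKE.format(Instant.ofEpochMilli(epochMillis).atOffset(ZoneOffset.UTC));
    }

    public static void main(String[] args) {
        // Epoch 0 falls in 1970, so the year fits in four digits.
        System.out.println(format(0L)); // 1970-01-01T00:00:00.000Z
    }
}
```

Note that with this workaround documents without a dt value sort as if they were dated 1970-01-01, which comes before any later date in ascending order.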

elasticmachine commented 2 years ago

Pinging @elastic/es-search (Team:Search)

idobrodushniy commented 1 year ago

Additional details about the issue, since I faced the same problem ⬇️

TLDR This problem ⬆️ happens as a consequence of the combination of following factors:

Workaround You can use epoch_second as the format in the sort clause (or any other format that works for you), e.g.

"dt": {
        "format": "epoch_second",
        "missing": "_first",
        "order": "desc"
}

Platforms I managed to reproduce this issue both on ES 8.1.3 and ES 7.14.1.

My conclusion 🐛

Details 🔍 If the strict_date_time format is specified either in the mapping or explicitly in the sort field, ES automatically replaces all null values with a nonexistent datetime (the same one for every doc), e.g. "-292275055-05-16T16:47:04.192Z".

Then ES tries to format this datetime in order to return it in the sort values of every document. Formatting throws an exception (see the stack trace below), and ES returns an error response with the message Field Year cannot be printed as the value ... exceeds the maximum print width of 4.
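The failure can be reproduced with plain java.time, outside Elasticsearch. The sketch below assumes a formatter whose year field is limited to exactly four digits, as a stand-in for a strict format; the formatter and class name are illustrative, not the Elasticsearch implementation.

```java
import java.time.DateTimeException;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.SignStyle;
import java.time.temporal.ChronoField;
import java.util.Locale;

class StrictYearDemo {
    // Assumption: a strict format prints the year with min and max width of 4.
    static final DateTimeFormatter STRICT_YEAR = new DateTimeFormatterBuilder()
        .appendValue(ChronoField.YEAR, 4, 4, SignStyle.EXCEEDS_PAD)
        .toFormatter(Locale.ROOT);

    // Returns the formatted year, or the exception message if printing fails.
    static String tryFormat(Instant instant) {
        try {
            return STRICT_YEAR.format(instant.atZone(ZoneOffset.UTC));
        } catch (DateTimeException e) {
            return "DateTimeException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // An ordinary date prints fine.
        System.out.println(tryFormat(Instant.parse("2021-12-20T23:14:20Z")));
        // The sentinel has a nine-digit year and cannot be printed in four digits.
        System.out.println(tryFormat(Instant.ofEpochMilli(Long.MIN_VALUE)));
    }
}
```

The second call reports the same "exceeds the maximum print width of 4" message as the stack trace, raised from DateTimeFormatterBuilder's number printer.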

This is the log from my ES Docker container ⬇️ (interestingly, in version 8.1.3 this is logged at debug level, whereas in version 7.14.1 it is logged as an error).

{
    "@timestamp": "2022-10-26T23:09:11.996Z",
    "log.level": "DEBUG",
    "message": "[hvB72OhCQ1mmJsmHNzQ0_w][test][8]: Failed to execute [SearchRequest{searchType=QUERY_THEN_FETCH, indices=[test], indicesOptions=IndicesOptions[ignore_unavailable=false, allow_no_indices=true, expand_wildcards_open=true, expand_wildcards_closed=false, expand_wildcards_hidden=false, allow_aliases_to_multiple_indices=true, forbid_closed_indices=true, ignore_aliases=false, ignore_throttled=true], routing='null', preference='null', requestCache=null, scroll=null, maxConcurrentShardRequests=0, batchedReduceSize=512, preFilterShardSize=null, allowPartialSearchResults=true, localClusterAlias=null, getOrCreateAbsoluteStartMillis=-1, ccsMinimizeRoundtrips=true, source={\"size\":100,\"_source\":{\"includes\":[\"_id\"],\"excludes\":[]},\"sort\":[{\"closed_datetime\":{\"order\":\"desc\"}},{\"id.exact\":{\"order\":\"desc\"}}]}}] lastShard [true]",
    "ecs.version": "1.2.0",
    "service.name": "ES_ECS",
    "event.dataset": "elasticsearch.server",
    "process.thread.name": "elasticsearch[99fcaa518426][search][T#1]",
    "log.logger": "org.elasticsearch.action.search.TransportSearchAction",
    "elasticsearch.cluster.uuid": "aQO5AwL1QXyYiy9FIbz8uA",
    "elasticsearch.node.id": "hvB72OhCQ1mmJsmHNzQ0_w",
    "elasticsearch.node.name": "99fcaa518426",
    "elasticsearch.cluster.name": "docker-cluster",
    "error.type": "java.time.DateTimeException",
    "error.message": "Field Year cannot be printed as the value -292275055 exceeds the maximum print width of 4",
    "error.stack_trace": "java.time.DateTimeException: Field Year cannot be printed as the value -292275055 exceeds the maximum print width of 4\n\tat java.base/java.time.format.DateTimeFormatterBuilder$NumberPrinterParser.format(DateTimeFormatterBuilder.java:2802)\n\tat java.base/java.time.format.DateTimeFormatterBuilder$CompositePrinterParser.format(DateTimeFormatterBuilder.java:2411)\n\tat java.base/java.time.format.DateTimeFormatterBuilder$CompositePrinterParser.format(DateTimeFormatterBuilder.java:2411)\n\tat java.base/java.time.format.DateTimeFormatter.formatTo(DateTimeFormatter.java:1853)\n\tat java.base/java.time.format.DateTimeFormatter.format(DateTimeFormatter.java:1827)\n\tat org.elasticsearch.common.time.JavaDateFormatter.format(JavaDateFormatter.java:241)\n\tat org.elasticsearch.search.DocValueFormat$DateTime.format(DocValueFormat.java:288)\n\tat org.elasticsearch.search.DocValueFormat$DateTime.format(DocValueFormat.java:217)\n\tat org.elasticsearch.search.SearchSortValuesAndFormats.<init>(SearchSortValuesAndFormats.java:35)\n\tat org.elasticsearch.action.search.BottomSortValuesCollector.consumeTopDocs(BottomSortValuesCollector.java:65)\n\tat org.elasticsearch.action.search.SearchQueryThenFetchAsyncAction.onShardResult(SearchQueryThenFetchAsyncAction.java:125)\n\tat org.elasticsearch.action.search.AbstractSearchAsyncAction$1.innerOnResponse(AbstractSearchAsyncAction.java:323)\n\tat org.elasticsearch.action.search.SearchActionListener.onResponse(SearchActionListener.java:33)\n\tat org.elasticsearch.action.search.SearchActionListener.onResponse(SearchActionListener.java:18)\n\tat org.elasticsearch.action.search.SearchExecutionStatsCollector.onResponse(SearchExecutionStatsCollector.java:61)\n\tat org.elasticsearch.action.search.SearchExecutionStatsCollector.onResponse(SearchExecutionStatsCollector.java:25)\n\tat org.elasticsearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:43)\n\tat org.elasticsearch.action.search.SearchTransportService$ConnectionCountingHandler.handleResponse(SearchTransportService.java:642)\n\tat org.elasticsearch.transport.TransportService$4.handleResponse(TransportService.java:718)\n\tat org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1339)\n\tat org.elasticsearch.transport.TransportService$DirectResponseChannel.processResponse(TransportService.java:1417)\n\tat org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1397)\n\tat org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:41)\n\tat org.elasticsearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:38)\n\tat org.elasticsearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:19)\n\tat org.elasticsearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:47)\n\tat org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:62)\n\tat org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)\n\tat org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)\n\tat org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:776)\n\tat org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)\n\tat java.base/java.lang.Thread.run(Thread.java:833)\n"
}

howardhuanghua commented 1 year ago

Hi @stu-elastic @kpollich, is there a plan to fix this issue? Or has a related PR already fixed it? Thanks.

nemphys commented 1 year ago

+1 on this one, still happens in 8.8.2.

mkhludnev commented 9 months ago

Users might apply any of the strict_* formats to the sort clause. It should fix the error.

benwtrent commented 2 months ago

I have tried this in the latest version of Elasticsearch, and it sorts just fine: the doc with the missing value is sorted to the top.

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": null,
    "hits": [
      {
        "_index": "test",
        "_id": "2",
        "_score": null,
        "_source": {
          "field2": 4567
        },
        "sort": [
          -9223372036854776000
        ]
      }
    ]
  }
}

Now, adjusting the request, I do get an error, but it sort of makes sense to me...

GET test/_search
{
  "size": 2,
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "dt": {
        "format": "strict_date_time",
        "missing": "_first",
        "order": "asc"
      }
    }
  ]
}

You are trying to format the smallest possible date time and it just fails.

Maybe I don't know the desired behavior here. Should it just pick the smallest date that fits the format?

elasticsearchmachine commented 2 months ago

Pinging @elastic/es-search-relevance (Team:Search Relevance)