elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Error: failed to parse field [...] of type [date] when painless script updates unrelated field #108977

Open pmishev opened 3 months ago

pmishev commented 3 months ago

Elasticsearch Version

7.17.12

Installed Plugins

No response

Java Version

bundled

OS Version

Linux aa933ae49f18 5.15.49-linuxkit #1 SMP Tue Sep 13 07:51:46 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Problem Description

When trying to remove a field via a script from documents in an index that contains an epoch_second date field with a fractional value, an unexpected parsing error occurs on the date field and the removal doesn't happen.

It seems that in this scenario ES does not recognise fractional epoch values in the scientific notation it itself serialises them in.

Steps to Reproduce

PUT /test_ts
{
  "mappings": {
    "properties": {
      "update_datetime" : {
        "type" : "date",
        "format" : "epoch_second"
      },
      "is_private" : {
        "type" : "boolean"
      }
    }
  }
}
POST test_ts/_doc/1
{
  "update_datetime": 1716462600.37034
}
POST test_ts/_update_by_query
{
  "script": {
    "source": "ctx._source.remove('is_private');",
    "lang": "painless"
  }
}

Results in:

failed to parse field [update_datetime] of type [date] in document with id '1'. Preview of field's value: '1.71646260037034E9'

Strangely, reindexing works fine with no errors:

POST _reindex
{
  "source": {
    "index": "test_ts"
  },
  "dest": {
    "index": "test_ts_1"
  }
}

Logs (if relevant)

No response

elasticsearchmachine commented 3 months ago

Pinging @elastic/es-core-infra (Team:Core/Infra)

rjernst commented 3 months ago

Can you please add error_trace=true to your test_ts/_update_by_query request? i.e.:

POST test_ts/_update_by_query?error_trace=true

That should give more details about where the error is actually occurring.

pmishev commented 3 months ago
{
  "took" : 5,
  "timed_out" : false,
  "total" : 1,
  "updated" : 0,
  "deleted" : 0,
  "batches" : 1,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [
    {
      "index" : "test_ts",
      "type" : "_doc",
      "id" : "1",
      "cause" : {
        "type" : "mapper_parsing_exception",
        "reason" : "failed to parse field [update_datetime] of type [date] in document with id '1'. Preview of field's value: '1.71646260037034E9'",
        "caused_by" : {
          "type" : "illegal_argument_exception",
          "reason" : "failed to parse date field [1.71646260037034E9] with format [epoch_second]",
          "caused_by" : {
            "type" : "date_time_parse_exception",
            "reason" : "Failed to parse with all enclosed parsers"
          }
        }
      },
      "status" : 400
    }
  ]
}
rjernst commented 2 months ago

Thanks for the info, I see what is happening.

Your update_datetime is passed as a JSON number. When this is parsed in Java (as it is during an update), it is placed in a double. When that double is serialized back out, it uses scientific notation, yet the epoch_second date format can't handle scientific notation.
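The serialization behaviour described above can be seen with a one-line Java check (a minimal sketch; Elasticsearch's actual JSON serializer may differ in detail, but `Double.toString` exhibits the same behaviour as the error's field preview):

```java
public class DoubleSerialization {
    public static void main(String[] args) {
        // The JSON number from the document is parsed into a Java double.
        double updateDatetime = 1716462600.37034;
        // Double.toString uses "computerized scientific notation" for
        // magnitudes >= 10^7, which is how the value comes back out.
        String serialized = Double.toString(updateDatetime);
        System.out.println(serialized); // prints 1.71646260037034E9
        // This matches the "Preview of field's value" in the error, and
        // the epoch_second format cannot parse that representation.
    }
}
```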

While understandably confusing, I think fixing this would be difficult. When reindexing we don't know about the mapped types while parsing the source; it's just a JSON object. It might be possible to make reindexing reuse the original source bytes, but not without a bit of rework.

One workaround that should work is to use a string. So when indexing your original document, try this:

POST test_ts/_doc/1
{
  "update_datetime": "1716462600.37034"
}

That should retain the original formatting when parsed as JSON, and then serialized again as a string to be reindexed.

pmishev commented 2 months ago

Thanks for the workaround. So far it seems to work, after fixing my existing data with:

POST test_ts/_update_by_query
{
  "script": {
    "source": """
      if (ctx._source.update_datetime instanceof Double) {
        double updateDatetime = ctx._source.update_datetime;
        // Convert double to String
        String updateDateTimeString = updateDatetime + "";
        // Remove the E9 suffix
        updateDateTimeString = updateDateTimeString.splitOnToken('E')[0];
        // Remove decimal point
        String[] splitString = updateDateTimeString.splitOnToken('.');
        updateDateTimeString = splitString[0] + splitString[1];
        // Insert the decimal point in the correct place
        String part1 = updateDateTimeString.substring(0, 10);
        String part2 = updateDateTimeString.substring(10);
        ctx._source.update_datetime = part1 + "." + part2;
      }
    """
  }
}
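For reference, the string surgery in the script above can be sanity-checked outside Painless. This is a hypothetical Java equivalent (Painless's splitOnToken is approximated with String.split; it assumes a 10-digit seconds part, i.e. an E9 exponent, as in the original data):

```java
public class EpochSecondFix {
    // Replicates the Painless workaround: turn the scientific-notation
    // rendering of the double back into a plain "seconds.fraction" string.
    static String fix(double updateDatetime) {
        String s = Double.toString(updateDatetime); // e.g. "1.71646260037034E9"
        s = s.split("E")[0];                        // drop the exponent: "1.71646260037034"
        String[] parts = s.split("\\.");            // remove the decimal point
        s = parts[0] + parts[1];                    // "171646260037034"
        // Re-insert the point after the 10 epoch-second digits.
        return s.substring(0, 10) + "." + s.substring(10);
    }

    public static void main(String[] args) {
        System.out.println(fix(1716462600.37034)); // prints 1716462600.37034
    }
}
```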