Malformed mapping can make index snapshots not restorable or mountable (searchable snapshots)

lucabelluccini commented 2 years ago

Elasticsearch Version

7.14.2, 7.17.0 (probably earlier too)

Installed Plugins

No response

Java Version

bundled

OS Version

not relevant

Problem Description

If a user accidentally ingests JSON documents which have weird/malformed bodies, the generated mappings due to dynamic mapping will make the snapshot of the index fail on restore.

This can happen also during ILM (when the index is moved to mounted phases when using searchable snapshots) or during a normal snapshot restore operation.

Steps to Reproduce

DELETE myverybadindex

PUT myverybadindex
{
  "mappings": {
    "properties": {
      "query": {
        "properties": {
          "1": {
            "type": "text"
          },
          "\u0000": {
            "type": "text"
          },
          "\u0000\u0000\u0000\u0000": {
            "type": "text"
          }
        }
      }
    }
  }
}

POST _snapshot/found-snapshots/myverybadsnapshot
{
  "indices": "myverybadindex",
  "include_global_state": false
}

GET _snapshot/found-snapshots/myverybadsnapshot

POST /_snapshot/found-snapshots/myverybadsnapshot/_mount?wait_for_completion=true
{
  "index": "myverybadindex", 
  "renamed_index": "myverybadindex-mounted",
  "ignore_index_settings": [ "index.refresh_interval" ] 
}

POST /_snapshot/found-snapshots/myverybadsnapshot/_restore
{
  "indices": "myverybadindex",
  "rename_pattern": "(.+)"
  , "rename_replacement": "restored_index_$1"
}

Both mounting and restore operations end up with:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "i_o_exception",
        "reason" : "Duplicate field '\u0000'\n at [Source: (org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat$DeserializeMetaBlobInputStream); line: -1, column: 431]"
      }
    ],
    "type" : "i_o_exception",
    "reason" : "Duplicate field '\u0000'\n at [Source: (org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat$DeserializeMetaBlobInputStream); line: -1, column: 431]"
  },
  "status" : 500
}

This can happen also during ILM (when the index is moved to mounted phases when using searchable snapshots).

Logs (if relevant)

No response

elasticmachine commented 2 years ago

Pinging @elastic/es-search (Team:Search)

elasticmachine commented 2 years ago

Pinging @elastic/es-distributed (Team:Distributed)

kunisen commented 2 years ago

[1]

I wonder if we have an easy way to fix this issue?

IIUC, we need to

find the index having bad mapping
delete those bad indices
take snapshot again
mount or restore

Is this enough and is there a good way to find bad mapping indices? (I wonder if it's not only limited to Unicode, but may expand to more patterns, which might be not that easy to check until we got rejected by the failure.)

[2]

I also feel, it might be great if we block it at the index creation stage, because it doesn’t really make logical sense to make it "OK to snapshot" but "NG to restore/mount".

Had a chat with @Leaf-Lin, the reason behind this seems to be it's not great to prevent users from creating fields that are based on Unicode, because users in a different language would have fields that are completely normal to them, but ES is unable to process it correctly.

However, given it's causing discrepancy behavior in “index creation” and “snapshot/restore”, which probably ideally best to get things aligned.

Is there a way to across this? e.g. make an "encoding logic" internally to avoid using Unicode directly? (like URL encoding is widely used in lots of applications)

DaveCTurner commented 2 years ago

This seems to be a SMILE bug, or at least something that's not supported properly in SMILE. The following test fails for SMILE (and CBOR) but passes for JSON and YAML.

diff --git a/server/src/test/java/org/elasticsearch/common/xcontent/BaseXContentTestCase.java b/server/src/test/java/org/elasticsearch/common/xcontent/BaseXContentTestCase.java
index 96b93568c66..9ab3cce8aa8 100644
--- a/server/src/test/java/org/elasticsearch/common/xcontent/BaseXContentTestCase.java
+++ b/server/src/test/java/org/elasticsearch/common/xcontent/BaseXContentTestCase.java
@@ -140,6 +140,7 @@ public abstract class BaseXContentTestCase extends ESTestCase {
         expectUnclosedException(() -> BytesReference.bytes(builder().startObject().field("foo")));

         assertResult("{'foo':'bar'}", () -> builder().startObject().field("foo").value("bar").endObject());
+        assertResult("{'\\u0000':'','\\u0000\\u0000':''}", () -> builder().startObject().field("\0", "").field("\0\0", "").endObject());
     }

     public void testNullField() throws IOException {

The trouble is that the SMILE parser treats these field names as short ASCII strings which get cached to avoid unnecessary instantiation, but the cache is keyed by an integer representation of the string and both of these strings map to 0.

I don't think this is a general Unicode problem, it's only going to affect field names that are made up of some short sequence of NUL bytes. I have reported this at https://github.com/FasterXML/jackson-dataformats-binary/issues/312.

Can we perhaps forbid field names containing NUL bytes entirely? Are they ever anything but a mistake?

lucabelluccini commented 2 years ago

Thank you David for the prompt analysis.

Can we perhaps forbid field names containing NUL bytes entirely? Are they ever anything but a mistake?

It could be a nice feature. Most times I've seen it was a client (Fluentd or other products) trying to index garbage data. I would always allow an escape hatch (an index setting or a cluster setting ?).

SharpEdgeMarshall commented 2 years ago

Any news on this? we have a not restorable index cause of this bug

DaveCTurner commented 2 years ago

The Jackson bug is fixed upstream, but a fixed version (≥2.14.0) is yet to be released.

javanna commented 1 month ago

Starting from Elasticsearch 8.6 we upgraded jackson to 2.14. This should be fixed now.

elastic / elasticsearch