elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

[Transform] Potential regression in `8.12` related to how mismatching mappings are handled. #105229

Open przemekwitek opened 6 months ago

przemekwitek commented 6 months ago

Elasticsearch Version

8.12

Installed Plugins

No response

Java Version

bundled

OS Version

n/a

Problem Description

After upgrading from 8.11 to 8.12 some transforms have failed with the following message:

task encountered more than 10 failures; latest failure: Fielddata is disabled on [ece.runner] in [.ds-service-proxy-requests-filebeat-7.12.0-azure-eastus-2022.11.18-000332]. Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [ece.runner] in order to load field data by uninverting the inverted index. Note that this can use significant memory.

This error means the transform hit a source index in which the ece.runner field is mapped as text rather than keyword. Interestingly, the index with this out-of-date mapping has existed for a long time (since 2022), yet the transform was able to deal with it until the upgrade. The purpose of this issue is to find, and potentially fix, the regression in 8.12 that made this transform start failing.
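To see which transforms are affected, one option is to scan the output of the transform stats API (`GET _transform/_stats`) for this error. The sketch below is only illustrative: it assumes the failure message shows up in the `reason` field of a transform whose `state` is `failed`, and the transform ids in the sample are hypothetical.

```python
import re

# Pulls the field and index names out of the "Fielddata is disabled" message.
FIELDDATA_RE = re.compile(
    r"Fielddata is disabled on \[(?P<field>[^\]]+)\] in \[(?P<index>[^\]]+)\]"
)


def fielddata_failures(stats_response):
    """Return (transform_id, field, index) for failed transforms with this error."""
    hits = []
    for transform in stats_response.get("transforms", []):
        if transform.get("state") != "failed":
            continue
        match = FIELDDATA_RE.search(transform.get("reason", ""))
        if match:
            hits.append((transform["id"], match.group("field"), match.group("index")))
    return hits


# Hypothetical stats response, trimmed to the fields used above.
# The ids are made up; the reason text is shortened from the message in this issue.
sample_stats = {
    "count": 2,
    "transforms": [
        {
            "id": "proxy-requests-5m",
            "state": "failed",
            "reason": (
                "task encountered more than 10 failures; latest failure: "
                "Fielddata is disabled on [ece.runner] in "
                "[.ds-service-proxy-requests-filebeat-7.12.0-azure-eastus-2022.11.18-000332]. "
                "Text fields are not optimised for operations that require "
                "per-document field data like aggregations and sorting"
            ),
        },
        {"id": "some-other-transform", "state": "started"},
    ],
}

for transform_id, field, index in fielddata_failures(sample_stats):
    print(transform_id, field, index)
```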

More details:

  1. The problem occurred despite the transform having this filter:

    "query": {
      "bool": {
        "must_not" : [
          { "term": { "_tier": { "value": "data_frozen" } } }
        ]
      }
    }

    That's because the offending index was never moved to the "frozen" tier; it was "hot" when the problem occurred.

  2. When simply running the aggregation (no transform involved):

    GET /service-proxy-requests-*/_search
    {
      "query": {
        "bool": {
          "must_not": [
            {
              "term": {
                "_tier": {
                  "value": "data_frozen"
                }
              }
            }
          ]
        }
      },
      "aggs": {
        "date_interval": {
          "date_histogram": {
            "field": "@timestamp",
            "fixed_interval": "5m"
          },
          "aggs": {
            "number_of_proxies": {
              "cardinality": {
                "field": "ece.runner",
                "precision_threshold": 1000
              }
            }
          }
        }
      },
      "size": 0
    }

    the result is:

    {
      "took": 4907,
      "timed_out": false,
      "_shards": {
        "total": 624,
        "successful": 576,
        "skipped": 486,
        "failed": 48,
        "failures": [
          {
            "shard": 0,
            "index": ".ds-service-proxy-requests-filebeat-7.12.0-aws-us-gov-east-1-2022.07.19-2022.11.19-000335",
            "node": "Sj3o1omwQleQBnIEtGWBHQ",
            "reason": {
              "type": "illegal_argument_exception",
              "reason": "Fielddata is disabled on [ece.runner] in [.ds-service-proxy-requests-filebeat-7.12.0-aws-us-gov-east-1-2022.07.19-2022.11.19-000335]. Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [ece.runner] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
            }
          },

    So in a sense, this behavior in the transform could be expected. But the transform worked just fine before the upgrade to 8.12.

  3. Discussed mitigations:

a) Delete the indices that have the out-of-date mapping. This requires identifying and deleting those indices and then restarting the affected transforms.

b) Filter out the offending indices using source.query. This requires updating each affected transform and does not guarantee that the problem will not recur.

c) Make the ece.runner field always render as keyword using source.runtime_mappings. This requires updating each affected transform but has a better chance of fixing the problem permanently. It is unclear how using a runtime field affects transform performance. Here is the relevant part of the _update request:

    "source": {
      "index": [
        "service-proxy-requests-*"
      ],
      "runtime_mappings": {
        "ece.runner": {
          "type": "keyword"
        }
      },
      "query": {
        "bool": {
          "must_not": [
            {
              "term": {
                "_tier": {
                  "value": "data_frozen"
                }
              }
            }
          ]
        }
      }
    },
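For mitigations a) and b), the offending indices can be read off the shard failures of the plain search shown earlier. The following is a rough sketch, not a tested procedure: the helper names are made up for illustration, the sample is a trimmed copy of the response above, and it assumes that a `terms` query on the `_index` metadata field matches the data stream backing index names. The `failures` array may not list every failing shard, so treat the result as a starting point.

```python
import json


def failing_indices(search_response, marker="Fielddata is disabled"):
    """Indices whose shard failures mention the marker, in first-seen order."""
    seen = []
    failures = search_response.get("_shards", {}).get("failures", [])
    for failure in failures:
        reason = failure.get("reason", {}).get("reason", "")
        index = failure.get("index")
        if marker in reason and index and index not in seen:
            seen.append(index)
    return seen


def exclude_indices_query(indices):
    """Build a source.query that skips frozen-tier data and the listed indices."""
    return {
        "bool": {
            "must_not": [
                {"term": {"_tier": {"value": "data_frozen"}}},
                # Assumption: _index matches the concrete backing index names.
                {"terms": {"_index": list(indices)}},
            ]
        }
    }


# Trimmed copy of the search response from this issue (reason text shortened).
sample_response = {
    "_shards": {
        "total": 624,
        "successful": 576,
        "skipped": 486,
        "failed": 48,
        "failures": [
            {
                "shard": 0,
                "index": ".ds-service-proxy-requests-filebeat-7.12.0-aws-us-gov-east-1-2022.07.19-2022.11.19-000335",
                "node": "Sj3o1omwQleQBnIEtGWBHQ",
                "reason": {
                    "type": "illegal_argument_exception",
                    "reason": "Fielddata is disabled on [ece.runner] in [...]",
                },
            }
        ],
    }
}

offending = failing_indices(sample_response)
print(json.dumps(exclude_indices_query(offending), indent=2))
```

The printed query could replace the "query" part of the _update request for mitigation b); for mitigation a), the list in `offending` is the set of indices to review before deleting anything.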

Steps to Reproduce

I wasn't able to reproduce the issue locally. For version:

I did not see a situation where a transform would just work properly with the out-of-date text mapping for the ece.runner field. But if this was a long-running transform, some additional validations could have kicked in after the upgrade. This needs investigation.

Logs (if relevant)

No response

elasticsearchmachine commented 6 months ago

Pinging @elastic/ml-core (Team:ML)