elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Alias field type is not handled well in DF analytics. #50787

Closed przemekwitek closed 4 years ago

przemekwitek commented 4 years ago

A QA regression has been found after the recent change (https://github.com/elastic/elasticsearch/pull/50219) that copies the mapping type from the dependent variable to the prediction field.

Wei posted the bug description: The failure is related to how DFA handles field aliases. The failing test runs an analytics job against an alias field. It passed on master before Dec 11 and has recently started failing with this error:

"failure_reason" : """[dfa_wine_quality_red_alias_1578498260_000_0] Failed to join results: failures while writing results [failure in bulk execution:
[0]: index [dest_wine_quality_red_alias_1578516260980], id [-BZHR2wB9mzBfTtIWLPc], message [org.elasticsearch.index.mapper.MapperParsingException: failed to parse]

This is the job configuration:

{
  "id": "dfa_breast-cancer-alias_1578499743_000_0",
  "source": {
    "index": [
      "breast-cancer-alias"
    ],
    "query": {
      "match_all": {}
    }
  },
  "dest": {
    "index": "dest_breast_cancer_alias_1578517743585",
    "results_field": "ml"
  },
  "analysis": {
    "classification": {
      "dependent_variable": "class_alias",
      "num_top_classes": 2,
      "prediction_field_name": "class_alias_prediction",
      "training_percent": 100,
      "randomize_seed": 4381108523829301000
    }
  },
  "analyzed_fields": {
    "includes": [],
    "excludes": [
      "class",
      "breast-quad"
    ]
  },
  "model_memory_limit": "1gb",
  "create_time": 1578517744361,
  "version": "8.0.0",
  "allow_lazy_start": false
}

This is the mapping of breast-cancer-alias index:

    "mappings" : {
      "properties" : {
        "age" : {
          "type" : "keyword"
        },
        "breast" : {
          "type" : "keyword"
        },
        "breast-quad" : {
          "type" : "keyword"
        },
        "breast-quad_alias" : {
          "type" : "alias",
          "path" : "breast-quad"
        },
        "class" : {
          "type" : "keyword"
        },
        "class_alias" : {
          "type" : "alias",
          "path" : "class"
        },
        "deg-malig" : {
          "type" : "long"
        },
        "inv-nodes" : {
          "type" : "keyword"
        },
        "irradiat" : {
          "type" : "keyword"
        },
        "menopause" : {
          "type" : "keyword"
        },
        "node-caps" : {
          "type" : "keyword"
        },
        "tumor-size" : {
          "type" : "keyword"
        }
      }
    },
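
To illustrate the likely cause: the dependent variable `class_alias` has type `alias`, so copying its mapping verbatim makes the prediction field an alias as well, and Elasticsearch does not accept writes to alias fields. The following is a minimal Python sketch, not the actual fix in Elasticsearch. The helper `resolve_alias` and the variable `source_properties` are hypothetical names, and the lookup assumes a flat `properties` map with no nested object fields.

```python
from typing import Any, Dict


def resolve_alias(properties: Dict[str, Dict[str, Any]], field: str) -> Dict[str, Any]:
    """Return the concrete mapping for `field`, following an alias if needed.

    Hypothetical helper for illustration only; assumes a flat properties map.
    """
    mapping = properties[field]
    if mapping.get("type") == "alias":
        # An alias must point at a concrete field, so a single hop is enough.
        return properties[mapping["path"]]
    return mapping


# Reduced version of the breast-cancer-alias mapping shown above.
source_properties = {
    "class": {"type": "keyword"},
    "class_alias": {"type": "alias", "path": "class"},
}

# Copying the dependent variable's mapping as-is yields another alias,
# which cannot be written to in the destination index.
naive = dict(source_properties["class_alias"])

# Resolving the alias first yields a concrete, writable mapping.
resolved = dict(resolve_alias(source_properties, "class_alias"))

print(naive)
print(resolved)
```

With the alias resolved first, the prediction field would be mapped as `keyword`, the type of the underlying `class` field.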

elasticmachine commented 4 years ago

Pinging @elastic/ml-core (:ml)

przemekwitek commented 4 years ago

I was able to reproduce the issue locally using an integration test. The failure is:

  2> java.lang.AssertionError: 
    Expected: is null
         but: was "[dependent_variable_of_type_alias] Failed to join results: failures while writing results [failure in bulk execution:\n[0]: index [dependent_variable_of_type_alias_source_index_results], id [h5oMim8BRLpRVR6aNCHK], message [org.elasticsearch.index.mapper.MapperParsingException: failed to parse]\n[1]: index [dependent_variable_of_type_alias_source_index_results], id [hpoMim8BRLpRVR6aNCHK], message [org.elasticsearch.index.mapper.MapperParsingException: failed to parse]\n[2]: index [dependent_variable_of_type_alias_source_index_results], id [i5oMim8BRLpRVR6aNCHK], message [org.elasticsearch.index.mapper.MapperParsingException: failed to parse]\n[3]: index [dependent_variable_of_type_alias_source_index_results], id [iJoMim8BRLpRVR6aNCHK], message [org.elasticsearch.index.mapper.MapperParsingException: failed to parse]\n[4]: index [dependent_variable_of_type_alias_source_index_results], id [iZoMim8BRLpRVR6aNCHK], message [org.elasticsearch.index.mapper.MapperParsingException: failed to parse]\n[5]: index [dependent_variable_of_type_alias_source_index_results], id [ipoMim8BRLpRVR6aNCHK], message [org.elasticsearch.index.mapper.MapperParsingException: failed to parse]\n[6]: index [dependent_variable_of_type_alias_source_index_results], id [j5oMim8BRLpRVR6aNCHK], message [org.elasticsearch.index.mapper.MapperParsingException: failed to parse]\n[7]: index [dependent_variable_of_type_alias_source_index_results], id [jJoMim8BRLpRVR6aNCHK], message [org.elasticsearch.index.mapper.MapperParsingException: failed to parse]\n[8]: index [dependent_variable_of_type_alias_source_index_results], id [jZoMim8BRLpRVR6aNCHK], message [org.elasticsearch.index.mapper.MapperParsingException: failed to parse]\n[9]: index [dependent_variable_of_type_alias_source_index_results], id [jpoMim8BRLpRVR6aNCHK], message [org.elasticsearch.index.mapper.MapperParsingException: failed to parse]\n[10]: index [dependent_variable_of_type_alias_source_index_results], id 
[kJoMim8BRLpRVR6aNCHK], message [org.elasticsearch.index.mapper.MapperParsingException: failed to parse]]"
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
        at org.junit.Assert.assertThat(Assert.java:956)
        at org.junit.Assert.assertThat(Assert.java:923)
        at org.elasticsearch.xpack.ml.integration.MlNativeDataFrameAnalyticsIntegTestCase.assertIsStopped(MlNativeDataFrameAnalyticsIntegTestCase.java:188)
        at org.elasticsearch.xpack.ml.integration.MlNativeDataFrameAnalyticsIntegTestCase.lambda$waitUntilAnalyticsIsStopped$0(MlNativeDataFrameAnalyticsIntegTestCase.java:139)
        at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:879)
        at org.elasticsearch.xpack.ml.integration.MlNativeDataFrameAnalyticsIntegTestCase.waitUntilAnalyticsIsStopped(MlNativeDataFrameAnalyticsIntegTestCase.java:139)
        at org.elasticsearch.xpack.ml.integration.MlNativeDataFrameAnalyticsIntegTestCase.waitUntilAnalyticsIsStopped(MlNativeDataFrameAnalyticsIntegTestCase.java:135)
        at org.elasticsearch.xpack.ml.integration.ClassificationIT.testDependentVariableOfTypeAlias(ClassificationIT.java:357)

    java.lang.RuntimeException: Had to resort to force-stopping jobs, something went wrong?
        at org.elasticsearch.xpack.ml.integration.MlNativeDataFrameAnalyticsIntegTestCase.stopAnalyticsAndForceStopOnError(MlNativeDataFrameAnalyticsIntegTestCase.java:98)
        at org.elasticsearch.xpack.ml.integration.MlNativeDataFrameAnalyticsIntegTestCase.cleanUpAnalytics(MlNativeDataFrameAnalyticsIntegTestCase.java:76)
        at org.elasticsearch.xpack.ml.integration.MlNativeDataFrameAnalyticsIntegTestCase.cleanUpResources(MlNativeDataFrameAnalyticsIntegTestCase.java:72)
        at org.elasticsearch.xpack.ml.integration.MlNativeIntegTestCase.cleanUp(MlNativeIntegTestCase.java:128)
        at org.elasticsearch.xpack.ml.integration.ClassificationIT.cleanup(ClassificationIT.java:79)

        Caused by:
        org.elasticsearch.ElasticsearchStatusException: cannot close data frame analytics [dependent_variable_of_type_alias] because it failed, use force stop instead