elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

`change_point` agg fails with `IllegalArgumentException` while indexing an array #112805

Open weltenwort opened 1 month ago

weltenwort commented 1 month ago

Elasticsearch Version

8.16.0-SNAPSHOT

Installed Plugins

No response

Java Version

bundled

OS Version

Linux 6.5.0-1024-gcp #26~22.04.1-Ubuntu SMP Fri Jun 14 18:48:45 UTC 2024 x86_64 GNU/Linux

Problem Description

When run on some sets of documents, the change_point aggregation throws an IllegalArgumentException. The Observability Logs UX team is trying to use the aggregation to detect change points in log documents. I was unable to find a pattern in the failures, though, because slight modifications to the buckets cause the error to disappear or reappear:

{
  "error": {
    "root_cause": [],
    "type": "search_phase_execution_exception",
    "reason": "",
    "phase": "rank-feature",
    "grouped": true,
    "failed_shards": [],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "0 > -1",
      "stack_trace": """java.lang.IllegalArgumentException: 0 > -1
    at java.base/java.util.Arrays.copyOfRange(Arrays.java:4090)
    at org.elasticsearch.ml@8.16.0-SNAPSHOT/org.elasticsearch.xpack.ml.aggs.changepoint.ChangePointAggregator.testDistributionChange(ChangePointAggregator.java:487)
    at org.elasticsearch.ml@8.16.0-SNAPSHOT/org.elasticsearch.xpack.ml.aggs.changepoint.ChangePointAggregator.testForChange(ChangePointAggregator.java:241)
    at org.elasticsearch.ml@8.16.0-SNAPSHOT/org.elasticsearch.xpack.ml.aggs.changepoint.ChangePointAggregator.testForChange(ChangePointAggregator.java:160)
    at org.elasticsearch.ml@8.16.0-SNAPSHOT/org.elasticsearch.xpack.ml.aggs.changepoint.ChangePointAggregator.doReduce(ChangePointAggregator.java:129)
    at org.elasticsearch.server@8.16.0-SNAPSHOT/org.elasticsearch.search.aggregations.InternalAggregations.topLevelReduce(InternalAggregations.java:235)
    at org.elasticsearch.server@8.16.0-SNAPSHOT/org.elasticsearch.search.aggregations.InternalAggregations.topLevelReduceDelayable(InternalAggregations.java:211)
    at org.elasticsearch.server@8.16.0-SNAPSHOT/org.elasticsearch.action.search.SearchPhaseController.reduceAggs(SearchPhaseController.java:682)
    at org.elasticsearch.server@8.16.0-SNAPSHOT/org.elasticsearch.action.search.SearchPhaseController.reducedQueryPhase(SearchPhaseController.java:636)
    at org.elasticsearch.server@8.16.0-SNAPSHOT/org.elasticsearch.action.search.QueryPhaseResultConsumer.reduce(QueryPhaseResultConsumer.java:139)
    at org.elasticsearch.server@8.16.0-SNAPSHOT/org.elasticsearch.action.search.RankFeaturePhase.innerRun(RankFeaturePhase.java:92)
    at org.elasticsearch.server@8.16.0-SNAPSHOT/org.elasticsearch.action.search.RankFeaturePhase$1.doRun(RankFeaturePhase.java:79)
    at org.elasticsearch.server@8.16.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
    at org.elasticsearch.server@8.16.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
    at org.elasticsearch.server@8.16.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:991)
    at org.elasticsearch.server@8.16.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1570)
"""
    },
    "stack_trace": """Failed to execute phase [rank-feature], 
    at org.elasticsearch.server@8.16.0-SNAPSHOT/org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:726)
    at org.elasticsearch.server@8.16.0-SNAPSHOT/org.elasticsearch.action.search.RankFeaturePhase$1.onFailure(RankFeaturePhase.java:84)
    at org.elasticsearch.server@8.16.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:28)
    at org.elasticsearch.server@8.16.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
    at org.elasticsearch.server@8.16.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:991)
    at org.elasticsearch.server@8.16.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1570)
Caused by: java.lang.IllegalArgumentException: 0 > -1
    at java.base/java.util.Arrays.copyOfRange(Arrays.java:4090)
    at org.elasticsearch.ml@8.16.0-SNAPSHOT/org.elasticsearch.xpack.ml.aggs.changepoint.ChangePointAggregator.testDistributionChange(ChangePointAggregator.java:487)
    at org.elasticsearch.ml@8.16.0-SNAPSHOT/org.elasticsearch.xpack.ml.aggs.changepoint.ChangePointAggregator.testForChange(ChangePointAggregator.java:241)
    at org.elasticsearch.ml@8.16.0-SNAPSHOT/org.elasticsearch.xpack.ml.aggs.changepoint.ChangePointAggregator.testForChange(ChangePointAggregator.java:160)
    at org.elasticsearch.ml@8.16.0-SNAPSHOT/org.elasticsearch.xpack.ml.aggs.changepoint.ChangePointAggregator.doReduce(ChangePointAggregator.java:129)
    at org.elasticsearch.server@8.16.0-SNAPSHOT/org.elasticsearch.search.aggregations.InternalAggregations.topLevelReduce(InternalAggregations.java:235)
    at org.elasticsearch.server@8.16.0-SNAPSHOT/org.elasticsearch.search.aggregations.InternalAggregations.topLevelReduceDelayable(InternalAggregations.java:211)
    at org.elasticsearch.server@8.16.0-SNAPSHOT/org.elasticsearch.action.search.SearchPhaseController.reduceAggs(SearchPhaseController.java:682)
    at org.elasticsearch.server@8.16.0-SNAPSHOT/org.elasticsearch.action.search.SearchPhaseController.reducedQueryPhase(SearchPhaseController.java:636)
    at org.elasticsearch.server@8.16.0-SNAPSHOT/org.elasticsearch.action.search.QueryPhaseResultConsumer.reduce(QueryPhaseResultConsumer.java:139)
    at org.elasticsearch.server@8.16.0-SNAPSHOT/org.elasticsearch.action.search.RankFeaturePhase.innerRun(RankFeaturePhase.java:92)
    at org.elasticsearch.server@8.16.0-SNAPSHOT/org.elasticsearch.action.search.RankFeaturePhase$1.doRun(RankFeaturePhase.java:79)
    at org.elasticsearch.server@8.16.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
    ... 6 more
"""
  },
  "status": 400
}
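
For readers unfamiliar with the message format: the top frame of the trace is java.util.Arrays.copyOfRange, which throws an IllegalArgumentException with the message "from > to" when the requested end index is smaller than the start index. The following minimal sketch reproduces only that JDK behavior. The class and method names are hypothetical, the array length of 25 merely mirrors the number of buckets in the reproduction, and how ChangePointAggregator.testDistributionChange arrives at an end index of -1 is not shown in this report, so this is not a reconstruction of the aggregator code.

```java
import java.util.Arrays;

// Hypothetical demo class; it only exercises the JDK call seen at the top of the stack trace.
final class CopyOfRangeDemo {

    /** Returns the message of the IllegalArgumentException thrown by copyOfRange, or null if none is thrown. */
    static String copyMessage(double[] values, int from, int to) {
        try {
            Arrays.copyOfRange(values, from, to);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        double[] values = new double[25]; // assumed size: one slot per bucket in the repro
        System.out.println(copyMessage(values, 0, -1)); // prints "0 > -1"
        System.out.println(copyMessage(values, 0, 0));  // prints "null" (an empty range is allowed)
    }
}
```

If this reading is right, the message "0 > -1" would mean the copy was requested with a start index of 0 and an end index of -1, for example because an index search returned -1 for "not found". That is an assumption, not something the trace confirms.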

Steps to Reproduce

I originally encountered this when running on millions of log entries, but I could reduce it to this synthetic scenario:

DELETE change-point-test

POST _bulk
{ "index" : { "_index" : "change-point-test", "_id" : "1" } }
{ "key" : 1, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "2" } }
{ "key" : 2, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "3" } }
{ "key" : 3, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "4" } }
{ "key" : 4, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "5" } }
{ "key" : 5, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "6" } }
{ "key" : 6, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "7" } }
{ "key" : 7, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "8" } }
{ "key" : 8, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "9" } }
{ "key" : 9, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "10" } }
{ "key" : 10, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "11" } }
{ "key" : 11, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "12" } }
{ "key" : 12, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "13" } }
{ "key" : 13, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "14" } }
{ "key" : 14, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "15" } }
{ "key" : 15, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "16" } }
{ "key" : 16, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "17" } }
{ "key" : 17, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "18" } }
{ "key" : 18, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "19" } }
{ "key" : 19, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "20" } }
{ "key" : 20, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "21" } }
{ "key" : 21, "value" : 0 }
{ "index" : { "_index" : "change-point-test", "_id" : "22" } }
{ "key" : 22, "value" : 700 }
{ "index" : { "_index" : "change-point-test", "_id" : "23" } }
{ "key" : 23, "value" : 735 }
{ "index" : { "_index" : "change-point-test", "_id" : "24" } }
{ "key" : 24, "value" : 715 }
{ "index" : { "_index" : "change-point-test", "_id" : "25" } }
{ "key" : 25, "value" : 0 }

POST change-point-test/_search?error_trace
{
  "size": 0,
  "aggs": {
    "buckets": {
      "terms": {
        "field": "key",
        "size": 100
      },
      "aggs": {
        "values": {
          "max": {
            "field": "value",
            "missing": 0
          }
        }
      }
    },
    "change": {
      "change_point": {
        "buckets_path": "buckets>values"
      }
    }
  }
}
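
Since the failure depends on the exact bucket values, it may help to generate the bulk body programmatically when trying variations. The sketch below is one possible way to do that. The class and method names are hypothetical, and it only builds the NDJSON text for the _bulk request above without sending it to a cluster.

```java
// Hypothetical helper that rebuilds the _bulk body used in the reproduction.
final class ChangePointBulkBody {

    /** Builds one action line and one document line per value; keys and ids start at 1. */
    static String build(String index, long[] values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            int key = i + 1;
            sb.append("{ \"index\" : { \"_index\" : \"").append(index)
              .append("\", \"_id\" : \"").append(key).append("\" } }\n");
            sb.append("{ \"key\" : ").append(key)
              .append(", \"value\" : ").append(values[i]).append(" }\n");
        }
        return sb.toString();
    }

    /** The 25 values from the report: all zeros except keys 22, 23 and 24. */
    static long[] reportedValues() {
        long[] values = new long[25]; // initialized to 0
        values[21] = 700; // key 22
        values[22] = 735; // key 23
        values[23] = 715; // key 24
        return values;
    }

    public static void main(String[] args) {
        System.out.print(build("change-point-test", reportedValues()));
    }
}
```

Editing the array returned by reportedValues() (or its length) gives a quick way to check which inputs trigger the exception and which do not.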

Logs (if relevant)

No response

weltenwort commented 1 month ago

Changing the values or adding or removing buckets can cause the problem to disappear or reappear, so it seems to depend heavily on the specific metric values that the change point detection runs on.

elasticsearchmachine commented 1 month ago

Pinging @elastic/ml-core (Team:ML)