elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

[CI] MlDistributedFailureIT testFullClusterRestart failing #108757

Open cbuescher opened 1 month ago

cbuescher commented 1 month ago

Could be related to https://github.com/elastic/elasticsearch/issues/97574 but the error looks a bit different at first glance.

Build scan: https://gradle-enterprise.elastic.co/s/opcpchdbcpjqu/tests/:x-pack:plugin:ml:internalClusterTest/org.elasticsearch.xpack.ml.integration.MlDistributedFailureIT/testFullClusterRestart

Reproduction line:

./gradlew ':x-pack:plugin:ml:internalClusterTest' --tests "org.elasticsearch.xpack.ml.integration.MlDistributedFailureIT.testFullClusterRestart" -Dtests.seed=4F35CBC70E44FC4A -Dtests.locale=ar-JO -Dtests.timezone=Asia/Yekaterinburg -Druntime.java=17 -Dtests.fips.enabled=true

Applicable branches: 8.13

Reproduces locally?: No

Failure history: Failure dashboard for org.elasticsearch.xpack.ml.integration.MlDistributedFailureIT#testFullClusterRestart

Failure excerpt:

java.lang.AssertionError: 
Expected: an empty collection
     but: <[LEAK: resource was not cleaned up before it was garbage-collected.
Recent access records: 
Created at:
    org.elasticsearch.action.search.ArraySearchPhaseResults.<init>(ArraySearchPhaseResults.java:27)
    org.elasticsearch.action.search.QueryPhaseResultConsumer.<init>(QueryPhaseResultConsumer.java:85)
    org.elasticsearch.action.search.SearchPhaseController.newSearchPhaseResults(SearchPhaseController.java:813)
    org.elasticsearch.action.search.TransportSearchAction$AsyncSearchActionProvider.newSearchPhase(TransportSearchAction.java:1324)
    org.elasticsearch.action.search.TransportSearchAction.executeSearch(TransportSearchAction.java:1153)
    org.elasticsearch.action.search.TransportSearchAction.executeLocalSearch(TransportSearchAction.java:914)
    org.elasticsearch.action.search.TransportSearchAction.lambda$executeRequest$8(TransportSearchAction.java:342)
    org.elasticsearch.action.ActionListenerImplementations$ResponseWrappingActionListener.onResponse(ActionListenerImplementations.java:245)
    org.elasticsearch.index.query.Rewriteable.rewriteAndFetch(Rewriteable.java:109)
    org.elasticsearch.index.query.Rewriteable.rewriteAndFetch(Rewriteable.java:77)
    org.elasticsearch.action.search.TransportSearchAction.executeRequest(TransportSearchAction.java:455)
    org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:309)
    org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:113)
    org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:96)
    org.elasticsearch.action.support.ActionFilter$Simple.apply(ActionFilter.java:53)
    org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:93)
    org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:68)
    org.elasticsearch.tasks.TaskManager.registerAndExecute(TaskManager.java:196)
    org.elasticsearch.client.internal.node.NodeClient.executeLocally(NodeClient.java:105)
    org.elasticsearch.client.internal.node.NodeClient.doExecute(NodeClient.java:83)
    org.elasticsearch.client.internal.support.AbstractClient.execute(AbstractClient.java:356)
    org.elasticsearch.client.internal.support.AbstractClient.search(AbstractClient.java:491)
    org.elasticsearch.xpack.core.ClientHelper.executeAsyncWithOrigin(ClientHelper.java:221)
    org.elasticsearch.xpack.ml.job.persistence.JobConfigProvider.expandJobs(JobConfigProvider.java:585)
    org.elasticsearch.xpack.ml.job.JobManager.expandJobBuilders(JobManager.java:170)
    org.elasticsearch.xpack.ml.action.TransportGetJobsAction.masterOperation(TransportGetJobsAction.java:75)
    org.elasticsearch.xpack.ml.action.TransportGetJobsAction.masterOperation(TransportGetJobsAction.java:34)
    org.elasticsearch.action.support.master.TransportMasterNodeAction.executeMasterOperation(TransportMasterNodeAction.java:125)
    org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction.lambda$doStart$3(TransportMasterNodeAction.java:236)
    org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:95)
    org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
    org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:241)
    org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction.doStart(TransportMasterNodeAction.java:236)
    org.elasticsearch.action.support.master.TransportMasterNodeAction.doExecute(TransportMasterNodeAction.java:173)
    org.elasticsearch.action.support.master.TransportMasterNodeAction.doExecute(TransportMasterNodeAction.java:55)
    org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:96)
    org.elasticsearch.action.support.ActionFilter$Simple.apply(ActionFilter.java:53)
    org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:93)
    org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:68)
    org.elasticsearch.tasks.TaskManager.registerAndExecute(TaskManager.java:196)
    org.elasticsearch.client.internal.node.NodeClient.executeLocally(NodeClient.java:105)
    org.elasticsearch.client.internal.node.NodeClient.doExecute(NodeClient.java:83)
    org.elasticsearch.client.internal.support.AbstractClient.execute(AbstractClient.java:356)
    org.elasticsearch.xpack.core.ClientHelper.lambda$executeAsyncWithOrigin$3(ClientHelper.java:236)
    org.elasticsearch.xpack.core.ClientHelper.executeAsyncWithOrigin(ClientHelper.java:221)
    org.elasticsearch.xpack.core.ClientHelper.executeAsyncWithOrigin(ClientHelper.java:236)
    org.elasticsearch.xpack.ml.job.task.OpenJobPersistentTasksExecutor$RevertToCurrentSnapshotAction.tryAction(OpenJobPersistentTasksExecutor.java:543)
    org.elasticsearch.action.support.RetryableAction$1.doRun(RetryableAction.java:111)
    org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984)
    org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
    java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    java.base/java.lang.Thread.run(Thread.java:833)]>

  at __randomizedtesting.SeedInfo.seed([4F35CBC70E44FC4A:1E1ADAD6FB68B4E5]:0)
  at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
  at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
  at org.elasticsearch.test.ESTestCase.assertThat(ESTestCase.java:2119)
  at org.elasticsearch.test.ESTestCase.checkStaticState(ESTestCase.java:729)
  at org.elasticsearch.test.ESTestCase.after(ESTestCase.java:520)
  at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-2)
  at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
  at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:568)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:1004)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.lang.Thread.run(Thread.java:833)

elasticsearchmachine commented 1 month ago

Pinging @elastic/ml-core (Team:ML)

pxsalehi commented 1 month ago

More failures: https://gradle-enterprise.elastic.co/s/2grtkssvn4x2g/tests/task/:x-pack:plugin:ml:internalClusterTest/details/org.elasticsearch.xpack.ml.integration.MlDistributedFailureIT/testFullClusterRestart?top-execution=1

elasticsearchmachine commented 1 month ago

This has been muted on branch 8.13.

Mute Reasons:

Build Scans: