elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Some ML tests are failing with "Accounting breaker not reset to" errors #55420

Closed imotov closed 4 years ago

imotov commented 4 years ago

A couple of recent examples:

The error looks like this:


org.elasticsearch.xpack.ml.integration.ClassificationEvaluationIT > testEvaluate_Recall_CardinalityTooHigh FAILED
    java.lang.AssertionError: Fielddata breaker not reset to 0 on node: {integTest-0}{M747rOuTSMqmMD6YA_WfEg}{3ej5FYKXTt6fWGrPL2LjoA}{127.0.0.1}{127.0.0.1:34535}{dilmrt}{testattr=test, ml.machine_memory=101370638336, ml.max_open_jobs=20, xpack.installed=true, transform.node=true}
    Expected: <0L>
         but: was <1888L>
        at __randomizedtesting.SeedInfo.seed([520EF9CC9245A36:6DE1D1D748961E23]:0)
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
        at org.junit.Assert.assertThat(Assert.java:956)
        at org.elasticsearch.test.ExternalTestCluster.ensureEstimatedStats(ExternalTestCluster.java:194)
        at org.elasticsearch.test.TestCluster.assertAfterTest(TestCluster.java:90)
        at org.elasticsearch.test.ESIntegTestCase.afterInternal(ESIntegTestCase.java:559)
        at org.elasticsearch.test.ESIntegTestCase.cleanUpCluster(ESIntegTestCase.java:2039)
        at jdk.internal.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:1004)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
        at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
        at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:375)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:824)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:475)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
        at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
        at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:375)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:831)
        at java.base/java.lang.Thread.run(Thread.java:834)

One thing they have in common is that the logs are filled with the following errors:

1> [2020-04-17T12:35:18,158][WARN ][o.e.x.m.i.i.I.Factory    ] [external_22] failure parsing pipeline config [xpack_monitoring_6]
1> org.elasticsearch.ElasticsearchParseException: No processor type exists with name [script]
1>  at org.elasticsearch.ingest.ConfigurationUtils.newConfigurationException(ConfigurationUtils.java:315) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1>  at org.elasticsearch.ingest.ConfigurationUtils.readProcessor(ConfigurationUtils.java:444) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1>  at org.elasticsearch.ingest.ConfigurationUtils.readProcessor(ConfigurationUtils.java:398) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1>  at org.elasticsearch.ingest.ConfigurationUtils.readProcessorConfigs(ConfigurationUtils.java:336) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1>  at org.elasticsearch.ingest.Pipeline.create(Pipeline.java:74) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1>  at org.elasticsearch.xpack.ml.inference.ingest.InferenceProcessor$Factory.accept(InferenceProcessor.java:209) [x-pack-ml-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1>  at org.elasticsearch.xpack.ml.inference.ingest.InferenceProcessor$Factory.accept(InferenceProcessor.java:166) [x-pack-ml-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1>  at org.elasticsearch.ingest.IngestService.lambda$applyClusterState$5(IngestService.java:542) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1>  at java.util.concurrent.CopyOnWriteArrayList.forEach(CopyOnWriteArrayList.java:803) [?:?]
1>  at org.elasticsearch.ingest.IngestService.applyClusterState(IngestService.java:542) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1>  at org.elasticsearch.cluster.service.ClusterApplierService.lambda$callClusterStateAppliers$5(ClusterApplierService.java:511) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1>  at java.lang.Iterable.forEach(Iterable.java:75) [?:?]
1>  at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:508) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1>  at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:485) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1>  at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:432) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1>  at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:176) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1>  at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:632) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1>  at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1>  at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
1>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
1>  at java.lang.Thread.run(Thread.java:834) [?:?]
1>  Suppressed: org.elasticsearch.ElasticsearchParseException: No processor type exists with name [gsub]
1>      at org.elasticsearch.ingest.ConfigurationUtils.newConfigurationException(ConfigurationUtils.java:315) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1>      at org.elasticsearch.ingest.ConfigurationUtils.readProcessor(ConfigurationUtils.java:444) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1>      at org.elasticsearch.ingest.ConfigurationUtils.readProcessor(ConfigurationUtils.java:398) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1>      at org.elasticsearch.ingest.ConfigurationUtils.readProcessorConfigs(ConfigurationUtils.java:336) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1>      at org.elasticsearch.ingest.Pipeline.create(Pipeline.java:74) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1>      at org.elasticsearch.xpack.ml.inference.ingest.InferenceProcessor$Factory.accept(InferenceProcessor.java:209) [x-pack-ml-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1>      at org.elasticsearch.xpack.ml.inference.ingest.InferenceProcessor$Factory.accept(InferenceProcessor.java:166) [x-pack-ml-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1>      at org.elasticsearch.ingest.IngestService.lambda$applyClusterState$5(IngestService.java:542) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1>      at java.util.concurrent.CopyOnWriteArrayList.forEach(CopyOnWriteArrayList.java:803) [?:?]
1>      at org.elasticsearch.ingest.IngestService.applyClusterState(IngestService.java:542) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1>      at org.elasticsearch.cluster.service.ClusterApplierService.lambda$callClusterStateAppliers$5(ClusterApplierService.java:511) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1>      at java.lang.Iterable.forEach(Iterable.java:75) [?:?]
1>      at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:508) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1>      at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:485) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1>      at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:432) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1>      at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:176) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1>      at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:632) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1>      at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1>      at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
1>      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
1>      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
1>      at java.lang.Thread.run(Thread.java:834) [?:?]

@benwtrent pointed out that it could be caused by #54816, but we still need to verify that. These messages might well be a red herring.

elasticmachine commented 4 years ago

Pinging @elastic/ml-core (:ml)

benwtrent commented 4 years ago

@williamrandolph ping

Just verified locally that #54816 is causing the log spam. It MIGHT also be causing the intermittent failure.

I have noticed some weird behavior in the past with Fielddata breaker checks and ingest pipelines. Node stats and memory allocations differ slightly when pipelines are involved. I really only noticed this when pipelines were simulated in the test. I am not sure whether the basic plugins being enabled are causing this intermittent behavior. My prime suspect is monitoring, but I am not sure.

History:

The IngestPipeline.Factory is loaded on ML nodes. It attempts to parse each pipeline configuration and check whether it is an inference pipeline; if it is, it keeps a local tally of the inference processors it finds, so that a setting can put a sane upper limit on the number of these processors.
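
For illustration, here is a minimal, self-contained sketch of that kind of tally over plain Java maps standing in for parsed pipeline definitions. The class, method, setting and pipeline names are hypothetical; this is not the actual InferenceProcessor.Factory code, just the idea of counting inference processors so they can be checked against a limit.

```java
import java.util.List;
import java.util.Map;

// Hypothetical illustration of keeping a local tally of inference processors
// found in pipeline definitions; not the actual InferenceProcessor.Factory code.
public class InferenceProcessorTally {

    /** Counts processors of type "inference" across all pipeline definitions. */
    static int countInferenceProcessors(Map<String, List<Map<String, Object>>> pipelines) {
        int count = 0;
        for (List<Map<String, Object>> processors : pipelines.values()) {
            for (Map<String, Object> processor : processors) {
                // Each processor entry is a single-key map: { "<type>": { ...config... } }
                if (processor.containsKey("inference")) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Map<String, Object> inferenceProcessor = Map.of("inference", Map.of("model_id", "my_model"));
        Map<String, Object> scriptProcessor = Map.of("script", Map.of("source", "ctx.x = 1"));
        Map<String, List<Map<String, Object>>> pipelines = Map.of(
            "my_inference_pipeline", List.of(inferenceProcessor),
            "xpack_monitoring_6", List.of(scriptProcessor)
        );
        int tally = countInferenceProcessors(pipelines);
        // A setting (hypothetical name, e.g. xpack.ml.max_inference_processors) could
        // then be compared against this tally to enforce an upper limit.
        System.out.println("inference processors in use: " + tally);
    }
}
```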

I think to fix this, we could enable all these ingest plugins/modules for the ML integration tests. That will get rid of the log spam at least.

williamrandolph commented 4 years ago

@benwtrent Apologies for the log spam. I will try to fix it next week.

> I think to fix this, we could enable all these ingest plugins/modules for the ML integration tests. That will get rid of the log spam at least.

I'm not sure which plugins and modules are not already enabled. Do we need to configure the ML nodes to be ingest nodes?

benwtrent commented 4 years ago

I am not sure offhand why the script processor isn't loaded. But I think we should make sure that it and the other processors referenced by the monitoring plugin are loaded in the ML test cluster.
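
For an internal-cluster integration test, the sketch below shows one way to make those processor types resolvable, assuming the ingest-common module's IngestCommonPlugin is on the test classpath (the test class name is made up). The ML native multi-node tests run against an external cluster configured through Gradle, so there the equivalent change would be adding the module to the cluster definition; this is only illustrative.

```java
import java.util.Arrays;
import java.util.Collection;

import org.elasticsearch.ingest.common.IngestCommonPlugin;
import org.elasticsearch.plugins.Plugin;
import org.elasticsearch.test.ESIntegTestCase;

// Hypothetical test class name; shows the nodePlugins() hook only.
public class MlPipelineProcessorsAvailableIT extends ESIntegTestCase {

    // Register ingest-common on every test node so the processors referenced by the
    // monitoring pipelines (script, gsub, ...) can at least be parsed without warnings.
    @Override
    protected Collection<Class<? extends Plugin>> nodePlugins() {
        return Arrays.asList(IngestCommonPlugin.class);
    }
}
```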

I don't know how to do this, but I can also take a look next week.

williamrandolph commented 4 years ago

Ah, I just noticed that xpack.monitoring.elasticsearch.collection.enabled has a default value of true. I could try setting this to false for all the tests that formerly had xpack.monitoring.enabled set to false.
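
A sketch of what that override could look like in a test's node settings. The test class name is hypothetical, and since the native multi-node tests configure their external cluster via Gradle, this is again just an illustration of the settings involved.

```java
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.test.ESIntegTestCase;

// Hypothetical test class name; shows disabling monitoring collection in node settings.
public class MonitoringDisabledIT extends ESIntegTestCase {

    @Override
    protected Settings nodeSettings(int nodeOrdinal) {
        return Settings.builder()
            .put(super.nodeSettings(nodeOrdinal))
            // Previously only xpack.monitoring.enabled was set to false; also disable
            // local collection so monitoring does not install its ingest pipelines.
            .put("xpack.monitoring.enabled", false)
            .put("xpack.monitoring.elasticsearch.collection.enabled", false)
            .build();
    }
}
```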

droberts195 commented 4 years ago

We have had this "Accounting breaker not reset" error before in integration tests and the cause was that indices were being recreated simultaneously with the cleanup method that deletes all the indices in between tests.

I had a look at the server side logs for one of these failures, https://gradle-enterprise.elastic.co/s/r7ybldwftir2s/tests/zcquf3hc3eoc6-7mybzxk4lp3n2, to see which index it was that was created during cleanup.

In that test run the failing test was CategorizationIT.testCategorizationWithFilters and it finished at 2020-04-19T00:53:15,933:

[2020-04-19T00:53:15,933][INFO ][o.e.x.m.i.CategorizationIT] [testCategorizationWithFilters] after test

Looking at the server-side logs for what was going on with indices around that time, we see this:

[2020-04-18T19:53:15,704][INFO ][o.e.x.i.IndexLifecycleTransition] [integTest-2] moving index [ilm-history-2-000001] from [{"phase":"new","action":"complete","name":"complete"}] to [{"phase":"hot","action":"unfollow","name":"wait-for-indexing-complete"}] in policy [ilm-history-ilm-policy]
[2020-04-18T19:53:15,819][INFO ][o.e.x.i.IndexLifecycleTransition] [integTest-2] moving index [ilm-history-2-000001] from [{"phase":"hot","action":"unfollow","name":"wait-for-indexing-complete"}] to [{"phase":"hot","action":"unfollow","name":"wait-for-follow-shard-tasks"}] in policy [ilm-history-ilm-policy]

The timezone for those messages is 5 hours behind the client timezone, so in reality they're just about 100-200ms before the test finished (19:53:15,704 and 19:53:15,819 server time correspond to 00:53:15,704 and 00:53:15,819 client time, against a test finish at 00:53:15,933). The accounting breaker assertion tripped on node 1 whereas those messages are from node 2, so it's reasonable to think that node 1 would gain visibility of the creation of the ILM history index at the time the assertion tripped during the post-test cleanup.

Async creation of internal indices has always been a problem for integration tests that try to remove all indices in between tests. Something similar is happening for templates in the same server-side log file:

[2020-04-18T19:53:14,242][INFO ][o.e.c.m.MetadataIndexTemplateService] [integTest-2] removing template [.ml-anomalies-]
[2020-04-18T19:53:14,394][INFO ][o.e.c.m.MetadataIndexTemplateService] [integTest-2] adding template [.ml-anomalies-] for index patterns [.ml-anomalies-*]
[2020-04-18T19:53:14,453][INFO ][o.e.c.m.MetadataIndexTemplateService] [integTest-2] removing template [.ml-notifications-000001]
[2020-04-18T19:53:14,544][INFO ][o.e.c.m.MetadataIndexTemplateService] [integTest-2] adding template [.ml-notifications-000001] for index patterns [.ml-notifications-000001]
[2020-04-18T19:53:14,598][INFO ][o.e.c.m.MetadataIndexTemplateService] [integTest-2] removing template [.ml-inference-000002]
[2020-04-18T19:53:14,683][INFO ][o.e.c.m.MetadataIndexTemplateService] [integTest-2] adding template [.ml-inference-000002] for index patterns [.ml-inference-000002]
[2020-04-18T19:53:14,736][INFO ][o.e.c.m.MetadataIndexTemplateService] [integTest-2] removing template [.ml-state]
[2020-04-18T19:53:14,822][INFO ][o.e.c.m.MetadataIndexTemplateService] [integTest-2] adding template [.ml-state] for index patterns [.ml-state*] 
[2020-04-18T19:53:14,875][INFO ][o.e.c.m.MetadataIndexTemplateService] [integTest-2] removing template [.ml-meta]
[2020-04-18T19:53:14,958][INFO ][o.e.c.m.MetadataIndexTemplateService] [integTest-2] adding template [.ml-meta] for index patterns [.ml-meta] 
[2020-04-18T19:53:15,014][INFO ][o.e.c.m.MetadataIndexTemplateService] [integTest-2] removing template [random_index_template]
[2020-04-18T19:53:15,068][INFO ][o.e.x.i.IndexLifecycleTransition] [integTest-2] moving index [ilm-history-2-000001] from [null] to [{"phase":"new","action":"complete","name":"complete"}] in policy [ilm-history-ilm-policy]
[2020-04-18T19:53:15,122][INFO ][o.e.c.m.MetadataIndexTemplateService] [integTest-2] removing template [.ml-stats]
[2020-04-18T19:53:15,227][INFO ][o.e.c.m.MetadataIndexTemplateService] [integTest-2] adding template [.ml-stats] for index patterns [.ml-stats-*]
[2020-04-18T19:53:15,288][INFO ][o.e.c.m.MetadataIndexTemplateService] [integTest-2] removing template [.ml-config]
[2020-04-18T19:53:15,370][INFO ][o.e.c.m.MetadataIndexTemplateService] [integTest-2] adding template [.ml-config] for index patterns [.ml-config]
[2020-04-18T19:53:15,426][INFO ][o.e.c.m.MetadataIndexTemplateService] [integTest-2] removing template [.slm-history]
[2020-04-18T19:53:15,511][INFO ][o.e.c.m.MetadataIndexTemplateService] [integTest-2] adding template [.slm-history] for index patterns [.slm-history-2*]
[2020-04-18T19:53:15,565][INFO ][o.e.c.m.MetadataIndexTemplateService] [integTest-2] removing template [ilm-history]
[2020-04-18T19:53:15,647][INFO ][o.e.c.m.MetadataIndexTemplateService] [integTest-2] adding template [ilm-history] for index patterns [ilm-history-2*]

Between tests, the post-test cleanup in the client removes all the templates, and then the template auto-creation functionality in the server immediately fights back and recreates the same templates. 🤦

So I would guess that the reason #54816 has aggravated this is that some component that uses ILM is now enabled by default (it's not ILM itself, as that was always enabled by default). I cannot work out which component, as monitoring still seems to be disabled in the ML native multi-node tests, unless something overrides https://github.com/elastic/elasticsearch/blob/92c8a73348055e5784fc82a1c5e1d346f3167b1a/x-pack/plugin/ml/qa/native-multi-node-tests/src/test/java/org/elasticsearch/xpack/ml/integration/MlNativeIntegTestCase.java#L135

But when monitoring is eventually enabled, that will make the problem of test cleanup fighting async actions even worse. Therefore I think we need a new approach to cleanup between the ML native multi-node integration tests: we should only wipe ML indices and data indices, and leave the internal indices of other ES functionality like ILM, monitoring and security alone between tests. Also, there's no need for us to remove the ML index templates between the native multi-node tests: we're not testing upgrades, so the templates will be the same for all the tests.
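
A rough sketch of what a selective wipe could look like against the Client handle used by the Java-based tests. The helper class, method names and the list of protected prefixes are hypothetical; the real change would live in the shared cleanup code for both the REST and the Java client tests.

```java
import java.util.ArrayList;
import java.util.List;

import org.elasticsearch.action.admin.indices.get.GetIndexResponse;
import org.elasticsearch.client.Client;

// Hypothetical helper: delete only ML indices and ordinary data indices between tests,
// and leave the internal indices of other features (ILM history, monitoring, ...) alone.
public final class MlTestCleanup {

    // Indices belonging to other ES features that should survive the post-test cleanup.
    private static final List<String> PROTECTED_PREFIXES =
        List.of(".security", ".monitoring-", "ilm-history-", ".slm-history", ".watches", ".triggered_watches");

    private MlTestCleanup() {}

    static boolean shouldWipe(String index) {
        if (index.startsWith(".ml-")) {
            return true;      // ML internal indices are recreated from templates anyway
        }
        for (String prefix : PROTECTED_PREFIXES) {
            if (index.startsWith(prefix)) {
                return false; // leave other features' internal indices alone
            }
        }
        return true;          // ordinary data indices created by the test
    }

    public static void wipeMlAndDataIndices(Client client) {
        GetIndexResponse response = client.admin().indices().prepareGetIndex().setIndices("*").get();
        List<String> toDelete = new ArrayList<>();
        for (String index : response.indices()) {
            if (shouldWipe(index)) {
                toDelete.add(index);
            }
        }
        if (toDelete.isEmpty() == false) {
            client.admin().indices().prepareDelete(toDelete.toArray(new String[0])).get();
        }
        // Deliberately no template deletion here: the ML templates are identical for
        // every test in the suite, so removing and recreating them buys nothing.
    }
}
```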

Unfortunately this change to cleanup logic will need to be done in two places, because the native multi-node tests are a mixture of REST tests and Java client tests - see #49582. So the (completely different) cleanup code used for both these types of ML native multi-node tests will need to be changed to be consistent 😬.

The problem of the script plugin not being available, which has generated lots of log spam since #54816, still needs to be fixed, but that is separate from the problem of cleanup between tests.

hendrikmuhs commented 4 years ago

another one: https://gradle-enterprise.elastic.co/s/xhmbqjq2psh5c

(This time ScheduledEventsIT.testScheduledEventWithInterimResults)

williamrandolph commented 4 years ago

I've opened a PR that I hope will at least get rid of the log spam in the ML multinode tests: https://github.com/elastic/elasticsearch/pull/55461

jrodewig commented 4 years ago

Related build failure: https://gradle-enterprise.elastic.co/s/cyx2laofbzs72/tests/zcquf3hc3eoc6-iu7t2hi7meoko

droberts195 commented 4 years ago

Hopefully #55459 will fix this. Given how frequently it was failing before, we should know within half a day or so.

droberts195 commented 4 years ago

I haven't seen any failures of this type since #55459 passed the intake build, so it looks like it's fixed.