elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
1.17k stars 24.84k forks source link

[CI] SearchWithRandomIOExceptionsIT testRandomDirectoryIOExceptions failing #114824

Open elasticsearchmachine opened 4 weeks ago

elasticsearchmachine commented 4 weeks ago

Build Scans:

Reproduction Line:

./gradlew ":server:internalClusterTest" --tests "org.elasticsearch.search.basic.SearchWithRandomIOExceptionsIT.testRandomDirectoryIOExceptions" -Dtests.seed=F8F0E5FB8684B939 -Dtests.locale=ee -Dtests.timezone=America/Tortola -Druntime.java=23

Applicable branches: main

Reproduces locally?: N/A

Failure History: See dashboard&_a=(controlGroupState:(initialChildControlState:('0c0c9cb8-ccd2-45c6-9b13-96bac4abc542':(dataViewId:fbbdc689-be23-4b3d-8057-aa402e9ed0c5,fieldName:task.keyword,order:0,selectedOptions:!(),title:'GradleTask',type:optionsListControl),'4e6ad9d6-6fdc-4fcc-bf1a-aa6ca79e0850':(dataViewId:fbbdc689-be23-4b3d-8057-aa402e9ed0c5,fieldName:className.keyword,order:1,selectedOptions:!(org.elasticsearch.search.basic.SearchWithRandomIOExceptionsIT),title:'Suite',type:optionsListControl),'144933da-5c1b-4257-a969-7f43455a7901':(dataViewId:fbbdc689-be23-4b3d-8057-aa402e9ed0c5,fieldName:name.keyword,order:2,selectedOptions:!(testRandomDirectoryIOExceptions),title:'Test',type:optionsListControl)))))

Failure Message:

java.lang.AssertionError: All incoming requests on node [node_s4] should have finished. Expected 0 bytes for requests in-flight but got 78 bytes; pending tasks [[]]

Issue Reasons:

Note: This issue was created using new test triage automation. Please report issues or feedback to es-delivery.

elasticsearchmachine commented 4 weeks ago

Pinging @elastic/es-search (Team:Search)

benwtrent commented 2 weeks ago

Its been failing since April or before. All the failures are effectively the same.

Expected 0 bytes for requests in-flight but got N bytes; where N is some non zero number near 100.

Some other full stacks:


2> java.lang.AssertionError: All incoming requests on node [node_s1] should have finished. Expected 0 bytes for requests in-flight but got 99 bytes; pending tasks [[]] |  
-- | --
  | at org.elasticsearch.test.InternalTestCluster.lambda$assertRequestsFinished$37(InternalTestCluster.java:2548) |  
  | at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:1429) |  
  | at org.elasticsearch.test.InternalTestCluster.assertRequestsFinished(InternalTestCluster.java:2539) |  
  | at org.elasticsearch.test.InternalTestCluster.assertAfterTest(InternalTestCluster.java:2512) |  
  | at org.elasticsearch.test.ESIntegTestCase.afterInternal(ESIntegTestCase.java:611) |  
  | at org.elasticsearch.test.ESIntegTestCase.cleanUpCluster(ESIntegTestCase.java:2420) |  
  | at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) |  
  | at java.base/java.lang.reflect.Method.invoke(Method.java:580) |  
  | at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758) |  
  | at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:1004) |  
  | at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) |  
  | at org.junit.rules.RunRules.evaluate(RunRules.java:20) |  
  | at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48) |  
  | at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43) |  
  | at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45) |  
  | at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60) |  
  | at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44) |  
  | at org.junit.rules.RunRules.evaluate(RunRules.java:20) |  
  | at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) |  
  | at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390) |  
  | at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843) |  
  | at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490) |  
  | at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955) |  
  | at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840) |  
  | at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891) |  
  | at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902) |  
  | at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43) |  
  | at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) |  
  | at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38) |  
  | at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) |  
  | at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) |  
  | at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) |  
  | at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) |  
  | at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53) |  
  | at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43) |  
  | at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44) |  
  | at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60) |  
  | at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47) |  
  | at org.junit.rules.RunRules.evaluate(RunRules.java:20) |  
  | at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) |  
  | at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390) |  
  | at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850) |  
  | at java.base/java.lang.Thread.run(Thread.java:1570) |  
  |   |  
  | java.lang.AssertionError: searcher was not warmed up yet for source[field_has_value] |  
  | at __randomizedtesting.SeedInfo.seed([E1A2C0470C9B7EFC]:0) |  
  | at org.elasticsearch.index.engine.InternalEngine.assertSearcherIsWarmedUp(InternalEngine.java:523) |  
  | at org.elasticsearch.index.engine.Engine$1.acquireSearcherInternal(Engine.java:831) |  
  | at org.elasticsearch.index.engine.Engine$SearcherSupplier.acquireSearcher(Engine.java:1484) |  
  | at org.elasticsearch.index.engine.Engine.acquireSearcher(Engine.java:882) |  
  | at org.elasticsearch.index.engine.Engine.acquireSearcher(Engine.java:875) |  
  | at org.elasticsearch.index.engine.Engine.acquireSearcher(Engine.java:871) |  
  | at org.elasticsearch.index.shard.IndexShard$RefreshFieldHasValueListener.afterRefresh(IndexShard.java:4084) |  
  | at org.apache.lucene.search.ReferenceManager.notifyRefreshListenersRefreshed(ReferenceManager.java:275) |  
  | at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:182) |  
  | at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:240) |  
  | at org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:2056) |  
  | at org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:2027) |  
  | at org.elasticsearch.index.engine.Engine.lambda$externalRefresh$11(Engine.java:1252) |  
  | at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:356) |  
  | at org.elasticsearch.index.engine.Engine.externalRefresh(Engine.java:1250) |  
  | at org.elasticsearch.index.shard.IndexShard.externalRefresh(IndexShard.java:1296) |  
  | at org.elasticsearch.action.admin.indices.refresh.TransportShardRefreshAction.shardOperationOnReplica(TransportShardRefreshAction.java:103) |  
  | at org.elasticsearch.action.admin.indices.refresh.TransportShardRefreshAction.shardOperationOnReplica(TransportShardRefreshAction.java:38) |  
  | at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncReplicaAction.onResponse(TransportReplicationAction.java:676) |  
  | at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncReplicaAction.onResponse(TransportReplicationAction.java:647) |  
  | at org.elasticsearch.index.shard.IndexShard.lambda$innerAcquireReplicaOperationPermit$44(IndexShard.java:3788) |  
  | at org.elasticsearch.action.ActionListenerImplementations$DelegatingFailureActionListener.onResponse(ActionListenerImplementations.java:219) |  
  | at org.elasticsearch.action.ActionListener$3.onResponse(ActionListener.java:400) |  
  | at org.elasticsearch.index.shard.IndexShardOperationPermits.innerAcquire(IndexShardOperationPermits.java:255) |  
  | at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:203) |  
  | at org.elasticsearch.index.shard.IndexShard.lambda$acquireReplicaOperationPermit$42(IndexShard.java:3718) |  
  | at org.elasticsearch.index.shard.IndexShard.innerAcquireReplicaOperationPermit(IndexShard.java:3830) |  
  | at org.elasticsearch.index.shard.IndexShard.acquireReplicaOperationPermit(IndexShard.java:3712) |  
  | at org.elasticsearch.action.support.replication.TransportReplicationAction.acquireReplicaOperationPermit(TransportReplicationAction.java:1120) |  
  | at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncReplicaAction.doRun(TransportReplicationAction.java:771) |  
  | at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) |  
  | at org.elasticsearch.action.support.replication.TransportReplicationAction.handleReplicaRequest(TransportReplicationAction.java:625) |  
  | at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:76) |  
  | at org.elasticsearch.transport.InboundHandler.doHandleRequest(InboundHandler.java:289) |  
  | at org.elasticsearch.transport.InboundHandler$1.doRun(InboundHandler.java:302) |  
  | at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1023) |  
  | at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) |  
  | at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) |  
  | at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) |  
  | at java.base/java.lang.Thread.run(Thread.java:1570)
benwtrent commented 2 weeks ago

The most common thing we waiting on is the health_check persistent task or empty (I think the empty is a race condition, where a task finished between us checking and then verifying in flight tasks).

elasticsearchmachine commented 6 days ago

Pinging @elastic/es-search-foundations (Team:Search Foundations)

elasticsearchmachine commented 2 days ago

This has been muted on branch main

Mute Reasons:

Build Scans:

benwtrent commented 2 days ago

The latest failures

2> REPRODUCE WITH: ./gradlew ":server:internalClusterTest" --tests "org.elasticsearch.search.basic.SearchWithRandomIOExceptionsIT.testRandomDirectoryIOExceptions" -Dtests.seed=F8F0E5FB8684B939 -Dtests.locale=ee -Dtests.timezone=America/Tortola -Druntime.java=23 |  
-- | --
  | 2> REPRODUCE WITH: ./gradlew ":server:internalClusterTest" --tests "org.elasticsearch.search.basic.SearchWithRandomIOExceptionsIT.testRandomDirectoryIOExceptions" -Dtests.seed=F8F0E5FB8684B939 -Dtests.locale=ee -Dtests.timezone=America/Tortola -Druntime.java=23 |  
  | 2> java.lang.AssertionError: All incoming requests on node [node_s4] should have finished. Expected 0 bytes for requests in-flight but got 78 bytes; pending tasks [[]] |  
  | at org.elasticsearch.test.InternalTestCluster.lambda$assertRequestsFinished$37(InternalTestCluster.java:2548) |  
  | at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:1439) |  
  | at org.elasticsearch.test.InternalTestCluster.assertRequestsFinished(InternalTestCluster.java:2539) |  
  | at org.elasticsearch.test.InternalTestCluster.assertAfterTest(InternalTestCluster.java:2512) |  
  | at org.elasticsearch.test.ESIntegTestCase.afterInternal(ESIntegTestCase.java:611) |  
  | at org.elasticsearch.test.ESIntegTestCase.cleanUpCluster(ESIntegTestCase.java:2422) |  
  | at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) |  
  | at java.base/java.lang.reflect.Method.invoke(Method.java:580) |  
  | at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758) |  
  | at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:1004) |  
  | at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) |  
  | at org.junit.rules.RunRules.evaluate(RunRules.java:20) |  
  | at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48) |  
  | at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43) |  
  | at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45) |  
  | at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60) |  
  | at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44) |  
  | at org.junit.rules.RunRules.evaluate(RunRules.java:20) |  
  | at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) |  
  | at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390) |  
  | at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843) |  
  | at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490) |  
  | at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955) |  
  | at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840) |  
  | at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891) |  
  | at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902) |  
  | at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43) |  
  | at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) |  
  | at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38) |  
  | at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) |  
  | at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) |  
  | at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) |  
  | at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) |  
  | at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53) |  
  | at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43) |  
  | at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44) |  
  | at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60) |  
  | at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47) |  
  | at org.junit.rules.RunRules.evaluate(RunRules.java:20) |  
  | at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) |  
  | at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390) |  
  | at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850) |  
  | at java.base/java.lang.Thread.run(Thread.java:1575) |  
  |   |  
  | java.lang.AssertionError: searcher was not warmed up yet for source[field_has_value] |  
  | at __randomizedtesting.SeedInfo.seed([F8F0E5FB8684B939]:0) |  
  | at org.elasticsearch.index.engine.InternalEngine.assertSearcherIsWarmedUp(InternalEngine.java:523) |  
  | at org.elasticsearch.index.engine.Engine$1.acquireSearcherInternal(Engine.java:828) |  
  | at org.elasticsearch.index.engine.Engine$SearcherSupplier.acquireSearcher(Engine.java:1481) |  
  | at org.elasticsearch.index.engine.Engine.acquireSearcher(Engine.java:879) |  
  | at org.elasticsearch.index.engine.Engine.acquireSearcher(Engine.java:872) |  
  | at org.elasticsearch.index.engine.Engine.acquireSearcher(Engine.java:868) |  
  | at org.elasticsearch.index.shard.IndexShard$RefreshFieldHasValueListener.afterRefresh(IndexShard.java:4083) |  
  | at org.apache.lucene.search.ReferenceManager.notifyRefreshListenersRefreshed(ReferenceManager.java:275) |  
  | at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:182) |  
  | at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:240) |  
  | at org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:2056) |  
  | at org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:2027) |  
  | at org.elasticsearch.index.shard.IndexShard.finalizeRecovery(IndexShard.java:2309) |  
  | at org.elasticsearch.indices.recovery.RecoveryTarget.lambda$finalizeRecovery$2(RecoveryTarget.java:401) |  
  | at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:356) |  
  | at org.elasticsearch.indices.recovery.RecoveryTarget.finalizeRecovery(RecoveryTarget.java:382) |  
  | at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$5.handleRequest(PeerRecoveryTargetService.java:214) |  
  | at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$5.handleRequest(PeerRecoveryTargetService.java:207) |  
  | at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRequestHandler.messageReceived(PeerRecoveryTargetService.java:661) |  
  | at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRequestHandler.messageReceived(PeerRecoveryTargetService.java:648) |  
  | at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:76) |  
  | at org.elasticsearch.transport.InboundHandler.doHandleRequest(InboundHandler.java:289) |  
  | at org.elasticsearch.transport.InboundHandler$1.doRun(InboundHandler.java:302) |  
  | at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1023) |  
  | at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) |  
  | at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) |  
  | at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) |  
  | at java.base/java.lang.Thread.run(Thread.java:1575)