elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
69.59k stars 24.63k forks source link

[CI] RejectedExecutionTests testHistoryOnRejection failing #105951

Closed benwtrent closed 6 months ago

benwtrent commented 6 months ago

I can see in the logs that a rejection did occur. However, the history index seems to ALSO have rejected the write?

Build scan: https://gradle-enterprise.elastic.co/s/xjimegducgx5i/tests/:x-pack:plugin:watcher:internalClusterTest/org.elasticsearch.xpack.watcher.test.integration.RejectedExecutionTests/testHistoryOnRejection

Reproduction line:

./gradlew ':x-pack:plugin:watcher:internalClusterTest' --tests "org.elasticsearch.xpack.watcher.test.integration.RejectedExecutionTests.testHistoryOnRejection" -Dtests.seed=A8044DC7AE7C8A85 -Dtests.locale=ar-KW -Dtests.timezone=UTC -Druntime.java=21

Applicable branches: 8.13

Reproduces locally?: No

Failure history: Failure dashboard for org.elasticsearch.xpack.watcher.test.integration.RejectedExecutionTests#testHistoryOnRejection&_a=(controlGroupInput:(chainingSystem:HIERARCHICAL,controlStyle:twoLine,ignoreParentSettings:(ignoreFilters:!f,ignoreQuery:!f,ignoreTimerange:!f,ignoreValidations:!t),panels:('0c0c9cb8-ccd2-45c6-9b13-96bac4abc542':(explicitInput:(dataViewId:fbbdc689-be23-4b3d-8057-aa402e9ed0c5,enhancements:(),fieldName:task.keyword,grow:!t,id:'0c0c9cb8-ccd2-45c6-9b13-96bac4abc542',searchTechnique:wildcard,selectedOptions:!(),singleSelect:!t,title:'Gradle%20Task',width:medium),grow:!t,order:0,type:optionsListControl,width:small),'144933da-5c1b-4257-a969-7f43455a7901':(explicitInput:(dataViewId:fbbdc689-be23-4b3d-8057-aa402e9ed0c5,enhancements:(),fieldName:name.keyword,grow:!t,id:'144933da-5c1b-4257-a969-7f43455a7901',searchTechnique:wildcard,selectedOptions:!('testHistoryOnRejection'),title:Test,width:medium),grow:!t,order:2,type:optionsListControl,width:medium),'4e6ad9d6-6fdc-4fcc-bf1a-aa6ca79e0850':(explicitInput:(dataViewId:fbbdc689-be23-4b3d-8057-aa402e9ed0c5,enhancements:(),fieldName:className.keyword,grow:!t,id:'4e6ad9d6-6fdc-4fcc-bf1a-aa6ca79e0850',searchTechnique:wildcard,selectedOptions:!('org.elasticsearch.xpack.watcher.test.integration.RejectedExecutionTests'),title:Suite,width:medium),grow:!t,order:1,type:optionsListControl,width:medium)))))

Failure excerpt:

java.lang.AssertionError: 
Expected: a value equal to or greater than <2L>
     but: <0L> was less than <2L>

  at __randomizedtesting.SeedInfo.seed([A8044DC7AE7C8A85:D70BE3C42F9401C6]:0)
  at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
  at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
  at org.elasticsearch.test.ESTestCase.assertThat(ESTestCase.java:2119)
  at org.elasticsearch.xpack.watcher.test.integration.RejectedExecutionTests.lambda$testHistoryOnRejection$0(RejectedExecutionTests.java:55)
  at org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertResponse(ElasticsearchAssertions.java:375)
  at org.elasticsearch.xpack.watcher.test.integration.RejectedExecutionTests.lambda$testHistoryOnRejection$1(RejectedExecutionTests.java:53)
  at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:1255)
  at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:1228)
  at org.elasticsearch.xpack.watcher.test.integration.RejectedExecutionTests.testHistoryOnRejection(RejectedExecutionTests.java:51)
  at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
  at java.lang.reflect.Method.invoke(Method.java:580)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.lang.Thread.run(Thread.java:1583)
elasticsearchmachine commented 6 months ago

Pinging @elastic/es-data-management (Team:Data Management)

masseyke commented 6 months ago

I don't think I've seen this before. The problem is that the write to watcher history was rejected:

  1> [2024-03-05T11:59:23,221][ERROR][o.e.x.w.Watcher          ] [node_s0] error executing bulk
  1> org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of TimedRunnable{original=org.elasticsearch.action.bulk.TransportBulkAction$2/org.elasticsearch.action.ActionListenerImplementations$RunBeforeActionListener/org.elasticsearch.action.ActionListenerImplementations$RunBeforeActionListener/org.elasticsearch.tasks.TaskManager$1{org.elasticsearch.action.support.ContextPreservingActionListener/org.elasticsearch.action.bulk.Retry2$RetryHandler@6ac12a57}{Task{id=246, type='transport', action='indices:data/write/bulk', description='requests[1], indices[.watcher-history-16]', parentTask=unset, startTime=1709639963220, startTimeNanos=989696122872}}/org.elasticsearch.action.support.TransportAction$$Lambda/0x00007f8e84aa7b30@601084c9/org.elasticsearch.action.bulk.TransportBulkAction$$Lambda/0x00007f8e84e10000@7877bfed, creationTimeNanos=989696235664, startTimeNanos=0, finishTimeNanos=-1, failedOrRejected=false} on TaskExecutionTimeTrackingEsThreadPoolExecutor[name = node_s0/write, queue capacity = 1, task execution EWMA = 10.7ms, total task execution time = 190ms, org.elasticsearch.common.util.concurrent.TaskExecutionTimeTrackingEsThreadPoolExecutor@7ed8b40b[Running, pool size = 1, active threads = 1, queued tasks = 1, completed tasks = 11]]
  1>    at org.elasticsearch.common.util.concurrent.EsRejectedExecutionHandler.newRejectedException(EsRejectedExecutionHandler.java:51) ~[elasticsearch-8.13.0-SNAPSHOT.jar:8.13.0-SNAPSHOT]
  1>    at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:35) ~[elasticsearch-8.13.0-SNAPSHOT.jar:8.13.0-SNAPSHOT]
  1>    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:841) ~[?:?]
  1>    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1376) ~[?:?]
  1>    at org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor.execute(EsThreadPoolExecutor.java:72) ~[elasticsearch-8.13.0-SNAPSHOT.jar:8.13.0-SNAPSHOT]
  1>    at org.elasticsearch.action.bulk.TransportBulkAction.forkAndExecute(TransportBulkAction.java:294) ~[elasticsearch-8.13.0-SNAPSHOT.jar:8.13.0-SNAPSHOT]
  1>    at org.elasticsearch.action.bulk.TransportBulkAction.ensureClusterStateThenForkAndExecute(TransportBulkAction.java:289) ~[elasticsearch-8.13.0-SNAPSHOT.jar:8.13.0-SNAPSHOT]
  1>    at org.elasticsearch.action.bulk.TransportBulkAction.doExecute(TransportBulkAction.java:248) ~[elasticsearch-8.13.0-SNAPSHOT.jar:8.13.0-SNAPSHOT]
  1>    at org.elasticsearch.action.bulk.TransportBulkAction.doExecute(TransportBulkAction.java:84) ~[elasticsearch-8.13.0-SNAPSHOT.jar:8.13.0-SNAPSHOT]
  1>    at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:96) ~[elasticsearch-8.13.0-SNAPSHOT.jar:8.13.0-SNAPSHOT]
  1>    at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:68) ~[elasticsearch-8.13.0-SNAPSHOT.jar:8.13.0-SNAPSHOT]
...

I'm not sure yet why that would be though.

masseyke commented 6 months ago

The watcher history was rejected because we set the write queue size to 1, and I assume something else was in there. I don't think that this is the point of the test (it's desired behavior that we immediately reject writes if the write queue is full).

mark-vieira commented 6 months ago

I observed this error after the fix from #106134 was merged: https://gradle-enterprise.elastic.co/s/y6wmgil4wjfda/tests/task/:x-pack:plugin:watcher:internalClusterTest/details/org.elasticsearch.xpack.watcher.test.integration.RejectedExecutionTests/testHistoryOnRejection?top-execution=1

mark-vieira commented 6 months ago

I've remuted this test since the above failure reproduces for me.