elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

[CI] SearchCancellationIT testCancelFailedSearchWhenPartialResultDisallowed failing #99929

Open DiannaHohensee opened 11 months ago

DiannaHohensee commented 11 months ago

Build scan: https://gradle-enterprise.elastic.co/s/pf6k6xm4cpioi/tests/:server:internalClusterTest/org.elasticsearch.search.SearchCancellationIT/testCancelFailedSearchWhenPartialResultDisallowed

Reproduction line:

./gradlew ':server:internalClusterTest' --tests "org.elasticsearch.search.SearchCancellationIT.testCancelFailedSearchWhenPartialResultDisallowed" -Dtests.seed=81676913A79F3181 -Dtests.locale=ms-MY -Dtests.timezone=America/Lower_Princes -Druntime.java=21

Applicable branches: main

Reproduces locally?: No

Failure history: https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.search.SearchCancellationIT&tests.test=testCancelFailedSearchWhenPartialResultDisallowed

Failure excerpt:

java.lang.AssertionError: The Coordinator should have one SearchTask.
Expected: a collection with size <1>
     but: collection size was <0>

  at __randomizedtesting.SeedInfo.seed([81676913A79F3181:CB7B841D17DCD6B9]:0)
  at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
  at org.junit.Assert.assertThat(Assert.java:956)
  at org.elasticsearch.search.SearchCancellationIT.lambda$testCancelFailedSearchWhenPartialResultDisallowed$4(SearchCancellationIT.java:282)
  at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:1196)
  at org.elasticsearch.search.SearchCancellationIT.testCancelFailedSearchWhenPartialResultDisallowed(SearchCancellationIT.java:280)
  at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
  at java.lang.reflect.Method.invoke(Method.java:580)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.lang.Thread.run(Thread.java:1583)
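
For context on the trace above: the failing assertion runs inside `ESTestCase.assertBusy` (the test's lambda frame sits directly above the `assertBusy` frame), which retries a check until it passes or a timeout elapses. Below is a minimal, self-contained sketch of that retry pattern. It is not the Elasticsearch implementation: the class `BusyAssert`, its backoff values, and the `tasks` list standing in for the coordinator's task list are all hypothetical.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class BusyAssert {

    /**
     * Hypothetical helper: re-runs {@code check} until it stops throwing
     * AssertionError, or rethrows the last error once the timeout has elapsed.
     */
    public static void assertBusy(Runnable check, long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        long sleepMillis = 1;
        while (true) {
            try {
                check.run();
                return; // condition held, stop retrying
            } catch (AssertionError e) {
                if (System.nanoTime() >= deadline) {
                    throw e; // condition never held within the wait window
                }
            }
            Thread.sleep(sleepMillis);
            sleepMillis = Math.min(sleepMillis * 2, 100); // capped exponential backoff
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the list of search tasks registered on a coordinator node.
        List<String> tasks = new CopyOnWriteArrayList<>();

        // Simulates a task that only becomes visible after a short delay.
        Thread registrar = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            tasks.add("search-task");
        });
        registrar.start();

        assertBusy(() -> {
            if (tasks.size() != 1) {
                throw new AssertionError("expected 1 task but found " + tasks.size());
            }
        }, 5_000);

        registrar.join();
        System.out.println("tasks observed: " + tasks.size());
    }
}
```

Under this pattern, a failure like the one reported suggests the expected condition never held at any check during the wait window, rather than on one unlucky read.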

elasticsearchmachine commented 11 months ago

Pinging @elastic/es-search (Team:Search)

mark-vieira commented 11 months ago

We have sporadic failures in this test suite, a couple of times a day, so I've muted it in the interim.

benwtrent commented 11 months ago

Marked as a blocker because the entire SearchCancellationIT suite is muted.

benwtrent commented 11 months ago

OK, all the failures started occurring after: https://github.com/elastic/elasticsearch/pull/99689

This is because of concurrent segment search. All these cancellation tests need to be rewritten or removed. In the interim, I am going to turn off concurrent search and unmute the tests, so we get our coverage back and reduce the impact.

benwtrent commented 11 months ago

Unmuting the suite: https://github.com/elastic/elasticsearch/pull/100840

I will close this issue once that is merged. If the same test fails again, at least we will have trace logging this time :D

astefan commented 10 months ago

There is a failure today, though I think (without being sure) that the test failed because the suite timed out: https://gradle-enterprise.elastic.co/s/f5csspdznxrxg

astefan commented 10 months ago

Another one: https://gradle-enterprise.elastic.co/s/piu4w3mj3pxtu

benwtrent commented 10 months ago

Both look like they could be suite timeouts, as they can be explained by an interrupt cancelling the internal waiting threads.

There are some weird things happening in the logs:

piu4w3mj3pxtu-console-log.txt.zip
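
To illustrate the interrupt explanation above, here is a minimal sketch of how interrupting a blocked thread ends its wait with an `InterruptedException` instead of a normal release. It assumes nothing about the actual test internals; `InterruptDemo`, the latch, and the outcome strings are hypothetical names used only for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

public class InterruptDemo {

    /** Starts a thread that waits on a latch nobody releases, then interrupts it. */
    public static String runAndInterrupt() throws InterruptedException {
        CountDownLatch neverReleased = new CountDownLatch(1);
        AtomicReference<String> outcome = new AtomicReference<>("still waiting");

        Thread waiter = new Thread(() -> {
            try {
                neverReleased.await(); // blocks until released or interrupted
                outcome.set("released normally");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                outcome.set("interrupted while waiting");
            }
        });

        waiter.start();
        waiter.interrupt(); // stands in for a test runner interrupting threads on timeout
        waiter.join();
        return outcome.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runAndInterrupt());
    }
}
```

A test thread woken this way can then fail on whatever assertion follows the wait, which would make a suite timeout look like an ordinary assertion failure in the report.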

original-brownbear commented 10 months ago

I think this will also be resolved by https://github.com/elastic/elasticsearch/pull/101777

masseyke commented 10 months ago

I think I just hit this in a PR build, and I had #101777 in my branch -- https://buildkite.com/elastic/elasticsearch-pull-request/builds/252#018baa9b-a574-41c9-8ff1-789354bda35f

williamrandolph commented 10 months ago

We're going to see this on the 8.11 branch until the fix is backported: https://gradle-enterprise.elastic.co/s/uxcfkx6bwcuxu

volodk85 commented 9 months ago

Reopening due to recent failure: https://gradle-enterprise.elastic.co/s/wjmkzdeokf6ra

thecoop commented 5 months ago

Another one on 8.12, so I am muting the test there: https://gradle-enterprise.elastic.co/s/rlmhdf3txdeu2

elasticsearchmachine commented 1 month ago

Pinging @elastic/es-search-foundations (Team:Search Foundations)