elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

[CI] FollowerFailOverIT testFailOverOnFollower failing #88442

Closed matschaffer closed 1 year ago

matschaffer commented 2 years ago

Build scan: https://gradle-enterprise.elastic.co/s/vgwcstaplwiom/tests/:x-pack:plugin:ccr:internalClusterTest/org.elasticsearch.xpack.ccr.FollowerFailOverIT/testFailOverOnFollower

Reproduction line: ./gradlew ':x-pack:plugin:ccr:internalClusterTest' --tests "org.elasticsearch.xpack.ccr.FollowerFailOverIT.testFailOverOnFollower" -Dtests.seed=30D274C2E539EEB2 -Dtests.locale=sr-RS -Dtests.timezone=Atlantic/Azores -Druntime.java=17

Applicable branches: master

Reproduces locally?: No

Failure history: https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.xpack.ccr.FollowerFailOverIT&tests.test=testFailOverOnFollower

Failure excerpt:

java.lang.AssertionError: All incoming requests on node [follower3] should have finished. Expected 0 but got 67; pending tasks [[]]

  at org.elasticsearch.test.InternalTestCluster.lambda$assertRequestsFinished$42(InternalTestCluster.java:2518)
  at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:1098)
  at org.elasticsearch.test.InternalTestCluster.assertRequestsFinished(InternalTestCluster.java:2509)
  at org.elasticsearch.test.InternalTestCluster.assertAfterTest(InternalTestCluster.java:2483)
  at org.elasticsearch.xpack.CcrIntegTestCase.afterTest(CcrIntegTestCase.java:291)
  at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-2)
  at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
  at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:568)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:1004)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:375)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:824)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:475)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:375)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:831)
  at java.lang.Thread.run(Thread.java:833)
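
The trace shows the failure came from a post-test check (`assertRequestsFinished`) that runs under `ESTestCase.assertBusy`, which retries an assertion until it passes or time runs out. As a rough illustration of that pattern only, and not the actual Elasticsearch implementation, here is a minimal self-contained sketch. The class name `BusyAssertSketch`, the `inFlight` counter, and the timeout values are all hypothetical.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a "busy assertion": retry a check with backoff
// until it stops throwing AssertionError or the time budget is used up.
class BusyAssertSketch {

    static void assertBusy(Runnable check, long maxWait, TimeUnit unit) {
        long deadline = System.nanoTime() + unit.toNanos(maxWait);
        long sleepMillis = 1;
        while (true) {
            try {
                check.run();
                return; // the check passed
            } catch (AssertionError e) {
                long remainingMillis = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMillis <= 0) {
                    throw e; // out of time: surface the last failure
                }
                try {
                    Thread.sleep(Math.min(sleepMillis, remainingMillis));
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
                sleepMillis *= 2; // exponential backoff between attempts
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for a node's count of unfinished incoming requests.
        AtomicInteger inFlight = new AtomicInteger(67);

        // Background thread that drains the counter, simulating requests completing.
        Thread drainer = new Thread(() -> {
            while (inFlight.get() > 0) {
                inFlight.decrementAndGet();
            }
        });
        drainer.start();

        assertBusy(() -> {
            int n = inFlight.get();
            if (n != 0) {
                throw new AssertionError("Expected 0 but got " + n);
            }
        }, 10, TimeUnit.SECONDS);
        System.out.println("all requests finished");
    }
}
```

In a pattern like this, a failure such as the one above would mean the counter never reached zero within the wait budget, which fits either requests that really were still running or a counter that was not decremented.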

matschaffer commented 2 years ago

This test passed once for me

> Task :x-pack:plugin:ccr:internalClusterTest
WARNING: A terminally deprecated method in java.lang.System has been called
WARNING: System::setSecurityManager has been called by org.gradle.api.internal.tasks.testing.worker.TestWorker (file:/Users/matschaffer/.gradle/wrapper/dists/gradle-7.4.2-all/9uukhhbclvbegdvsww0j0cr3p/gradle-7.4.2/lib/plugins/gradle-testing-base-7.4.2.jar)
WARNING: Please consider reporting this to the maintainers of org.gradle.api.internal.tasks.testing.worker.TestWorker
WARNING: System::setSecurityManager will be removed in a future release

BUILD SUCCESSFUL in 5m 18s
111 actionable tasks: 88 executed, 23 up-to-date

but after the first run Gradle seems to return a cached result (every task is reported as up-to-date):

❯ ./gradlew ':x-pack:plugin:ccr:internalClusterTest' --tests "org.elasticsearch.xpack.ccr.FollowerFailOverIT.testFailOverOnFollower" -Dtests.seed=30D274C2E539EEB2 -Dtests.locale=sr-RS -Dtests.timezone=Atlantic/Azores -Druntime.java=17
=======================================
Elasticsearch Build Hamster says Hello!
  Gradle Version        : 7.4.2
  OS Info               : Mac OS X 12.4 (x86_64)
  JDK Version           : 17 (Oracle)
  JAVA_HOME             : /Users/matschaffer/.asdf/installs/java/oracle-17
  Random Testing Seed   : 30D274C2E539EEB2
  In FIPS 140 mode      : false
=======================================

BUILD SUCCESSFUL in 2s
111 actionable tasks: 111 up-to-date

elasticmachine commented 2 years ago

Pinging @elastic/es-distributed (Team:Distributed)

matschaffer commented 2 years ago

Thanks @DaveCTurner! I'll remove my assignment since this appears to be a flaky test I just happened to bump into.

@elastic/es-distributed if there's any context I can provide just let me know.

Tim-Brooks commented 1 year ago

We have fixed a number of finicky issues with this test case over the last few months. This particular failure has not reproduced or occurred for a while. I opened https://github.com/elastic/elasticsearch/pull/93400 to track a misleading log associated with this test, and am now closing this issue since we have resolved a number of issues with this test case.