elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

[CI] FullClusterRestartIT testWatcherWithApiKey failing #84700

Open valeriy42 opened 2 years ago

valeriy42 commented 2 years ago

Seems to be failing regularly with the same error recently.

Build scan: https://gradle-enterprise.elastic.co/s/b3aibsuqwx4ww/tests/:x-pack:qa:full-cluster-restart:v7.2.0%23oldClusterTest/org.elasticsearch.xpack.restart.FullClusterRestartIT/testWatcherWithApiKey

Reproduction line: ./gradlew ':x-pack:qa:full-cluster-restart:v7.2.0#oldClusterTest' -Dtests.class="org.elasticsearch.xpack.restart.FullClusterRestartIT" -Dtests.method="testWatcherWithApiKey" -Dtests.seed=89F33062A7870C11 -Dtests.bwc=true -Dtests.locale=ar-KW -Dtests.timezone=Europe/Brussels -Druntime.java=17

Applicable branches: 8.0

Reproduces locally?: Didn't try

Failure history: https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.xpack.restart.FullClusterRestartIT&tests.test=testWatcherWithApiKey

Failure excerpt:

java.lang.AssertionError: expected:<executed> but was:<null>

  at __randomizedtesting.SeedInfo.seed([89F33062A7870C11:60A8FA1458B41D38]:0)
  at org.junit.Assert.fail(Assert.java:88)
  at org.junit.Assert.failNotEquals(Assert.java:834)
  at org.junit.Assert.assertEquals(Assert.java:118)
  at org.junit.Assert.assertEquals(Assert.java:144)
  at org.elasticsearch.xpack.restart.FullClusterRestartIT.lambda$testWatcherWithApiKey$0(FullClusterRestartIT.java:248)
  at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:1069)
  at org.elasticsearch.xpack.restart.FullClusterRestartIT.testWatcherWithApiKey(FullClusterRestartIT.java:245)
  at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-2)
  at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
  at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:568)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
  at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:375)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:824)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:475)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:375)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:831)
  at java.lang.Thread.run(Thread.java:833)

elasticmachine commented 2 years ago

Pinging @elastic/es-data-management (Team:Data Management)

masseyke commented 2 years ago

Judging by the stack trace above (the assertion fails inside assertBusy) and by these log lines, this looks like a timeout:

  1> [2022-03-07T10:22:07,024][INFO ][o.e.x.r.FullClusterRestartIT] [testWatcherWithApiKey] before test
  1> [2022-03-07T10:22:07,027][INFO ][o.e.x.r.FullClusterRestartIT] [testWatcherWithApiKey] initializing REST clients against [http://[::1]:33333, http://127.0.0.1:45153, http://[::1]:45317, http://127.0.0.1:34561]
  1> [2022-03-07T10:22:41,230][INFO ][o.e.x.r.FullClusterRestartIT] [testWatcherWithApiKey] after test

Locally the test takes only 4-5 seconds, so I'm not sure why it is sometimes so much slower on the CI server (about 34 seconds between the "before test" and "after test" lines above).
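
For readers unfamiliar with why an assertion failure inside assertBusy reads as a timeout, here is a minimal sketch of the retry-until-deadline pattern. It is a simplified illustration and not the actual ESTestCase code: the class name AssertBusySketch, the backoff values, and the 200 ms timeout in main are all assumptions made for the example. The point is that the check is retried until the deadline, and the AssertionError from the last attempt is what surfaces in the stack trace.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified stand-in for a retrying assertion helper.
// Not the real ESTestCase.assertBusy implementation.
public class AssertBusySketch {

    // Retries the check with a growing sleep until it passes or maxWait elapses.
    // Once the deadline has passed, the check runs one last time and any
    // AssertionError from that final attempt propagates to the caller.
    static void assertBusy(Runnable check, long maxWait, TimeUnit unit) {
        long deadline = System.nanoTime() + unit.toNanos(maxWait);
        long sleepMillis = 1; // assumed starting backoff
        while (deadline - System.nanoTime() > 0) {
            try {
                check.run();
                return; // condition met, stop retrying
            } catch (AssertionError e) {
                // condition not met yet, fall through and retry
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new AssertionError("interrupted while waiting", ie);
            }
            sleepMillis = Math.min(sleepMillis * 2, 500); // assumed backoff cap
        }
        check.run(); // final attempt: its failure is the one that gets reported
    }

    public static void main(String[] args) {
        // Simulates a watch that never reaches the expected state.
        String[] state = { null };
        try {
            assertBusy(() -> {
                if (!"executed".equals(state[0])) {
                    throw new AssertionError("expected:<executed> but was:<" + state[0] + ">");
                }
            }, 200, TimeUnit.MILLISECONDS);
        } catch (AssertionError e) {
            System.out.println("timed out: " + e.getMessage());
        }
    }
}
```

Under this pattern a plain assertEquals failure like the one in the report means the condition never became true within the wait window, which fits a slow CI run as well as a watch that never executes at all.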

pugnascotia commented 2 years ago

Another failure: https://gradle-enterprise.elastic.co/s/fa5zfc5on4v5w

n1v0lg commented 2 years ago

Another one https://gradle-enterprise.elastic.co/s/u7flo66by74b2

luigidellaquila commented 2 years ago

Still failing: https://gradle-enterprise.elastic.co/s/mfofju4qvgkqg

idegtiarenko commented 2 years ago

One more failure: https://gradle-enterprise.elastic.co/s/zj3onfpgzyjyi

idegtiarenko commented 2 years ago

And one more: https://gradle-enterprise.elastic.co/s/t4f5rxsye47qy

pxsalehi commented 1 year ago

A similar failure, but against v7.3.2:

./gradlew ':x-pack:qa:full-cluster-restart:v7.3.2#oldClusterTest' -Dtests.class="org.elasticsearch.xpack.restart.FullClusterRestartIT" -Dtests.method="testWatcherWithApiKey" -Dtests.seed=D916BB57EB898DDF -Dtests.bwc=true -Dtests.locale=es-PE -Dtests.timezone=Asia/Vientiane -Druntime.java=17

https://gradle-enterprise.elastic.co/s/prx5cfgu3pv2e/tests/:x-pack:qa:full-cluster-restart:v7.3.2%23oldClusterTest/org.elasticsearch.xpack.restart.FullClusterRestartIT/testWatcherWithApiKey

valeriy42 commented 1 year ago

The test is still failing:

kingherc commented 1 year ago

Still failing. https://gradle-enterprise.elastic.co/s/lfiq7c4dxtzkw/tests/:x-pack:qa:full-cluster-restart:v6.8.9%23bwcTest/org.elasticsearch.xpack.restart.FullClusterRestartIT/testWatcherWithApiKey%20%7Bcluster=OLD%7D?top-execution=1

Will mute.

kingherc commented 1 year ago

Another one at https://gradle-enterprise.elastic.co/s/xqrve2db4nxiq/console-log?task=:x-pack:qa:full-cluster-restart:v7.17.10%23bwcTest :

:x-pack:qa:full-cluster-restart:v7.17.10#bwcTest FAILED 
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8 -Dsun.jnu.encoding=UTF8   
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8 -Dsun.jnu.encoding=UTF8   
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8 -Dsun.jnu.encoding=UTF8   
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8 -Dsun.jnu.encoding=UTF8   
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8 -Dsun.jnu.encoding=UTF8   
org.elasticsearch.xpack.restart.FullClusterRestartIT > testWatcherWithApiKey {cluster=UPGRADED} FAILED  
    java.lang.AssertionError: version increased: [true], executed: [false]  
    Expected: is <true> 
         but: was <false>   
        at __randomizedtesting.SeedInfo.seed([D08E51FCB5D10E4C:39D59B8A4AE21F65]:0) 
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18) 
        at org.junit.Assert.assertThat(Assert.java:956) 
        at org.elasticsearch.xpack.restart.FullClusterRestartIT.lambda$testWatcherWithApiKey$1(FullClusterRestartIT.java:279)   
        at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:1112)   
        at org.elasticsearch.xpack.restart.FullClusterRestartIT.testWatcherWithApiKey(FullClusterRestartIT.java:270)    
Tests with failures:    
 - org.elasticsearch.xpack.restart.FullClusterRestartIT.testWatcherWithApiKey {cluster=UPGRADED}    
78 tests completed, 1 failed, 14 skipped

davidkyle commented 1 year ago

A slightly different error from the one in the issue description:

java.lang.AssertionError: version increased: [true], executed: [false] 
Expected: is <true> 
but: was <false>

at org.elasticsearch.xpack.restart.FullClusterRestartIT.lambda$testWatcherWithApiKey$1(FullClusterRestartIT.java:279)
at org.elasticsearch.xpack.restart.FullClusterRestartIT.testWatcherWithApiKey(FullClusterRestartIT.java:270)

https://gradle-enterprise.elastic.co/s/efiml4ickhj3c/tests/:x-pack:qa:full-cluster-restart:v7.17.10%23bwcTest/org.elasticsearch.xpack.restart.FullClusterRestartIT/testWatcherWithApiKey%20%7Bcluster=UPGRADED%7D?top-execution=1

masseyke commented 7 months ago

Maybe we're running into https://github.com/elastic/elasticsearch/issues/69842? I can't find any evidence of it, but it does seem that watcher is just not running for some reason.

elasticsearchmachine commented 1 week ago

This issue has been closed because it has been open for too long with no activity.

Any muted tests that were associated with this issue have been unmuted.

If the tests begin failing again, a new issue will be opened, and they may be muted again.

elasticsearchmachine commented 1 week ago

This issue is getting re-opened because there are still AwaitsFix mutes for the given test. It will likely be closed again in the future.