elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
1.08k stars 24.84k forks source link

[CI] BootStrapTests testTriggeredWatchLoading failing #85912

Closed jtibshirani closed 4 days ago

jtibshirani commented 2 years ago

Build scan: https://gradle-enterprise.elastic.co/s/3t354zq4wraic/tests/:x-pack:plugin:watcher:internalClusterTest/org.elasticsearch.xpack.watcher.test.integration.BootStrapTests/testTriggeredWatchLoading

Reproduction line: ./gradlew ':x-pack:plugin:watcher:internalClusterTest' --tests "org.elasticsearch.xpack.watcher.test.integration.BootStrapTests.testTriggeredWatchLoading" -Dtests.seed=CCA17557F9EF4FAF -Dtests.locale=he-IL -Dtests.timezone=Canada/Central -Druntime.java=17

Applicable branches: master

Reproduces locally?: No

Failure history: https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.xpack.watcher.test.integration.BootStrapTests&tests.test=testTriggeredWatchLoading

Failure excerpt:

java.lang.AssertionError: Count is 5 hits but 7 was expected.  Total shards: 1 Successful shards: 1 & 0 shard failures:

  at __randomizedtesting.SeedInfo.seed([CCA17557F9EF4FAF:F54F15EDC6057750]:0)
  at org.junit.Assert.fail(Assert.java:88)
  at org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertHitCount(ElasticsearchAssertions.java:281)
  at org.elasticsearch.xpack.watcher.test.integration.BootStrapTests.lambda$assertSingleExecutionAndCompleteWatchHistory$3(BootStrapTests.java:328)
  at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:1061)
  at org.elasticsearch.xpack.watcher.test.integration.BootStrapTests.assertSingleExecutionAndCompleteWatchHistory(BootStrapTests.java:314)
  at org.elasticsearch.xpack.watcher.test.integration.BootStrapTests.testTriggeredWatchLoading(BootStrapTests.java:309)
  at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-2)
  at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
  at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:568)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:375)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:824)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:475)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:375)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:831)
  at java.lang.Thread.run(Thread.java:833)
elasticmachine commented 2 years ago

Pinging @elastic/es-data-management (Team:Data Management)

masseyke commented 2 years ago

I think this one is probably beyond me, but putting this here before the history is gone. These lines in the log look suspicious:

[2022-04-14T12:53:42,472][INFO ][o.e.x.w.WatcherService   ] [node_s3] reloading watcher, reason [new local watcher shard allocation ids], cancelled [0] queued tasks
[2022-04-14T12:53:42,485][INFO ][o.e.x.w.WatcherService   ] [node_s2] paused watch execution, reason [no local watcher shards found], cancelled [2] queued tasks

When I run it locally that doesn't happen. If I hook in the debugger and slow things way down I can get it to log that first line (new local watcher shard allocation ids). But I haven't been able to cause any tasks to be canceled (2 in that 2nd log line). So I'm wondering if this is a timing bug? The build server happened to be running really slowly and 2 of the 7 watcher history records were dropped?

pgomulka commented 2 years ago

another one https://gradle-enterprise.elastic.co/s/omji2hdwlwn6e

andreidan commented 2 years ago

another one https://gradle-enterprise.elastic.co/s/asst6pgyhw5ke

masseyke commented 1 year ago

Those last two might be different -- rather than tasks being cancelled, they have exactly twice as many watcher history records as we'd expect:

    java.lang.AssertionError: Count is 10 hits but 5 was expected.  Total shards: 1 Successful shards: 1 & 0 shard failures:

I don't see any clues in the logs about why, unfortunately.

elasticsearchmachine commented 4 days ago

This issue has been closed because it has been open for too long with no activity.

Any muted tests that were associated with this issue have been unmuted.

If the tests begin failing again, a new issue will be opened, and they may be muted again.