elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

[CI] S3BlobStoreRepositoryTests testMetrics failing #101608

Open cbuescher opened 1 year ago

cbuescher commented 1 year ago

Build scan: https://gradle-enterprise.elastic.co/s/cmzydsjar4s3c/tests/:modules:repository-s3:internalClusterTest/org.elasticsearch.repositories.s3.S3BlobStoreRepositoryTests/testMetrics

Reproduction line:

./gradlew ':modules:repository-s3:internalClusterTest' --tests "org.elasticsearch.repositories.s3.S3BlobStoreRepositoryTests.testMetrics" -Dtests.seed=56B8E75E80922D01 -Dbuild.snapshot=false -Dtests.jvm.argline="-Dbuild.snapshot=false" -Dtests.locale=bg-BG -Dtests.timezone=America/Kentucky/Louisville -Druntime.java=21

Applicable branches: main

Reproduces locally?: No

Failure history: https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.repositories.s3.S3BlobStoreRepositoryTests&tests.test=testMetrics

Failure excerpt:

java.lang.AssertionError: 
Expected: <10L>
     but: was <9L>

  at __randomizedtesting.SeedInfo.seed([56B8E75E80922D01:A8AA35FC9C985AB2]:0)
  at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
  at org.junit.Assert.assertThat(Assert.java:956)
  at org.junit.Assert.assertThat(Assert.java:923)
  at org.elasticsearch.repositories.s3.S3BlobStoreRepositoryTests.lambda$testMetrics$3(S3BlobStoreRepositoryTests.java:285)
  at java.util.ArrayList.forEach(ArrayList.java:1596)
  at org.elasticsearch.repositories.s3.S3BlobStoreRepositoryTests.testMetrics(S3BlobStoreRepositoryTests.java:278)
  at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
  at java.lang.reflect.Method.invoke(Method.java:580)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.lang.Thread.run(Thread.java:1583)

elasticsearchmachine commented 1 year ago

Pinging @elastic/es-distributed (Team:Distributed)

cbuescher commented 1 year ago

And another one from today

ywangd commented 1 year ago

Relabeled this to low-risk since it is an off-by-one error in a metric count comparison, which is not on a critical path.

fcofdez commented 1 year ago

I think this is likely a duplicate of https://github.com/elastic/elasticsearch/issues/88841

DaveCTurner commented 1 year ago

Improved the test output on failure in #102386 and #102387, now waiting on another failure to confirm.

DaveCTurner commented 6 months ago

https://gradle-enterprise.elastic.co/s/cxskzugh74ktq/tests/task/:modules:repository-s3:internalClusterTest/details/org.elasticsearch.repositories.s3.S3BlobStoreRepositoryTests/testMetrics looks like another failure?

ywangd commented 6 months ago

Unfortunately the AWS debug logging was disabled due to #105020. I raised #109068 to reenable it. I'll ask core-infra whether it is possible to skip logger checking for tests.

ywangd commented 3 months ago

It still has not failed yet since May 28.

DaveCTurner commented 1 month ago

> It still has not failed yet since May 28.

Perhaps because it has been muted since then, see 520a1599a65301c0cac44afe1ea306d3f718416f 🤦 I opened https://github.com/elastic/elasticsearch/pull/114129 to start running the test again.

pxsalehi commented 1 month ago

I've been running this test over the past couple of days with stress-ng on and off randomly: 20k+ runs and no failures. IMO, we can close it since it doesn't reproduce.
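
For readers who want to try a similar repeated-run experiment, here is a minimal sketch of a loop that reruns a command until it fails. The thread does not say how the 20k+ runs were actually driven, so the helper name `repeat_until_failure` and its output messages are hypothetical and not part of the Elasticsearch tooling.

```shell
# Hypothetical helper (not part of the Elasticsearch build):
# run a command up to N times and stop at the first failing run.
# Usage: repeat_until_failure <max_runs> <command> [args...]
repeat_until_failure() {
  _ruf_max="$1"
  shift
  _ruf_i=1
  while [ "$_ruf_i" -le "$_ruf_max" ]; do
    # "$@" is the command under test, with its arguments preserved as given
    if ! "$@"; then
      echo "failed on run $_ruf_i"
      return 1
    fi
    _ruf_i=$((_ruf_i + 1))
  done
  echo "no failure in $_ruf_max runs"
  return 0
}
```

It could be wrapped around the reproduction line from the issue description, for example `repeat_until_failure 100 ./gradlew ':modules:repository-s3:internalClusterTest' --tests "org.elasticsearch.repositories.s3.S3BlobStoreRepositoryTests.testMetrics"`. The seed argument is left out of that example on the assumption that each run should then pick a fresh random seed.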

DaveCTurner commented 1 month ago

I also couldn't reproduce it on repeated runs, but it was failing very rarely in CI even before we muted it. I still think it's an issue though.

elasticsearchmachine commented 3 weeks ago

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)