elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
69.71k stars 24.67k forks source link

[CI] MixedClusterClientYamlTestSuiteIT test {p0=indices.stats/13_fields/Fields - blank} failing #95944

Open bpintea opened 1 year ago

bpintea commented 1 year ago

Build scan: https://gradle-enterprise.elastic.co/s/llukhgof4sicw/tests/:qa:mixed-cluster:v7.17.11%23mixedClusterTest/org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT/test%20%7Bp0=indices.stats%2F13_fields%2FFields%20-%20blank%7D

Reproduction line:

./gradlew ':qa:mixed-cluster:v7.17.11#mixedClusterTest' -Dtests.class="org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT" -Dtests.method="test {p0=indices.stats/13_fields/Fields - blank}" -Dtests.seed=3E8171C1F1B2915C -Dtests.bwc=true -Dtests.locale=ro -Dtests.timezone=Asia/Ujung_Pandang -Druntime.java=17

Applicable branches: main

Reproduces locally?: Didn't try

Failure history: https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT&tests.test=test%20%7Bp0%3Dindices.stats/13_fields/Fields%20-%20blank%7D

Failure excerpt:

java.lang.AssertionError: Failure at [indices.stats/13_fields:82]: field [_all.total.fielddata.memory_size_in_bytes] is not greater than [0]
Expected: a value greater than <0>
     but: <0> was equal to <0>

  at __randomizedtesting.SeedInfo.seed([3E8171C1F1B2915C:B6D54E1B5F4EFCA4]:0)
  at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.executeSection(ESClientYamlSuiteTestCase.java:572)
  at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.test(ESClientYamlSuiteTestCase.java:524)
  at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-2)
  at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
  at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:568)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.lang.Thread.run(Thread.java:833)

  Caused by: java.lang.AssertionError: field [_all.total.fielddata.memory_size_in_bytes] is not greater than [0]
  Expected: a value greater than <0>
       but: <0> was equal to <0>

    at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
    at org.junit.Assert.assertThat(Assert.java:956)
    at org.elasticsearch.test.rest.yaml.section.GreaterThanAssertion.doAssert(GreaterThanAssertion.java:64)
    at org.elasticsearch.test.rest.yaml.section.Assertion.execute(Assertion.java:65)
    at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.executeSection(ESClientYamlSuiteTestCase.java:552)
    at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.test(ESClientYamlSuiteTestCase.java:524)
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-2)
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:568)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
    at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
    at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
    at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
    at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
    at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
    at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
    at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
    at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
    at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
    at java.lang.Thread.run(Thread.java:833)
elasticsearchmachine commented 1 year ago

Pinging @elastic/es-core-infra (Team:Core/Infra)

albertzaharovits commented 1 year ago

Another case of this https://gradle-enterprise.elastic.co/s/nihgztn7egimw .

elasticsearchmachine commented 1 year ago

Pinging @elastic/es-search (Team:Search)

stu-elastic commented 1 year ago

I believe search owns field data, so switching the team label.

craigtaverner commented 1 year ago

A very similar failure in the same yaml file with the same error: https://gradle-enterprise.elastic.co/s/xfki76zmk4srm/tests/:qa:mixed-cluster:v7.17.11%23mixedClusterTest/org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT/test%20%7Bp0=indices.stats%2F13_fields%2FFields%20-%20multi%20metric%7D?page=eyJvdXRwdXQiOnsiMCI6M319&top-execution=1

java.lang.AssertionError: Failure at [indices.stats/13_fields:179]: field [_all.total.fielddata.memory_size_in_bytes] is not greater than [0] |  
-- | --
  | Expected: a value greater than <0> |  
  | but: <0> was equal to <0>
cbuescher commented 1 year ago

I think the error might be related to the addition of global ordinal info to stats (#94500). Since we're in a mixed cluster scenario here there are nodes that don't implement those yet. Serialization in FieldDataStats then seem to drop the statistics, but the "memorySize" stats that were used before this change will be 0. Just a theory atm, need to dig a bit deeper on this. Maybe @martijnvg who authored that change has an idea also, but I will try to understand this better.

cbuescher commented 1 year ago

Serialization in FieldDataStats then seem to drop the statistics, but the "memorySize" stats that were used before this change will be 0

Maybe a red herring. Both stats should be equally collected, the ramUsage should still be there with the change mentioned above.

martijnvg commented 1 year ago

@cbuescher It is true that old nodes (or older clusters in ccs) will not support global_ordinals.* stats, but I think the memory_size stat should always be computed and serialised.

cbuescher commented 1 year ago

Yes, there's something else going on here, still need to do more digging. fwiw it would be great to have better reproducibility around which node gets updated to a new version in a mixed version cluster and which node a rest test request hits. Currently I think those things are not reproducible with the random seed.

martijnvg commented 1 year ago

fwiw it would be great to have better reproducibility around which node gets updated to a new version in a mixed version cluster and which node a rest test request hits. Currently I think those things are not reproducible with the random seed.

I agree this is terrible to debug now.

cbuescher commented 11 months ago

Labeling as low-risk because it might be a test setup problem and only seems to affect stats atm.

cbuescher commented 10 months ago

Another failure today, but on "13_fields/Fields - multi". Looks very similar though:

Build scan: gradle-enterprise.elastic.co/s/ay37m7apww3lg/tests/:qa:mixed-cluster:v7.17.16%23mixedClusterTest/org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT/test%20%7Bp0=indices.stats%2F13_fields%2FFields%20-%20multi%7D

Reproduction line:

./gradlew ':qa:mixed-cluster:v7.17.16#mixedClusterTest' -Dtests.class="org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT" -Dtests.method="test {p0=indices.stats/13_fields/Fields - multi}" -Dtests.seed=697B0AF0381F3A53 -Dtests.bwc=true -Dtests.locale=sr-BA -Dtests.timezone=America/Argentina/Jujuy -Druntime.java=17 -Dtests.fips.enabled=true

Failure excerpt:

java.lang.AssertionError: Failure at [indices.stats/13_fields:105]: field [_all.total.fielddata.memory_size_in_bytes] is not greater than [0]
Expected: a value greater than <0>
     but: <0> was equal to <0>
mosche commented 7 months ago

Another failure today

https://gradle-enterprise.elastic.co/s/pntjkfoje2mxi/tests/:qa:mixed-cluster:v8.1.3%23mixedClusterTest/org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT/test%20%7Bp0=indices.stats%2F13_fields%2FFields%20-%20blank%7D

./gradlew ':qa:mixed-cluster:v8.1.3#mixedClusterTest' -Dtests.class="org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT" -Dtests.method="test {p0=indices.stats/13_fields/Fields - blank}" -Dtests.seed=E03245515B47B95 -Dtests.bwc=true -Dtests.locale=zh-TW -Dtests.timezone=US/Mountain -Druntime.java=21
elasticsearchmachine commented 2 months ago

Pinging @elastic/es-search-foundations (Team:Search Foundations)