Open bpintea opened 1 year ago
Pinging @elastic/es-core-infra (Team:Core/Infra)
Another case of this: https://gradle-enterprise.elastic.co/s/nihgztn7egimw
Pinging @elastic/es-search (Team:Search)
I believe search owns field data, so switching the team label.
A very similar failure in the same yaml file with the same error: https://gradle-enterprise.elastic.co/s/xfki76zmk4srm/tests/:qa:mixed-cluster:v7.17.11%23mixedClusterTest/org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT/test%20%7Bp0=indices.stats%2F13_fields%2FFields%20-%20multi%20metric%7D?page=eyJvdXRwdXQiOnsiMCI6M319&top-execution=1
java.lang.AssertionError: Failure at [indices.stats/13_fields:179]: field [_all.total.fielddata.memory_size_in_bytes] is not greater than [0]
Expected: a value greater than <0>
but: <0> was equal to <0>
I think the error might be related to the addition of global ordinal info to stats (#94500). Since we're in a mixed cluster scenario here, there are nodes that don't implement those yet. Serialization in FieldDataStats then seems to drop the statistics, but the "memorySize" stats that were used before this change will be 0. Just a theory atm, I need to dig a bit deeper on this. Maybe @martijnvg, who authored that change, has an idea too, but I will try to understand this better.
> Serialization in FieldDataStats then seems to drop the statistics, but the "memorySize" stats that were used before this change will be 0
Maybe a red herring. Both stats should be collected equally; the ramUsage should still be there with the change mentioned above.
@cbuescher It is true that old nodes (or older clusters in CCS) will not support global_ordinals.* stats, but I think the memory_size stat should always be computed and serialised.
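To make the point above concrete, here is a minimal sketch of version-gated wire serialization. It is not the real FieldDataStats code: the class name, the two fields, and the `ORDINALS_ADDED_IN` version id are all hypothetical, and plain `java.io` streams stand in for the Elasticsearch transport streams. It only illustrates the expectation that the older stat is written unconditionally while the newer stat is skipped for older peers.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a stats object; NOT the real FieldDataStats class.
class StatsWireSketch {
    // Hypothetical version id at which the newer stat was introduced.
    static final int ORDINALS_ADDED_IN = 2;

    final long memorySize;          // stat that existed before the change
    final long ordinalsBuildMillis; // hypothetical newer stat

    StatsWireSketch(long memorySize, long ordinalsBuildMillis) {
        this.memorySize = memorySize;
        this.ordinalsBuildMillis = ordinalsBuildMillis;
    }

    // Serializes for a peer speaking the given (hypothetical) wire version.
    byte[] writeTo(int peerVersion) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(memorySize); // always written, regardless of version
            if (peerVersion >= ORDINALS_ADDED_IN) {
                out.writeLong(ordinalsBuildMillis); // only for peers that know it
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Deserializes bytes produced for the same wire version.
    static StatsWireSketch readFrom(byte[] data, int peerVersion) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            long memorySize = in.readLong();
            long ordinals = peerVersion >= ORDINALS_ADDED_IN ? in.readLong() : 0L;
            return new StatsWireSketch(memorySize, ordinals);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StatsWireSketch sent = new StatsWireSketch(1024L, 37L);
        StatsWireSketch viaOld = readFrom(sent.writeTo(1), 1);
        StatsWireSketch viaNew = readFrom(sent.writeTo(2), 2);
        // The older stat survives the round trip in both cases.
        System.out.println(viaOld.memorySize + " " + viaOld.ordinalsBuildMillis);
        System.out.println(viaNew.memorySize + " " + viaNew.ordinalsBuildMillis);
    }
}
```

Under this pattern, an old peer loses only the newer stat, so a `memory_size_in_bytes` of 0 would have to come from somewhere other than the version gate itself.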
Yes, there's something else going on here, still need to do more digging. fwiw it would be great to have better reproducibility around which node gets updated to a new version in a mixed version cluster and which node a rest test request hits. Currently I think those things are not reproducible with the random seed.
> fwiw it would be great to have better reproducibility around which node gets updated to a new version in a mixed version cluster and which node a rest test request hits. Currently I think those things are not reproducible with the random seed.
I agree this is terrible to debug now.
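For illustration only, the kind of reproducibility asked for above could look roughly like the sketch below, where both the upgrade order and the node a request hits are derived from the test seed. This is not how the Elasticsearch test framework currently behaves; the class, method names, and node names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch; none of these names exist in the Elasticsearch test framework.
class SeededNodeChoiceSketch {
    // Parses a hex seed such as the value passed via -Dtests.seed.
    static long parseSeed(String hex) {
        return Long.parseUnsignedLong(hex, 16);
    }

    // Returns the order in which nodes would be upgraded, fixed by the seed.
    static List<String> upgradeOrder(List<String> nodes, long seed) {
        List<String> order = new ArrayList<>(nodes);
        Collections.shuffle(order, new Random(seed));
        return order;
    }

    // Picks the node that the n-th REST request would hit, fixed by the seed.
    static String pickTarget(List<String> nodes, long seed, int requestIndex) {
        Random random = new Random(seed ^ (31L * (requestIndex + 1)));
        return nodes.get(random.nextInt(nodes.size()));
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("node-0", "node-1", "node-2", "node-3");
        long seed = parseSeed("697B0AF0381F3A53");
        System.out.println("upgrade order: " + upgradeOrder(nodes, seed));
        System.out.println("request 0 hits: " + pickTarget(nodes, seed, 0));
    }
}
```

With both choices tied to the seed, rerunning a reproduction line would put the same nodes on the new version and send each request to the same node.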
Labeling as low-risk because it might be a test setup problem and only seems to affect stats atm.
Another failure today, but on "13_fields/Fields - multi". Looks very similar though:
Reproduction line:
./gradlew ':qa:mixed-cluster:v7.17.16#mixedClusterTest' -Dtests.class="org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT" -Dtests.method="test {p0=indices.stats/13_fields/Fields - multi}" -Dtests.seed=697B0AF0381F3A53 -Dtests.bwc=true -Dtests.locale=sr-BA -Dtests.timezone=America/Argentina/Jujuy -Druntime.java=17 -Dtests.fips.enabled=true
Failure excerpt:
java.lang.AssertionError: Failure at [indices.stats/13_fields:105]: field [_all.total.fielddata.memory_size_in_bytes] is not greater than [0]
Expected: a value greater than <0>
but: <0> was equal to <0>
Another failure today
./gradlew ':qa:mixed-cluster:v8.1.3#mixedClusterTest' -Dtests.class="org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT" -Dtests.method="test {p0=indices.stats/13_fields/Fields - blank}" -Dtests.seed=E03245515B47B95 -Dtests.bwc=true -Dtests.locale=zh-TW -Dtests.timezone=US/Mountain -Druntime.java=21
Pinging @elastic/es-search-foundations (Team:Search Foundations)
Build scan: https://gradle-enterprise.elastic.co/s/llukhgof4sicw/tests/:qa:mixed-cluster:v7.17.11%23mixedClusterTest/org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT/test%20%7Bp0=indices.stats%2F13_fields%2FFields%20-%20blank%7D
Reproduction line:
Applicable branches: main
Reproduces locally?: Didn't try
Failure history: https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT&tests.test=test%20%7Bp0%3Dindices.stats/13_fields/Fields%20-%20blank%7D
Failure excerpt: