elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

[CI] MixedClusterClientYamlTestSuiteIT test {p0=search/140_pre_filter_search_shards/pre_filter_shard_size with shards that have no hit} failing #92058

Open iverase opened 1 year ago

iverase commented 1 year ago

I only ran it once and it did not reproduce.

Build scan: https://gradle-enterprise.elastic.co/s/xuwjw32yoth5u/tests/:qa:mixed-cluster:v8.6.0%23mixedClusterTest/org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT/test%20%7Bp0=search%2F140_pre_filter_search_shards%2Fpre_filter_shard_size%20with%20shards%20that%20have%20no%20hit%7D

Reproduction line:

./gradlew ':qa:mixed-cluster:v8.6.0#mixedClusterTest' -Dtests.class="org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT" -Dtests.method="test {p0=search/140_pre_filter_search_shards/pre_filter_shard_size with shards that have no hit}" -Dtests.seed=8D1D01742D69A8CC -Dtests.bwc=true -Dtests.locale=en-SG -Dtests.timezone=SST -Druntime.java=17 -Dtests.fips.enabled=true

Applicable branches: main

Reproduces locally?: No

Failure history: https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT&tests.test=test%20%7Bp0%3Dsearch/140_pre_filter_search_shards/pre_filter_shard_size%20with%20shards%20that%20have%20no%20hit%7D

Failure excerpt:

java.lang.AssertionError: Failure at [search/140_pre_filter_search_shards:164]: 
Expected: <1>
     but: was <0>

  at __randomizedtesting.SeedInfo.seed([8D1D01742D69A8CC:5493EAE8395C534]:0)
  at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.executeSection(ESClientYamlSuiteTestCase.java:520)
  at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.test(ESClientYamlSuiteTestCase.java:480)
  at jdk.internal.reflect.GeneratedMethodAccessor18.invoke(null:-1)
  at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:568)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.lang.Thread.run(Thread.java:833)

  Caused by: java.lang.AssertionError: 
  Expected: <1>
       but: was <0>

    at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
    at org.junit.Assert.assertThat(Assert.java:956)
    at org.junit.Assert.assertThat(Assert.java:923)
    at org.elasticsearch.test.rest.yaml.section.MatchAssertion.doAssert(MatchAssertion.java:99)
    at org.elasticsearch.test.rest.yaml.section.Assertion.execute(Assertion.java:65)
    at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.executeSection(ESClientYamlSuiteTestCase.java:500)
    at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.test(ESClientYamlSuiteTestCase.java:480)
    at jdk.internal.reflect.GeneratedMethodAccessor18.invoke(null:-1)
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:568)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
    at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
    at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
    at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
    at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
    at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
    at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
    at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
    at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
    at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
    at java.lang.Thread.run(Thread.java:833)

elasticsearchmachine commented 1 year ago

Pinging @elastic/es-search (Team:Search)

droberts195 commented 1 year ago

Failed again in https://gradle-enterprise.elastic.co/s/bfoh4lwoag6tq

benwtrent commented 1 year ago

This failure is weird. It seems to happen only in the BWC tests against 8.6.0. The previous clause passes and uses min_doc_count: 0, so it makes sense that no shards are skipped there.

But then this failing clause runs, and min_doc_count isn't provided, so it should skip a shard.

I am digging to see whether anything changed with terms and BWC between main and 8.6. Something is indicating that shards cannot be skipped. Since this only happens intermittently, I suspect it depends on which node handles the request and which nodes are upgraded...
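To make the expectation above concrete, here is a toy model of the skipping rule as described in this thread: a shard is skipped only when the range query cannot match it and no aggregation needs to see every shard (a terms agg with min_doc_count: 0). This is a sketch of the reasoning, not Elasticsearch's actual can-match code. The `Shard` class, the function names, and the per-shard date bounds are all hypothetical, chosen so that only index_1 falls outside the queried range.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Shard:
    index: str
    min_created_at: str  # ISO-8601 dates compare correctly as plain strings
    max_created_at: str


def range_overlaps(shard: Shard, gte: str, lt: str) -> bool:
    """True if some value on the shard could fall inside [gte, lt)."""
    return shard.max_created_at >= gte and shard.min_created_at < lt


def can_skip(shard: Shard, gte: str, lt: str, terms_min_doc_count: Optional[int]) -> bool:
    """Toy rule: skip only if the range cannot match AND no aggregation
    must visit every shard (modelled here as terms with min_doc_count: 0)."""
    must_visit_all = terms_min_doc_count == 0
    return not must_visit_all and not range_overlaps(shard, gte, lt)


# Hypothetical date bounds per shard (assumption, not taken from the test file).
SHARDS = [
    Shard("index_1", "2016-01-01", "2016-01-01"),
    Shard("index_2", "2017-01-01", "2017-01-01"),
    Shard("index_3", "2018-01-01", "2018-01-01"),
]


def skipped(terms_min_doc_count: Optional[int]) -> int:
    """Count shards the toy rule would skip for the range used in the test."""
    return sum(
        can_skip(s, "2016-02-01", "2018-02-01", terms_min_doc_count) for s in SHARDS
    )


print(skipped(0))     # terms agg with min_doc_count: 0
print(skipped(None))  # min_doc_count not provided
```

Under these assumptions the model skips no shard when min_doc_count is 0 and skips one shard otherwise, which is what the failing assertion expects.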

benwtrent commented 1 year ago

The stashed response from the failed test indicates that min_doc_count is indeed the default value of 1, since we don't receive any buckets with a doc count of 0.

  "stash" : {
    "body" : {
      "took" : 16,
      "timed_out" : false,
      "_shards" : {
        "total" : 3,
        "successful" : 3,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : 2,
        "max_score" : null,
        "hits" : [ ]
      },
      "aggregations" : {
        "idx_terms" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : "index_2",
              "doc_count" : 1
            },
            {
              "key" : "index_3",
              "doc_count" : 1
            }
          ]
        }
      }
    }
  }
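The same check can be written down as a short script. This is only a sketch: the `response` dict is a trimmed copy of the stashed body above, and `summarize` is a hypothetical helper, not part of the YAML test framework.

```python
# Trimmed copy of the stashed response body shown above.
response = {
    "_shards": {"total": 3, "successful": 3, "skipped": 0, "failed": 0},
    "aggregations": {
        "idx_terms": {
            "buckets": [
                {"key": "index_2", "doc_count": 1},
                {"key": "index_3", "doc_count": 1},
            ]
        }
    },
}


def summarize(resp: dict) -> dict:
    """Pull out the fields that matter for this failure."""
    buckets = resp["aggregations"]["idx_terms"]["buckets"]
    return {
        "skipped": resp["_shards"]["skipped"],
        "bucket_keys": [b["key"] for b in buckets],
        # A bucket with doc_count 0 would only appear with min_doc_count: 0.
        "has_zero_count_bucket": any(b["doc_count"] == 0 for b in buckets),
    }


summary = summarize(response)
print(summary)
```

No zero-count bucket is present, which is consistent with min_doc_count being left at its default, yet no shard was skipped.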

pxsalehi commented 1 year ago

There have been more failures of this, e.g.: https://gradle-enterprise.elastic.co/s/imkrolbzltpfq/tests/:qa:mixed-cluster:v8.6.2%23mixedClusterTest/org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT/test%20%7Bp0=search%2F140_pre_filter_search_shards%2Fpre_filter_shard_size%20with%20shards%20that%20have%20no%20hit%7D?top-execution=1

benwtrent commented 1 year ago

OK, the cause is still evading me.

This is earlier in the same test and it passes.

  # this is a case where we can actually skip due to rewrite
  - do:
      search:
        rest_total_hits_as_int: true
        pre_filter_shard_size: 1
        body: { "size" : 0, "query" : { "range" : { "created_at" : { "gte" : "2016-02-01", "lt": "2018-02-01"} } } }

  - match: { _shards.total: 3 }
  - match: { _shards.successful: 3 }
  - match: { _shards.skipped : 1}
  - match: { _shards.failed: 0 }
  - match: { hits.total: 2 }

And here is the failing query:

  - do:
      search:
        rest_total_hits_as_int: true
        pre_filter_shard_size: 1
        body: { "size" : 0, "query" : { "range" : { "created_at" : { "gte" : "2016-02-01", "lt": "2018-02-01"}}}, "aggs" : { "idx_terms" :  { "terms" : { "field" : "_index" } } } }

  - match: { _shards.total: 3 }
  - match: { _shards.successful: 3 }
  - match: { _shards.skipped : 1 } # Actual response has `0` here.
  - match: { _shards.failed: 0 }
  - match: { hits.total: 2 }
  - length: { aggregations.idx_terms.buckets: 2 }

The response contains no buckets with a doc count of 0, so we know min_doc_count is 1.

So, it definitely has to do with the inclusion of the terms agg, or we are caching something that we shouldn't be.

Still investigating.
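As a quick sanity check that the two requests differ only by the aggregation, the bodies can be compared directly. This is a minimal sketch; the two JSON strings are copied from the YAML snippets above, and the variable names are illustrative.

```python
import json

# Request body of the earlier clause that passes.
passing = json.loads(
    '{ "size" : 0, "query" : { "range" : { "created_at" : '
    '{ "gte" : "2016-02-01", "lt": "2018-02-01"} } } }'
)

# Request body of the failing clause.
failing = json.loads(
    '{ "size" : 0, "query" : { "range" : { "created_at" : '
    '{ "gte" : "2016-02-01", "lt": "2018-02-01"}}}, '
    '"aggs" : { "idx_terms" :  { "terms" : { "field" : "_index" } } } }'
)

# Top-level keys present only in the failing request.
only_in_failing = sorted(set(failing) - set(passing))

# Every key the two requests share carries the same value.
shared_equal = all(passing[k] == failing[k] for k in passing)

print(only_in_failing, shared_equal)
print(failing["aggs"]["idx_terms"]["terms"])
```

The query and size are identical, and the terms agg sets only `field`, so min_doc_count is not overridden in the failing request.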

astefan commented 1 year ago

Another one here. Tried locally and it doesn't repro.

REPRODUCE WITH: ./gradlew ':qa:mixed-cluster:v8.6.3#mixedClusterTest' -Dtests.class="org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT" -Dtests.method="test {p0=search/140_pre_filter_search_shards/pre_filter_shard_size with shards that have no hit}" -Dtests.seed=9B95319E6806A1D6 -Dtests.bwc=true -Dtests.locale=zh-Hant-HK -Dtests.timezone=SystemV/EST5EDT -Druntime.java=18

org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT > test {p0=search/140_pre_filter_search_shards/pre_filter_shard_size with shards that have no hit} FAILED
    java.lang.AssertionError: Failure at [search/140_pre_filter_search_shards:164]: 
    Expected: <1>
         but: was <0>
        at __randomizedtesting.SeedInfo.seed([9B95319E6806A1D6:13C10E44C6FACC2E]:0)
        at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.executeSection(ESClientYamlSuiteTestCase.java:547)
        at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.test(ESClientYamlSuiteTestCase.java:499)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
        at java.base/java.lang.reflect.Method.invoke(Method.java:577)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)

cbuescher commented 1 year ago

Another one today https://gradle-enterprise.elastic.co/s/44ie5vv4dtb4g/tests/:qa:mixed-cluster:v8.6.2%23mixedClusterTest/org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT/test%20%7Bp0=search%2F140_pre_filter_search_shards%2Fpre_filter_shard_size%20with%20shards%20that%20have%20no%20hit%7D?top-execution=1

iverase commented 1 year ago

Another one: https://gradle-enterprise.elastic.co/s/3ycgwkaofc6d4

valeriy42 commented 1 year ago

Another failure here: https://gradle-enterprise.elastic.co/s/swoditvjvkl32

cbuescher commented 1 year ago

And today on main: https://gradle-enterprise.elastic.co/s/drgmcwegmz2l4

ywangd commented 1 year ago

Still happens on 8.10 https://gradle-enterprise.elastic.co/s/436d7hex7eqmg/tests/task/:qa:mixed-cluster:v8.9.2%23mixedClusterTest/details/org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT/test%20%7Bp0=search%2F140_pre_filter_search_shards%2Fpre_filter_shard_size%20with%20shards%20that%20have%20no%20hit%7D?top-execution=1

droberts195 commented 1 year ago

Another occurrence in https://gradle-enterprise.elastic.co/s/2vxxczqbrzggc

nik9000 commented 12 months ago

another! https://gradle-enterprise.elastic.co/s/ii4bwhxkrglnq/tests/task/:qa:mixed-cluster:v8.10.3%23mixedClusterTest/details/org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT/test%20%7Bp0=search%2F140_pre_filter_search_shards%2Fpre_filter_shard_size%20with%20shards%20that%20have%20no%20hit%7D?top-execution=1

ldematte commented 11 months ago

Happened again today, on 8.11 branch testing bwc to 8.10.4: https://gradle-enterprise.elastic.co/s/pnblflald6v6a/tests/task/:qa:mixed-cluster:v8.10.4%23mixedClusterTest/details/org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT/test%20%7Bp0=search%2F140_pre_filter_search_shards%2Fpre_filter_shard_size%20with%20shards%20that%20have%20no%20hit%7D?top-execution=1

The mute in https://github.com/elastic/elasticsearch/pull/100954 should be backported to 8.11 (and probably to 8.10 too).

ldematte commented 11 months ago

Failed also on 8.10, testing bwc to 8.6.2: https://gradle-enterprise.elastic.co/s/zi7pdhzsmxrks/tests/:qa:mixed-cluster:v8.6.2%23mixedClusterTest/org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT/test%20%7Bp0=search%2F140_pre_filter_search_shards%2Fpre_filter_shard_size%20with%20shards%20that%20have%20no%20hit%7D

The mute in https://github.com/elastic/elasticsearch/pull/100954 should be backported to 8.11 and 8.10.

elasticsearchmachine commented 2 months ago

Pinging @elastic/es-search-foundations (Team:Search Foundations)