elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

[CI] IngestGeoIpClientYamlTestSuiteIT tests failing #106737

Open alex-spies opened 5 months ago

alex-spies commented 5 months ago

Many tests are failing in the @Before setup method, specifically at

org.elasticsearch.ingest.geoip.IngestGeoIpClientYamlTestSuiteIT.lambda$waitForDatabases$3(IngestGeoIpClientYamlTestSuiteIT.java:78)

All of them fail because the databases_count is smaller than the expected 4.

Might be related to https://github.com/elastic/elasticsearch/issues/101418 or https://github.com/elastic/elasticsearch/issues/95496: failed in the same setup method.
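For readers unfamiliar with the pattern, here is a minimal sketch of the kind of busy-wait assertion the setup method relies on. It is not the actual ESTestCase.assertBusy implementation; `awaitAssertion`, `pollsUntilCount`, and the simulated count are hypothetical names and behavior made up for illustration. The point is that when the polled value never reaches the expected one, the last AssertionError is what surfaces once the timeout expires.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class BusyWaitSketch {

    /** Hypothetical assertBusy-style helper: retries the check until it passes or the timeout expires. */
    static void awaitAssertion(Runnable check, long timeoutMillis, long sleepMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            try {
                check.run();
                return; // the assertion passed
            } catch (AssertionError e) {
                if (System.nanoTime() >= deadline) {
                    throw e; // out of time: surface the most recent failure
                }
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", ie);
            }
        }
    }

    /**
     * Simulates a stats endpoint whose database count grows by one per poll,
     * capped at {@code ceiling}. Returns the number of polls that were made.
     */
    static int pollsUntilCount(int ceiling, int expected, long timeoutMillis) {
        AtomicInteger polls = new AtomicInteger();
        awaitAssertion(() -> {
            int databasesCount = Math.min(polls.incrementAndGet(), ceiling);
            if (databasesCount != expected) {
                throw new AssertionError("Expected: <" + expected + "> but: was <" + databasesCount + ">");
            }
        }, timeoutMillis, 10);
        return polls.get();
    }

    public static void main(String[] args) {
        // All four databases eventually show up, so the wait succeeds.
        System.out.println("polls needed: " + pollsUntilCount(4, 4, 1000));
        // Only two databases ever show up, so the wait times out.
        try {
            pollsUntilCount(2, 4, 100);
        } catch (AssertionError e) {
            System.out.println("timed out: " + e.getMessage());
        }
    }
}
```

In the second call the count is stuck at 2, which mirrors the shape of the failure reported here: the wait itself works, but the condition it waits for never becomes true.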

Test failures:

Build scan: https://gradle-enterprise.elastic.co/s/ctt7b3ramfg5y/tests/:modules:ingest-geoip:yamlRestTest/org.elasticsearch.ingest.geoip.IngestGeoIpClientYamlTestSuiteIT/test%20%7Byaml=ingest_geoip%2F10_basic%2Fingest-geoip%20installed%7D

Reproduction line:

./gradlew ':modules:ingest-geoip:yamlRestTest' --tests "org.elasticsearch.ingest.geoip.IngestGeoIpClientYamlTestSuiteIT.test {yaml=ingest_geoip/10_basic/ingest-geoip installed}" -Dtests.seed=47EF93F612EF4AEB -Dtests.locale=en -Dtests.timezone=UCT -Druntime.java=21

Applicable branches: 8.12

Reproduces locally?: No

Failure history: Failure dashboard for org.elasticsearch.ingest.geoip.IngestGeoIpClientYamlTestSuiteIT#test {yaml=ingest_geoip/10_basic/ingest-geoip installed}

Failure excerpt:

java.lang.AssertionError: 
Expected: <4>
     but: was <2>

  at __randomizedtesting.SeedInfo.seed([47EF93F612EF4AEB:CFBBAC2CBC132713]:0)
  at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
  at org.junit.Assert.assertThat(Assert.java:956)
  at org.junit.Assert.assertThat(Assert.java:923)
  at org.elasticsearch.ingest.geoip.IngestGeoIpClientYamlTestSuiteIT.lambda$waitForDatabases$3(IngestGeoIpClientYamlTestSuiteIT.java:78)
  at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:1278)
  at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:1251)
  at org.elasticsearch.ingest.geoip.IngestGeoIpClientYamlTestSuiteIT.waitForDatabases(IngestGeoIpClientYamlTestSuiteIT.java:73)
  at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
  at java.lang.reflect.Method.invoke(Method.java:580)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:980)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.elasticsearch.test.cluster.local.DefaultLocalElasticsearchCluster$1.evaluate(DefaultLocalElasticsearchCluster.java:47)
  at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.lang.Thread.run(Thread.java:1583)

elasticsearchmachine commented 5 months ago

Pinging @elastic/es-data-management (Team:Data Management)

masseyke commented 5 months ago

From what I can tell, we're blowing up while indexing the geoip data here:

[2024-03-25T04:38:17,628][ERROR][o.e.i.g.GeoIpDownloader  ] [test-cluster-0] error downloading geoip database [MyCustomGeoLite2-City.mmdb] [.geoip_databases] org.elasticsearch.index.IndexNotFoundException: no such index [.geoip_databases]
    at org.elasticsearch.server@8.12.3-SNAPSHOT/org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.notFoundException(IndexNameExpressionResolver.java:473)
    at org.elasticsearch.server@8.12.3-SNAPSHOT/org.elasticsearch.cluster.metadata.IndexNameExpressionResolver$ExplicitResourceNameFilter.ensureAliasOrIndexExists(IndexNameExpressionResolver.java:1603)
    at org.elasticsearch.server@8.12.3-SNAPSHOT/org.elasticsearch.cluster.metadata.IndexNameExpressionResolver$ExplicitResourceNameFilter.filterUnavailable(IndexNameExpressionResolver.java:1583)
    at org.elasticsearch.server@8.12.3-SNAPSHOT/org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.resolveExpressions(IndexNameExpressionResolver.java:265)
    at org.elasticsearch.server@8.12.3-SNAPSHOT/org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndices(IndexNameExpressionResolver.java:340)
    at org.elasticsearch.server@8.12.3-SNAPSHOT/org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndexNames(IndexNameExpressionResolver.java:331)
    at org.elasticsearch.server@8.12.3-SNAPSHOT/org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndexNames(IndexNameExpressionResolver.java:90)
    at org.elasticsearch.server@8.12.3-SNAPSHOT/org.elasticsearch.action.support.replication.TransportBroadcastReplicationAction.shards(TransportBroadcastReplicationAction.java:183)
    at org.elasticsearch.server@8.12.3-SNAPSHOT/org.elasticsearch.action.support.replication.TransportBroadcastReplicationAction$1.accept(TransportBroadcastReplicationAction.java:94)
    at org.elasticsearch.server@8.12.3-SNAPSHOT/org.elasticsearch.action.support.replication.TransportBroadcastReplicationAction$1.accept(TransportBroadcastReplicationAction.java:83)
    at org.elasticsearch.server@8.12.3-SNAPSHOT/org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:95)
    at org.elasticsearch.server@8.12.3-SNAPSHOT/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:983)
    at org.elasticsearch.server@8.12.3-SNAPSHOT/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1583)

That's coming from the try/catch of GeoIpDownloader::processDatabase. From what I can tell, it looks like the exception happens either during a flush or refresh request in indexChunks. But immediately before we flush/refresh, we've done index requests into this index. So I have no idea how we'd get no such index [.geoip_databases].

masseyke commented 5 months ago

Oh, I missed this in the log:

[2024-03-25T04:38:17,241][INFO ][o.e.c.m.MetadataDeleteIndexService] [test-cluster-0] [.geoip_databases/eSscKA11TjCN3mvQhDl9bw] deleting index

This is starting to look like the same geoip downloader race conditions we see a lot.
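To make the suspected interleaving concrete, here is a toy, single-threaded sketch of it. `FakeCluster` is a hypothetical in-memory stand-in, not an Elasticsearch API, and the assumption that indexing creates the index on demand applies only to this fake. It just replays the order seen in the logs: index requests succeed, the index is deleted, and the following refresh fails.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GeoIpRaceSketch {

    /** Hypothetical in-memory stand-in for a cluster's indices (illustration only). */
    static final class FakeCluster {
        private final Map<String, List<String>> indices = new HashMap<>();

        /** In this fake, indexing a document creates the index if it is missing. */
        void index(String name, String doc) {
            indices.computeIfAbsent(name, k -> new ArrayList<>()).add(doc);
        }

        void delete(String name) {
            indices.remove(name);
        }

        /** In this fake, refresh requires the index to already exist. */
        void refresh(String name) {
            if (!indices.containsKey(name)) {
                throw new IllegalStateException("no such index [" + name + "]");
            }
        }
    }

    /** Replays "index chunks, then refresh", optionally deleting the index in between. */
    static String run(boolean deleteBetween) {
        FakeCluster cluster = new FakeCluster();
        String index = ".geoip_databases";
        cluster.index(index, "chunk-0");
        cluster.index(index, "chunk-1");
        if (deleteBetween) {
            cluster.delete(index); // stands in for a concurrent delete of the index
        }
        try {
            cluster.refresh(index);
            return "ok";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("without delete: " + run(false));
        System.out.println("with delete:    " + run(true));
    }
}
```

The sketch says nothing about why the delete happens; it only shows that a delete landing between the index requests and the refresh is enough to produce the "no such index" error even though the writes just succeeded.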

masseyke commented 5 months ago

This looks like issue # 1 from https://github.com/elastic/elasticsearch/issues/92888.