elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
68.71k stars 24.4k forks source link

[CI] IngestGeoIpClientYamlTestSuiteIT test {yaml=ingest_geoip/30_geoip_stats/Test geoip stats} failing #101418

Open ldematte opened 9 months ago

ldematte commented 9 months ago

Seems related to https://github.com/elastic/elasticsearch/issues/100361 (assertion and discrepancy in sets looks remarkably similar)

Build scan: https://gradle-enterprise.elastic.co/s/d2qw2pauzldhu/tests/:modules:ingest-geoip:yamlRestTestV7CompatTest/org.elasticsearch.ingest.geoip.IngestGeoIpClientYamlTestSuiteIT/test%20%7Byaml=ingest_geoip%2F30_geoip_stats%2FTest%20geoip%20stats%7D Reproduction line:

./gradlew ':modules:ingest-geoip:yamlRestTestV7CompatTest' --tests "org.elasticsearch.ingest.geoip.IngestGeoIpClientYamlTestSuiteIT.test {yaml=ingest_geoip/30_geoip_stats/Test geoip stats}" -Dtests.seed=2952301C33733F36 -Dtests.locale=be-BY -Dtests.timezone=Etc/GMT-3 -Druntime.java=21

Applicable branches: 8.10

Reproduces locally?: No

Failure history: https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.ingest.geoip.IngestGeoIpClientYamlTestSuiteIT&tests.test=test%20%7Byaml%3Dingest_geoip/30_geoip_stats/Test%20geoip%20stats%7D

Failure excerpt:

java.lang.AssertionError: 
Expected: iterable with items ["GeoLite2-City.mmdb", "GeoLite2-Country.mmdb", "GeoLite2-ASN.mmdb", "MyCustomGeoLite2-City.mmdb"] in any order
     but: no item matches: "GeoLite2-ASN.mmdb" in ["GeoLite2-Country.mmdb", "MyCustomGeoLite2-City.mmdb", "GeoLite2-City.mmdb"]

  at __randomizedtesting.SeedInfo.seed([2952301C33733F36:A1060FC69D8F52CE]:0)
  at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
  at org.junit.Assert.assertThat(Assert.java:956)
  at org.junit.Assert.assertThat(Assert.java:923)
  at org.elasticsearch.ingest.geoip.IngestGeoIpClientYamlTestSuiteIT.lambda$waitForDatabases$3(IngestGeoIpClientYamlTestSuiteIT.java:86)
  at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:1172)
  at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:1145)
  at org.elasticsearch.ingest.geoip.IngestGeoIpClientYamlTestSuiteIT.waitForDatabases(IngestGeoIpClientYamlTestSuiteIT.java:73)
  at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
  at java.lang.reflect.Method.invoke(Method.java:580)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:980)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.elasticsearch.test.cluster.DefaultElasticsearchCluster$1.evaluate(DefaultElasticsearchCluster.java:37)
  at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.lang.Thread.run(Thread.java:1583)
elasticsearchmachine commented 9 months ago

Pinging @elastic/es-data-management (Team:Data Management)

ldematte commented 9 months ago

This looks related too: https://gradle-enterprise.elastic.co/s/d2qw2pauzldhu/tests/task/:modules:ingest-geoip:yamlRestTestV7CompatTest/details/org.elasticsearch.ingest.geoip.IngestGeoIpClientYamlTestSuiteIT/test%20%7Byaml=ingest_geoip%2F10_basic%2Fingest-geoip%20installed%7D?top-execution=1

In the same test run https://gradle-enterprise.elastic.co/s/d2qw2pauzldhu there are several test all with similar errors - I avoided opening multiple issues, but feel free to split them.

joegallo commented 5 months ago

A small detail here, it was interesting to me that the missing file "GeoLite2-ASN.mmdb" from the failed assert happened to be missing from the 'Found' messages below that come from GeoIpCli when it is invoked by GeoIpHttpFixture:

Found GeoLite2-City.mmdb, will compress it to GeoLite2-City.tgz
Found GeoLite2-Country.mmdb, will compress it to GeoLite2-Country.tgz
Found MyCustomGeoLite2-City.mmdb, will compress it to MyCustomGeoLite2-City.tgz
Adding GeoLite2-ASN.tgz to overview.json
Adding GeoLite2-Country.tgz to overview.json
Adding MyCustomGeoLite2-City.tgz to overview.json
Adding GeoLite2-City.tgz to overview.json
overview.json created

However, notice in the following snippet that the file in question is copied as a .tgz so it's missing from the output merely because it doesn't need to be compressed: https://github.com/elastic/elasticsearch/blob/54088839b47e2bfffa31c4bee309136cd4b23e48/test/fixtures/geoip-fixture/src/main/java/fixture/geoip/GeoIpHttpFixture.java#L109-L115

Also it's worth noting that all the failures took ten seconds -- so they all individually exhausted the assertBusy in the @Before annotated waitForDatabases() method. So I don't think it's the case that this is just some race condition where if we'd only waited for 15 seconds then everything would have been fine.

joegallo commented 5 months ago
diff --git a/modules/ingest-geoip/src/main/java/org/elasticsearch/ingest/geoip/GeoIpDownloader.java b/modules/ingest-geoip/src/main/java/org/elasticsearch/ingest/geoip/GeoIpDownloader.java
index 3e04f7bfea2..ed9e2dbf39d 100644
--- a/modules/ingest-geoip/src/main/java/org/elasticsearch/ingest/geoip/GeoIpDownloader.java
+++ b/modules/ingest-geoip/src/main/java/org/elasticsearch/ingest/geoip/GeoIpDownloader.java
@@ -12,6 +12,7 @@ import org.apache.logging.log4j.LogManager;
 import org.apache.logging.log4j.Logger;
 import org.elasticsearch.ElasticsearchException;
 import org.elasticsearch.action.ActionListener;
+import org.elasticsearch.action.DocWriteResponse;
 import org.elasticsearch.action.admin.indices.flush.FlushRequest;
 import org.elasticsearch.action.admin.indices.refresh.RefreshRequest;
 import org.elasticsearch.action.index.IndexRequest;
@@ -225,10 +226,12 @@ public class GeoIpDownloader extends AllocatedPersistentTask {
         MessageDigest md = MessageDigests.md5();
         for (byte[] buf = getChunk(is); buf.length != 0; buf = getChunk(is)) {
             md.update(buf);
+            if (name.equals("GeoLite2-ASN.mmdb") == false) {
                 IndexRequest indexRequest = new IndexRequest(DATABASES_INDEX).id(name + "_" + chunk + "_" + timestamp)
                     .create(true)
                     .source(XContentType.SMILE, "name", name, "chunk", chunk, "data", buf);
                 client.index(indexRequest).actionGet();
+            }
             chunk++;
         }

This addition ends up reproducing the precise shape of this failure. That is, we index the chunks (there happens to just be one for this file, because it is small) as a a one-shot, and if for any reason the indexing doesn't succeed, we just shrug and move on. (Note: except that actionGet should throw, right? Hmmm...)

joegallo commented 5 months ago
[2023-10-26T23:13:56,484][INFO ][o.e.i.g.GeoIpDownloader  ] [test-cluster-0] successfully downloaded geoip database [GeoLite2-City.mmdb]
  1> [2023-10-26T23:13:56,569][INFO ][o.e.i.g.DatabaseNodeService] [test-cluster-0] successfully loaded geoip database file [GeoLite2-City.mmdb]
  1> [2023-10-26T23:13:58,020][INFO ][o.e.c.m.MetadataDeleteIndexService] [test-cluster-0] [.geoip_databases/jyYgrbAPQiqD4vB8-JejcA] deleting index
  1> [2023-10-26T23:13:58,387][INFO ][o.e.i.g.GeoIpDownloader  ] [test-cluster-0] successfully downloaded geoip database [GeoLite2-ASN.mmdb]
  1> [2023-10-26T23:13:58,389][WARN ][o.e.i.g.GeoIpDownloader  ] [test-cluster-0] could not delete old chunks for geoip database [GeoLite2-ASN.mmdb] [.geoip_databases] org.elasticsearch.index.IndexNotFoundException: no such index [.geoip_databases]
  1>    at org.elasticsearch.server@8.10.5-SNAPSHOT/org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.notFoundException(IndexNameExpressionResolver.java:460)
  1>    at org.elasticsearch.server@8.10.5-SNAPSHOT/org.elasticsearch.cluster.metadata.IndexNameExpressionResolver$ExplicitResourceNameFilter.ensureAliasOrIndexExists(IndexNameExpressionResolver.java:1587)
[...]

I wonder if we indexed the document and then deleted the index into which we had indexed the document...

joegallo commented 5 months ago

Seems like it could be #92514 / #92888 related... as @masseyke well observed.