elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

[CI] SecureRepositoryHdfsClientYamlTestSuiteIT classMethod failing #106739

Closed alex-spies closed 5 days ago

alex-spies commented 7 months ago

Build scan: https://gradle-enterprise.elastic.co/s/n6mholsfnjtfk/tests/:plugins:repository-hdfs:yamlRestTest/org.elasticsearch.repositories.hdfs.SecureRepositoryHdfsClientYamlTestSuiteIT

Reproduction line:

null

Applicable branches: main

Reproduces locally?: Didn't try

Failure history: Failure dashboard for org.elasticsearch.repositories.hdfs.SecureRepositoryHdfsClientYamlTestSuiteIT#classMethod

Failure excerpt:

java.io.FileNotFoundException: No valid image files found

  at __randomizedtesting.SeedInfo.seed([CA64921AD7E7CF94]:0)
  at fixture.hdfs2.org.apache.hadoop.hdfs.server.namenode.FSImageTransactionalStorageInspector.getLatestImages(FSImageTransactionalStorageInspector.java:165)
  at fixture.hdfs2.org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:665)
  at fixture.hdfs2.org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:316)
  at fixture.hdfs2.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1044)
  at fixture.hdfs2.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:707)
  at fixture.hdfs2.org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:635)
  at fixture.hdfs2.org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:696)
  at fixture.hdfs2.org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:906)
  at fixture.hdfs2.org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:885)
  at fixture.hdfs2.org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1626)
  at fixture.hdfs2.org.apache.hadoop.hdfs.MiniDFSCluster.createNameNode(MiniDFSCluster.java:1162)
  at fixture.hdfs2.org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:1037)
  at fixture.hdfs2.org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:830)
  at fixture.hdfs2.org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:485)
  at fixture.hdfs2.org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:444)
  at org.elasticsearch.test.fixtures.hdfs.HdfsFixture.startMinHdfs(HdfsFixture.java:233)
  at org.elasticsearch.test.fixtures.hdfs.HdfsFixture.before(HdfsFixture.java:84)
  at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:50)
  at org.testcontainers.containers.FailureDetectingExternalResource$1.evaluate(FailureDetectingExternalResource.java:29)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.lang.Thread.run(Thread.java:1570)

alex-spies commented 7 months ago

Maybe related to https://github.com/elastic/elasticsearch/pull/106228 - @breskeby, what do you think?

elasticsearchmachine commented 7 months ago

Pinging @elastic/es-data-management (Team:Data Management)

breskeby commented 7 months ago

For sure.

breskeby commented 7 months ago

I suspect an encoding issue caused by the randomized locale in our HDFS setup here. 👀
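
To illustrate the kind of locale problem suspected here, below is a minimal sketch of how locale-sensitive number formatting can produce file names that an ASCII-only pattern no longer recognizes. It is an assumption, not a confirmed diagnosis: the `fsimage_` name format, the regex, and the class and method names are all hypothetical and are not taken from the HDFS fixture code.

```java
import java.util.Locale;
import java.util.regex.Pattern;

public class LocaleDigitsSketch {
    // Hypothetical file-name pattern. By default, \d in java.util.regex
    // matches only the ASCII digits 0-9.
    static final Pattern IMAGE_NAME = Pattern.compile("fsimage_(\\d+)");

    // Formats a hypothetical image file name. %d renders digits using the
    // numbering system of the given locale.
    static String imageName(Locale locale, long txId) {
        return String.format(locale, "fsimage_%019d", txId);
    }

    // Returns true if the name would be picked up by the ASCII-only pattern.
    static boolean isRecognized(String fileName) {
        return IMAGE_NAME.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        // A locale that explicitly requests Thai digits via the "nu" extension.
        Locale thaiDigits = Locale.forLanguageTag("th-TH-u-nu-thai");

        String rootName = imageName(Locale.ROOT, 42L);
        String thaiName = imageName(thaiDigits, 42L);

        System.out.println(rootName + " recognized: " + isRecognized(rootName));
        System.out.println(thaiName + " recognized: " + isRecognized(thaiName));
    }
}
```

If something like this is in play, passing an explicit `Locale.ROOT` when formatting (or pinning the locale for the fixture) would be one way to rule it out.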

kkrik-es commented 7 months ago

This is still failing, I think:

https://gradle-enterprise.elastic.co/s/zcpsnl52it4dq

https://gradle-enterprise.elastic.co/s/nwb7qhos4d7ce

astefan commented 7 months ago

A failure that seems related, though it may not be: https://gradle-enterprise.elastic.co/s/eq543ntjnrl62

org.elasticsearch.repositories.hdfs.SecureHaHdfsFailoverTestSuiteIT > classMethod FAILED
    org.testcontainers.containers.ContainerLaunchException: Container startup failed for image docker.elastic.co/elasticsearch-dev/krb5dc-fixture:1.0
        at __randomizedtesting.SeedInfo.seed([A9D6E1EA110055B0]:0)
        at app//org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:362)
        at app//org.testcontainers.containers.GenericContainer.start(GenericContainer.java:333)
        at app//org.elasticsearch.test.fixtures.testcontainers.DockerEnvironmentAwareTestContainer.start(DockerEnvironmentAwareTestContainer.java:68)
        at app//org.elasticsearch.test.fixtures.krb5kdc.Krb5kDcContainer.start(Krb5kDcContainer.java:104)
        at app//org.testcontainers.containers.GenericContainer.starting(GenericContainer.java:1085)
        at app//org.testcontainers.containers.FailureDetectingExternalResource$1.evaluate(FailureDetectingExternalResource.java:28)
        at app//org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at app//org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at app//org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
        at app//com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at app//com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at app//org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
        at app//org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at app//org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
        at app//org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
        at app//org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
        at app//org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at app//com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
        at app//com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
        at java.base@22/java.lang.Thread.run(Thread.java:1570)

        Caused by:
        org.rnorth.ducttape.RetryCountExceededException: Retry limit hit with exception
            at app//org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:88)
            at app//org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:347)
            ... 23 more

            Caused by:
            org.testcontainers.containers.ContainerLaunchException: Could not create/start container
                at app//org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:566)
                at app//org.testcontainers.containers.GenericContainer.lambda$doStart$0(GenericContainer.java:357)
                at app//org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:81)
                ... 24 more

                Caused by:
                org.testcontainers.containers.ContainerLaunchException: Timed out waiting for container port to open (localhost ports: [32776, 32777] should be listening)
                    at app//org.testcontainers.containers.wait.strategy.HostPortWaitStrategy.waitUntilReady(HostPortWaitStrategy.java:112)
                    at app//org.testcontainers.containers.wait.strategy.AbstractWaitStrategy.waitUntilReady(AbstractWaitStrategy.java:52)
                    at app//org.testcontainers.containers.GenericContainer.waitUntilContainerStarted(GenericContainer.java:912)
                    at app//org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:503)
                    ... 26 more

But further down in the logs there is something that might indicate a general connectivity problem, so I'm not sure whether this is a red herring:

org.gradle.caching.BuildCacheException: Connect to gradle-enterprise.elastic.co:443 [gradle-enterprise.elastic.co/35.188.12.98] failed: Connect timed out
        at org.gradle.caching.http.internal.HttpBuildCacheService.wrap(HttpBuildCacheService.java:163)
        at org.gradle.caching.http.internal.HttpBuildCacheService.load(HttpBuildCacheService.java:101)
        at org.gradle.caching.internal.controller.service.BaseRemoteBuildCacheServiceHandle.loadInner(BaseRemoteBuildCacheServiceHandle.java:113)
        at org.gradle.caching.internal.controller.service.OpFiringRemoteBuildCacheServiceHandle$1.run(OpFiringRemoteBuildCacheServiceHandle.java:64)
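
Both timeouts above (the container port wait and the build cache connection) come down to a TCP connection not being established in time. A minimal sketch of a manual probe for that, using only `java.net`, could look like the following. This is not how Testcontainers or Gradle perform their checks; the class name, method name, and timeout value are assumptions made for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbeSketch {
    // Returns true if a TCP connection to host:port succeeds within
    // timeoutMillis, and false on timeout, refusal, or resolution failure.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Host and port are taken from the build cache error above;
        // the 5 second timeout is an arbitrary choice.
        System.out.println(canConnect("gradle-enterprise.elastic.co", 443, 5000));
    }
}
```

Running a probe like this from the CI worker at the time of a failure could help tell a general network problem apart from a fixture-specific one.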

ldematte commented 6 months ago

A new failure today: https://gradle-enterprise.elastic.co/s/lmvfmg7po7t6m/tests/task/:x-pack:plugin:snapshot-repo-test-kit:qa:hdfs:javaRestTest/details/org.elasticsearch.repositories.blobstore.testkit.SecureHdfsSnapshotRepoTestKitIT?top-execution=1

ldematte commented 6 months ago

Happening on SecureHaHdfsFailoverTestSuiteIT too: https://gradle-enterprise.elastic.co/s/7n2uxgpgisjru/tests/task/:plugins:repository-hdfs:javaRestTest/details/org.elasticsearch.repositories.hdfs.SecureHaHdfsFailoverTestSuiteIT?top-execution=1

nielsbauman commented 5 months ago

More failures: https://gradle-enterprise.elastic.co/s/apdwvrqydrwpm/tests/:x-pack:plugin:snapshot-repo-test-kit:qa:hdfs:javaRestTest/org.elasticsearch.repositories.blobstore.testkit.SecureHdfsSnapshotRepoTestKitIT

https://gradle-enterprise.elastic.co/s/4ii5ddq6xqlsa/tests/task/:x-pack:plugin:snapshot-repo-test-kit:qa:hdfs:javaRestTest/details/org.elasticsearch.repositories.blobstore.testkit.SecureHdfsSnapshotRepoTestKitIT?

elasticsearchmachine commented 5 days ago

This issue has been closed because it has been open for too long with no activity.

Any muted tests that were associated with this issue have been unmuted.

If the tests begin failing again, a new issue will be opened, and they may be muted again.