elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
69.92k stars 24.73k forks source link

[CI] ServiceAccountIT classMethod failing #92930

Closed slobodanadamovic closed 1 year ago

slobodanadamovic commented 1 year ago

Build scan: https://gradle-enterprise.elastic.co/s/ugoqwcksbynes/tests/:x-pack:plugin:security:qa:service-account:javaRestTest/org.elasticsearch.xpack.security.authc.service.ServiceAccountIT

Reproduction line:

./gradlew ':x-pack:plugin:security:qa:service-account:javaRestTest' --tests "org.elasticsearch.xpack.security.authc.service.ServiceAccountIT" -Dtests.seed=7BBBFBC0C57760FE -Dtests.locale=en-US -Dtests.timezone=Etc/UTC -Druntime.java=17 -Dtests.fips.enabled=true

Applicable branches: main

Reproduces locally?: Didn't try

Failure history: https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.xpack.security.authc.service.ServiceAccountIT&tests.test=classMethod

Failure excerpt:

java.lang.RuntimeException: An error occurred while checking cluster 'test-cluster' status.

  at __randomizedtesting.SeedInfo.seed([7BBBFBC0C57760FE]:0)
  at org.elasticsearch.test.cluster.local.LocalClusterHandle.waitUntilReady(LocalClusterHandle.java:141)
  at org.elasticsearch.test.cluster.local.LocalClusterHandle.start(LocalClusterHandle.java:71)
  at org.elasticsearch.test.cluster.local.LocalElasticsearchCluster$1.evaluate(LocalElasticsearchCluster.java:37)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.lang.Thread.run(Thread.java:833)

  Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.IllegalArgumentException: Contains non-LDH ASCII characters

    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2096)
    at org.elasticsearch.test.cluster.util.Retry.getValueWithTimeout(Retry.java:54)
    at org.elasticsearch.test.cluster.util.Retry.retryUntilTrue(Retry.java:31)
    at org.elasticsearch.test.cluster.local.LocalClusterHandle.waitUntilReady(LocalClusterHandle.java:134)
    at org.elasticsearch.test.cluster.local.LocalClusterHandle.start(LocalClusterHandle.java:71)
    at org.elasticsearch.test.cluster.local.LocalElasticsearchCluster$1.evaluate(LocalElasticsearchCluster.java:37)
    at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
    at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
    at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
    at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
    at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
    at java.lang.Thread.run(Thread.java:833)

    Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException: Contains non-LDH ASCII characters

      at org.elasticsearch.test.cluster.util.Retry.lambda$getValueWithTimeout$1(Retry.java:49)
      at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
      at java.lang.Thread.run(Thread.java:833)

      Caused by: java.lang.IllegalArgumentException: Contains non-LDH ASCII characters

        at java.net.IDN.toASCIIInternal(IDN.java:297)
        at java.net.IDN.toASCII(IDN.java:123)
        at javax.net.ssl.SNIHostName.<init>(SNIHostName.java:99)
        at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:572)
        at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:183)
        at sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:142)
        at org.elasticsearch.test.cluster.local.WaitForHttpResource.checkResource(WaitForHttpResource.java:123)
        at org.elasticsearch.test.cluster.local.WaitForHttpResource.wait(WaitForHttpResource.java:107)
        at org.elasticsearch.test.cluster.local.LocalClusterHandle.lambda$waitUntilReady$7(LocalClusterHandle.java:136)
        at org.elasticsearch.test.cluster.util.Retry.lambda$retryUntilTrue$0(Retry.java:33)
        at org.elasticsearch.test.cluster.util.Retry.lambda$getValueWithTimeout$1(Retry.java:47)
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
        at java.lang.Thread.run(Thread.java:833)

Edit: Added reproduction line.

elasticsearchmachine commented 1 year ago

Pinging @elastic/es-security (Team:Security)

slobodanadamovic commented 1 year ago

There seems to be an issue with host name which is passed to WaitForHttpResource when it is created.

The problem is that LocalClusterHandle at line 151 calls node.getHttpAddress() which returns IPv6 address with port (e.g. [::1]:64922). This causes an issue downstream when creating SNIHostName.

The getHttpAddress simply takes the first address from the list see: https://github.com/elastic/elasticsearch/blob/b45737e43d76d96d112d8e5f1baa94aeabba8f0d/test/test-clusters/src/main/java/org/elasticsearch/test/cluster/local/LocalClusterFactory.java#L147

in this concrete case the list had 2 addresses:

  1. [::1]:64922 and
  2. 127.0.0.1:64923

Note: Using the IPv6 address causes the issue, while using IPv4 doesn't.

I've raised a PR which, as a workaround, sets localhost for http.host setting, but I believe we need a general solution in WaitForHttpResource.

elasticsearchmachine commented 1 year ago

Pinging @elastic/es-delivery (Team:Delivery)

ywangd commented 1 year ago

@slobodanadamovic Thanks for looking into it. I invested sometime trying to understand the failure as well. It was puzzling to me because the http.ports file does not change, i.e. the IPv6 address worked previously before the refactoring of using the new inline test cluster. I was curious why it is not working now.

It turned out to be related to FIPS. As a side effect of moving test cluster setup to JUnit, the client (WaitForHttpResource) code now runs with the same JVM options as the ES server. In this case, it runs with FIPS enabled. With FIPS enabled, SSL factory and socket have different implementations than the JDK ones which trigger SNI configuration within HttpsClient.

I think ideally we do not want the test cluster setup client to run with FIPS since it can potentially create unnecessary maintainence burden. I am not sure how feasible it is to separate them though. This behaviour can probably be argued to be "more correct" since actual rest tests do run in FIPS mode and they are also mostly client code (different client implementations). @mark-vieira

If we accept this new behaviour, one possible workaround is to disable hostname verification for the setup client, e.g. something like the following works:

diff --git a/test/test-clusters/src/main/java/org/elasticsearch/test/cluster/local/WaitForHttpResource.java b/test/test-clusters/src/main/java/org/elasticsearch/test/cluster/local/WaitForHttpResource.java
index edab2cdf1e7..8ee9169f8c9 100644
--- a/test/test-clusters/src/main/java/org/elasticsearch/test/cluster/local/WaitForHttpResource.java
+++ b/test/test-clusters/src/main/java/org/elasticsearch/test/cluster/local/WaitForHttpResource.java
@@ -140,8 +140,9 @@ public class WaitForHttpResource {

     private void configureSslContext(HttpURLConnection connection, SSLContext ssl) {
         if (ssl != null) {
-            if (connection instanceof HttpsURLConnection) {
-                ((HttpsURLConnection) connection).setSSLSocketFactory(ssl.getSocketFactory());
+            if (connection instanceof final HttpsURLConnection httpsURLConnection) {
+                httpsURLConnection.setSSLSocketFactory(ssl.getSocketFactory());
+                httpsURLConnection.setHostnameVerifier((hostname, sslSession) -> true);
             } else {
                 throw new IllegalStateException("SSL trust has been configured, but [" + url + "] is not a 'https' URL");
             }

Obviously there are other alternatives. I have no strong opinion on how it should be fixed. But it would be better if individual test class does not have to deal with it as suggested by Slobodan.

slobodanadamovic commented 1 year ago

@ywangd Thanks for checking this as well. I missed the fact that the issue happens only when FIPS is enabled.

TIL

With FIPS enabled, SSL factory and socket have different implementations than the JDK ones which trigger SNI configuration within HttpsClient.

benwtrent commented 1 year ago

@slobodanadamovic this is also failing on 8.6 if you want to backport your fix: https://gradle-enterprise.elastic.co/s/vng6xiitobh3c

mark-vieira commented 1 year ago

I actually now see that we aren't properly setting up test clusters using the new framework for FIPS at all. This is an oversight and I'll get that fixed along with a proper fix for this that doesn't require hard-coding http.host.