SeleniumHQ / selenium

A browser automation framework and ecosystem.
https://selenium.dev
Apache License 2.0
30.15k stars 8.1k forks source link

[🐛 Bug]: Selenium Grid problem with processing requests from JDKHttpClient #14258

Closed kakliniew closed 1 month ago

kakliniew commented 1 month ago

What happened?

When I use serenity 4.0.1 (selenium 4.12.0) tests work correctly, but when I try to upgrade (e.g. 4.1.4 - selenium 4.18.1) then I receive this error after few seconds (always in same step) of execution (execution works fine but then suddenly GOAWAY).

I tried debugging, changing selenium grid/hub/node versions. I tried changing selenium. I noticed that the problem occurs when I use selenium with jdkhttpclient (so 4.14 and higher). In addition, when I did a downgrade to edge node 4.15.0 I got an additional stacktrace in the logs

How can we reproduce the issue?

Serenity 4.1.4 + jdk11 
Selenium grid 4.15.0 deployed in kubernetes + edge node 4.15.0

Relevant log output

Log in edge node 4.15.0
15:28:04.161 WARN [ExternalProcess$Builder.lambda$start$0] - failed to copy the output of process 702
java.io.IOException: Stream closed  
    at java.base/java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:176) 
    at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:342) 
    at java.base/java.io.InputStream.transferTo(InputStream.java:704)   
    at org.openqa.selenium.os.ExternalProcess$Builder.lambda$start$0(ExternalProcess.java:209)  
    at java.base/java.lang.Thread.run(Thread.java:829)15:28:04.161 
INFO [SessionSlot.stop] - Stopping session c8aaf

log from console test execution:
Session ID: <hidden id>
        at app//org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:546)
        at app//org.openqa.selenium.remote.ElementLocation$ElementFinder$2.findElement(ElementLocation.java:165)
        at app//org.openqa.selenium.remote.ElementLocation.findElement(ElementLocation.java:59)
        at app//org.openqa.selenium.remote.RemoteWebDriver.findElement(RemoteWebDriver.java:356)
        at app//org.openqa.selenium.remote.RemoteWebDriver.findElement(RemoteWebDriver.java:350)
        at app//net.thucydides.core.webdriver.WebDriverFacade.findElement(WebDriverFacade.java:259)
        at app//net.serenitybdd.core.pages.WebElementResolverByLocator.resolveForDriver(WebElementResolverByLocator.java:25)
        at app//net.serenitybdd.core.pages.WebElementFacadeImpl.getResolvedElement(WebElementFacadeImpl.java:188)
        at app//net.serenitybdd.core.pages.WebElementFacadeImpl.getElement(WebElementFacadeImpl.java:183)
        at app//net.serenitybdd.core.pages.WebElementExpectations$3.apply(WebElementExpectations.java:63)
        at app//net.serenitybdd.core.pages.WebElementExpectations$3.apply(WebElementExpectations.java:55)
        at app//org.openqa.selenium.support.ui.FluentWait.until(FluentWait.java:203)
        at app//net.serenitybdd.core.pages.WebElementFacadeImpl.waitUntilEnabled(WebElementFacadeImpl.java:1136)
        at app//net.serenitybdd.core.pages.WebElementFacadeImpl.waitUntilElementAvailable(WebElementFacadeImpl.java:908)
        at app//net.serenitybdd.core.pages.WebElementFacadeImpl.click(WebElementFacadeImpl.java:1221)
        at app//net.serenitybdd.core.pages.WebElementFacadeImpl.click(WebElementFacadeImpl.java:1241)
        at <hidden part>

        Caused by:
        java.io.UncheckedIOException: java.io.IOException: /<hidden ip>: GOAWAY received
            at org.openqa.selenium.remote.http.jdk.JdkHttpClient.execute0(JdkHttpClient.java:443)
            at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
            at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:55)
            at org.openqa.selenium.remote.http.jdk.JdkHttpClient.execute(JdkHttpClient.java:359)
            at org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:54)
            at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:187)
            at org.openqa.selenium.remote.TracedCommandExecutor.execute(TracedCommandExecutor.java:51)
            at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:519)
            at org.openqa.selenium.remote.ElementLocation$ElementFinder$2.findElement(ElementLocation.java:165)
            at org.openqa.selenium.remote.ElementLocation.findElement(ElementLocation.java:59)
            at org.openqa.selenium.remote.RemoteWebDriver.findElement(RemoteWebDriver.java:356)
            at org.openqa.selenium.remote.RemoteWebDriver.findElement(RemoteWebDriver.java:350)
            at net.thucydides.core.webdriver.WebDriverFacade.findElement(WebDriverFacade.java:259)
            at net.serenitybdd.core.pages.WebElementResolverByLocator.resolveForDriver(WebElementResolverByLocator.java:25)
            at net.serenitybdd.core.pages.WebElementFacadeImpl.getResolvedElement(WebElementFacadeImpl.java:188)
            at net.serenitybdd.core.pages.WebElementFacadeImpl.getElement(WebElementFacadeImpl.java:183)
            at net.serenitybdd.core.pages.WebElementExpectations$3.apply(WebElementExpectations.java:63)
            at net.serenitybdd.core.pages.WebElementExpectations$3.apply(WebElementExpectations.java:55)
            at org.openqa.selenium.support.ui.FluentWait.until(FluentWait.java:203)
            at net.serenitybdd.core.pages.WebElementFacadeImpl.waitUntilEnabled(WebElementFacadeImpl.java:1136)
            at net.serenitybdd.core.pages.WebElementFacadeImpl.waitUntilElementAvailable(WebElementFacadeImpl.java:908)
            at net.serenitybdd.core.pages.WebElementFacadeImpl.click(WebElementFacadeImpl.java:1221)
            at net.serenitybdd.core.pages.WebElementFacadeImpl.click(WebElementFacadeImpl.java:1241)
           <hidden part>
            at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
            at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.base/java.lang.reflect.Method.invoke(Method.java:566)
            at io.cucumber.java.Invoker.doInvoke(Invoker.java:66)
            at io.cucumber.java.Invoker.invoke(Invoker.java:24)
            at io.cucumber.java.AbstractGlueDefinition.invokeMethod(AbstractGlueDefinition.java:47)
            at io.cucumber.java.JavaStepDefinition.execute(JavaStepDefinition.java:29)
            at io.cucumber.core.runner.CoreStepDefinition.execute(CoreStepDefinition.java:66)
            at io.cucumber.core.runner.PickleStepDefinitionMatch.runStep(PickleStepDefinitionMatch.java:63)
            at io.cucumber.core.runner.ExecutionMode$1.execute(ExecutionMode.java:10)
            at io.cucumber.core.runner.TestStep.executeStep(TestStep.java:84)
            at io.cucumber.core.runner.TestStep.run(TestStep.java:56)
            at io.cucumber.core.runner.PickleStepTestStep.run(PickleStepTestStep.java:51)
            at io.cucumber.core.runner.TestCase.run(TestCase.java:84)
            at io.cucumber.core.runner.Runner.runPickle(Runner.java:75)
            at io.cucumber.junit.PickleRunners$NoStepDescriptions.lambda$run$0(PickleRunners.java:151)
            at io.cucumber.core.runtime.CucumberExecutionContext.lambda$runTestCase$5(CucumberExecutionContext.java:137)
            at io.cucumber.core.runtime.RethrowingThrowableCollector.executeAndThrow(RethrowingThrowableCollector.java:23)
            at io.cucumber.core.runtime.CucumberExecutionContext.runTestCase(CucumberExecutionContext.java:137)
            at io.cucumber.junit.PickleRunners$NoStepDescriptions.run(PickleRunners.java:148)
            at io.cucumber.junit.FeatureRunner.runChild(FeatureRunner.java:144)
            at io.cucumber.junit.FeatureRunner.runChild(FeatureRunner.java:28)
            at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
            at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
            at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
            at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
            at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
            at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
            at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
            at io.cucumber.junit.FeatureRunner.run(FeatureRunner.java:137)
            at io.cucumber.junit.CucumberSerenityBaseRunner.runChild(CucumberSerenityBaseRunner.java:332)
            at io.cucumber.junit.CucumberSerenityBaseRunner.runChild(CucumberSerenityBaseRunner.java:53)
            at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
            at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
            at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
            at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
            at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
            at io.cucumber.junit.CucumberSerenityBaseRunner$RunCucumber.evaluate(CucumberSerenityBaseRunner.java:363)
            at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
            at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
            at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:108)
            at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
            at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:40)
            at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:60)
            at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:52)
            at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
            at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.base/java.lang.reflect.Method.invoke(Method.java:566)
            at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
            at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
            at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
            at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94)
            at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
            at org.gradle.api.internal.tasks.testing.worker.TestWorker$2.run(TestWorker.java:176)
            at org.gradle.api.internal.tasks.testing.worker.TestWorker.executeAndMaintainThreadName(TestWorker.java:129)
            at org.gradle.api.internal.tasks.testing.worker.TestWorker.execute(TestWorker.java:100)
            at org.gradle.api.internal.tasks.testing.worker.TestWorker.execute(TestWorker.java:60)
            at org.gradle.process.internal.worker.child.ActionExecutionWorker.execute(ActionExecutionWorker.java:56)
            at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:113)
            at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:65)
            at worker.org.gradle.process.internal.worker.GradleWorkerMain.run(GradleWorkerMain.java:69)
            at worker.org.gradle.process.internal.worker.GradleWorkerMain.main(GradleWorkerMain.java:74)

            Caused by:
            java.io.IOException: /<hidden ip>: GOAWAY received
                at java.net.http/jdk.internal.net.http.Http2Connection.handleGoAway(Http2Connection.java:999)
                at java.net.http/jdk.internal.net.http.Http2Connection.handleConnectionFrame(Http2Connection.java:867)
                at java.net.http/jdk.internal.net.http.Http2Connection.processFrame(Http2Connection.java:738)
                at java.net.http/jdk.internal.net.http.frame.FramesDecoder.decode(FramesDecoder.java:155)
                at java.net.http/jdk.internal.net.http.Http2Connection$FramesController.processReceivedData(Http2Connection.java:232)
                at java.net.http/jdk.internal.net.http.Http2Connection.asyncReceive(Http2Connection.java:663)
                at java.net.http/jdk.internal.net.http.Http2Connection$Http2TubeSubscriber.processQueue(Http2Connection.java:1292)
                at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SynchronizedRestartableTask.run(SequentialScheduler.java:175)
                at java.net.http/jdk.internal.net.http.common.SequentialScheduler$CompleteRestartableTask.run(SequentialScheduler.java:147)
                at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:198)
                at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:271)
                at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:224)
                at java.net.http/jdk.internal.net.http.Http2Connection$Http2TubeSubscriber.runOrSchedule(Http2Connection.java:1310

Operating System

windows/unix

Selenium version

4.18.1

What are the browser(s) and version(s) where you see this issue?

Tried 4.15.0, 4.21.0, 4.22.0

What are the browser driver(s) and version(s) where you see this issue?

all of them but stacktrace in 4.15.0 only

Are you using Selenium Grid?

4.15.0/4.21.0

Connected issue

https://github.com/SeleniumHQ/docker-selenium/issues/2195

github-actions[bot] commented 1 month ago

@kakliniew, thank you for creating this issue. We will troubleshoot it as soon as we can.


Info for maintainers

Triage this issue by using labels.

If information is missing, add a helpful comment and then I-issue-template label.

If the issue is a question, add the I-question label.

If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.

If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable G-* label, and it will provide the correct link and auto-close the issue.

After troubleshooting the issue, please add the R-awaiting answer label.

Thank you!

github-actions[bot] commented 1 month ago

Hi, @kakliniew. Please follow the issue template, we need more information to reproduce the issue.

Either a complete code snippet and URL/HTML (if more than one file is needed, provide a GitHub repo and instructions to run the code), the specific versions used, or a more detailed description to help us understand the issue.

Note: If you cannot share your code and URL/HTML, any complete code snippet and URL/HTML that reproduces the issue is good enough.

Reply to this issue when all information is provided, thank you.

kakliniew commented 1 month ago

@diemol https://github.com/kakliniew/reproduce-selenium-edge-node here repo that allowed me to reproduced similar error. selenium-grid-argo.yaml - selenium-grid chart is the latest, only changed file is this argo-app with different values. I deployed this argo app in my minikube and added dns to ect/hosts. serenity-cucumber-starter - test repo - I used template project, only adjusted serenity.properties, serenity.version(build.gradle) and extended one test. You have to change serenity.properties (your selenium grid url) and run test using:

./gradlew test -D.cucumber.filter.tags=@green

During execution I receive the same error and session is stopped: image Test is failed, but stacktrace seems different (but here tech stack is a little bit different too): image

If you need more details, please let us know. I will try other browsers/versioons on Monday.

kakliniew commented 1 month ago

For the example project, the problem was solved by adding flags: Unfortunately for my base project, this did not help. I keep digging

webdriver {
  capabilities {

        "ms:edgeOptions" {

          args = ["remote-allow-orgins=*", "test-type", "--remote-allow-origins=*","no-sandbox", "ignore-certificate-errors",
           "incognito","disable-infobars", "disable-gpu", "disable-default-apps", "disable-popup-blocking", "disable-dev-shm-usage",
            "--disable-dev-shm-usage", "disable-extensions", "disable-web-security", "disable-translate", "disable-logging"
          ]
        }
    }
}
diemol commented 1 month ago

Does the issue only happen on your Kubernetes environment?

kakliniew commented 1 month ago

I dont know. I'm stuck in a place. GOAWAY is not random - it always throws an error in the same place for a given test. I tried different browsers and still the same thing. Very strange that I have scenario outline and it differs only "user", and for some users it goes much further than for others.

kakliniew commented 1 month ago

I tried with serenity 4.0.15 (first version with selenium 4.14.1 so jdkhttp by default) and it fails. Changed to 4.0.14 (selenium 4.13.0) and everything is fine. Added org.seleniumhq.selenium:selenium-http-jdk-client:4.13.0 and System.setProperty("webdriver.http.factory", "jdk-http-client") and it fails again. So this problem only occurs when using jdkhttpclient. I will try to reproduce it in my example project.

Element: [[RemoteWebDriver: MicrosoftEdge on linux ] -> xpath: //button[normalize-space(text())='Submit']]
    Session ID: 
        at app//org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:629)
        at app//org.openqa.selenium.remote.RemoteWebElement.execute(RemoteWebElement.java:224)
        at app//org.openqa.selenium.remote.RemoteWebElement.getTagName(RemoteWebElement.java:135)
        at app//net.serenitybdd.core.pages.WebElementFacadeImpl.getTagName(WebElementFacadeImpl.java:1298)
Caused by:
        java.io.UncheckedIOException: java.io.IOException: /: GOAWAY received
            at org.openqa.selenium.remote.http.jdk.JdkHttpClient.execute0(JdkHttpClient.java:431)
            at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
            at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:55)
            at org.openqa.selenium.remote.http.jdk.JdkHttpClient.execute(JdkHttpClient.java:347)
 Caused by:
            java.io.IOException: /: GOAWAY received
                at java.net.http/jdk.internal.net.http.Http2Connection.handleGoAway(Http2Connection.java:999)
                at java.net.http/jdk.internal.net.http.Http2Connection.handleConnectionFrame(Http2Connection.java:867)
                at java.net.http/jdk.internal.net.http.Http2Connection.processFrame(Http2Connection.java:738)
                at java.net.http/jdk.internal.net.http.frame.FramesDecoder.decode(FramesDecoder.java:155)
                at java.net.http/jdk.internal.net.http.Http2Connection$FramesController.processReceivedData(Http2Connection.java:232)
                at java.net.http/jdk.internal.net.http.Http2Connection.asyncReceive(Http2Connection.java:663)
kakliniew commented 1 month ago

Very very strange... I'm still trying. We have old method "waitUntilPageLoaded()" which was sleeping for 2000ms and then

WebDriverWait wait = new WebDriverWait(getDriver(), Duration.ofSeconds(10));
wait.until((ExpectedCondition<Boolean>) driver -> !checkModalOpen() && !checkIfModalWindowIsPresent());

checkIfModalWindowIsPresent() is execution of JS scripts - for some reason twice the same script (using JavascriptExecutor). When I removed duplicated execution then results we worse. When I added addtional third execution then results were reversed (for users for whom it failed early steps, it fail in later steps, for users for whom it failed in later steps it began to fail early steps). I removed whole waitUntilPageLoaded() and now it fails in the beggining in fourth step for all the users (of course still GOAWAY).

And strange steps time doesnt addup to execution time (but maybe thats bug) image

I tried to reproduce the problem in the new repo, but I am not able to.

Any ideas?

kakliniew commented 1 month ago

I found one function that, if commented out, makes the test go on. I don't know if whole test works yet, because it crashes on one assertion and I don't have the energy to fix it today, but when not commented out it crashes much earlier due to GOAWAY. When I look at the function, there is nothing special about it, I will analyse it tomorrow

joerg1985 commented 1 month ago

I am wondering about the http2 related error message, the selenium server should not use http2.

You could try to set version http1.1 in the ClientConfig to see what happens and what error is raised.

kakliniew commented 1 month ago

The main element of the mentioned function is findElements(), and then loop through the elements and “getAtrribute” for each element. I think the problem are some kind of limits. https://stackoverflow.com/questions/55087292/how-to-handle-http-2-goaway-with-java-net-httpclient

I tried to initializing webdriver in following way:

EdgeOptions options = new EdgeOptions();
        WebDriver driver;
        try{
             driver = new RemoteWebDriver(new URL("SELENIUM_HUB_URL"), options);
        } catch (MalformedURLException e) {
            throw new RuntimeException(e);
        }
        driver.manage().window().setPosition(new Point(0,0));
        driver.manage().window().maximize();
        driver.get("BASE_PAGE");
        page.setDriver(driver);

Mentioned page is: https://github.com/serenity-bdd/serenity-core/blob/main/serenity-core/src/main/java/net/serenitybdd/core/pages/PageObject.java This initialization works but I get the GOAWAY error for some users (the difference is that some have additional steps after logging in - dealing with popup)

@joerg1985 when I try to initialize webdriver like:

    EdgeOptions options = new EdgeOptions().addArguments("--remote-allow-origins=*");
        WebDriver driver;
        try{
            ClientConfig config = ClientConfig
                    .defaultConfig()
                    .baseUrl(new URL("SELENIUM_GRID_URL")) // here grid url should be specified
                    .version(HttpClient.Version.HTTP_1_1.name());
            driver = RemoteWebDriver.builder().oneOf(options).config(config)
                    .build();
        } catch (MalformedURLException e) {
            throw new RuntimeException(e);
        }

Then I get:

org.openqa.selenium.remote.http.ConnectionFailedException: JdkWebSocket initial request execution error
    Build info: version: '4.18.1', revision: 'b1d3319b48'
    System info: os.name: 'Windows 11', os.arch: 'amd64', os.version: '10.0', java.version: '11.0.22'
    Driver info: driver.version: unknown
        at app//org.openqa.selenium.remote.http.jdk.JdkHttpClient.openSocket(JdkHttpClient.java:237)
        at app//org.openqa.selenium.devtools.Connection.<init>(Connection.java:87)
        at app//org.openqa.selenium.devtools.SeleniumCdpConnection.<init>(SeleniumCdpConnection.java:36)
        at app//org.openqa.selenium.devtools.SeleniumCdpConnection.lambda$create$3(SeleniumCdpConnection.java:103)
Caused by:
        java.net.http.HttpConnectTimeoutException: HTTP connect timed out
            at java.net.http/jdk.internal.net.http.MultiExchange.toTimeoutException(MultiExchange.java:512)
            at java.net.http/jdk.internal.net.http.MultiExchange.getExceptionalCF(MultiExchange.java:459)
            at java.net.http/jdk.internal.net.http.MultiExchange.lambda$responseAsyncImpl$7(MultiExchange.java:386)
Caused by:
            java.net.ConnectException: HTTP connect timed out
                at java.net.http/jdk.internal.net.http.MultiExchange.toTimeoutException(MultiExchange.java:513)
                ... 10 more

I tried with HTTP_2 too, but same result. I couldnt reproduce this problem using serenity-cucumber-starter project, even when I used some command in loop 10k times.

kakliniew commented 1 month ago

Deployed new selenium grid with new helm chart and when I try to initialize the same way as above, then I get:

org.openqa.selenium.remote.http.ConnectionFailedException: JdkWebSocket initial request execution error
    Build info: version: '4.18.1', revision: 'b1d3319b48'
    System info: os.name: 'Windows 11', os.arch: 'amd64', os.version: '10.0', java.version: '11.0.22'
    Driver info: driver.version: unknown
        at app//org.openqa.selenium.remote.http.jdk.JdkHttpClient.openSocket(JdkHttpClient.java:250)
        at app//org.openqa.selenium.devtools.Connection.<init>(Connection.java:87)
        at app//org.openqa.selenium.devtools.SeleniumCdpConnection.<init>(SeleniumCdpConnection.java:36)
        at app//org.openqa.selenium.devtools.SeleniumCdpConnection.lambda$create$3(SeleniumCdpConnection.java:103)
        at java.base@11.0.22/java.util.Optional.map(Optional.java:265)
        at app//org.openqa.selenium.devtools.SeleniumCdpConnection.create(SeleniumCdpConnection.java:103)
        at app//org.openqa.selenium.devtools.SeleniumCdpConnection.create(SeleniumCdpConnection.java:49)
        at app//org.openqa.selenium.devtools.DevToolsProvider.getImplementation(DevToolsProvider.java:50)
        at app//org.openqa.selenium.devtools.DevToolsProvider.getImplementation(DevToolsProvider.java:29)
        at app//org.openqa.selenium.remote.Augmenter.augment(Augmenter.java:207)
        at app//org.openqa.selenium.remote.Augmenter.augment(Augmenter.java:178)
        at app//org.openqa.selenium.remote.RemoteWebDriverBuilder.build(RemoteWebDriverBuilder.java:369)
        at app//starter.StepDefinitions.setUp(StepDefinitions.java:53)

        Caused by:
        java.net.http.WebSocketHandshakeException
            at java.net.http/jdk.internal.net.http.websocket.OpeningHandshake.resultFrom(OpeningHandshake.java:224)
            at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072)
            at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
            at java.base/java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:610)
            at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:840)
            at java.base/java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:479)
            at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
            at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
            at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
            at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
            at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)

            Caused by:
            jdk.internal.net.http.websocket.CheckFailedException: Unexpected HTTP response status code 308
                at java.net.http/jdk.internal.net.http.websocket.OpeningHandshake.checkFailed(OpeningHandshake.java:341)
                at java.net.http/jdk.internal.net.http.websocket.OpeningHandshake.handleResponse(OpeningHandshake.java:250)
                at java.net.http/jdk.internal.net.http.websocket.OpeningHandshake.resultFrom(OpeningHandshake.java:220)
                ... 10 more

I thought old helm chart might be a problem, because in my local cluster when I downgraded then I got this HTTP Connect timed out. I received strange warning in selenium-hub (Edit: that was 4.15.0, 4.22.0 is fine):

WARNING: Extension file or directory does not exist: /opt/selenium/selenium-http-jdk-client.jar
kakliniew commented 1 month ago

Enabled debug logs for selenium hub/selenium edge node and during 3 tests I found 1500x

DEBUG [RequestConverter.channelRead0] - Incoming message: DefaultHttpRequest(decodeResult: success, version: HTTP/1.1)
POST /se/grid/node/session HTTP/1.1
Connection: Upgrade, HTTP2-Settings
Content-Length: 252
Host: 
HTTP2-Settings: 
Upgrade: h2c
traceparent: 
User-Agent: selenium/4.22.0 (java unix)

The same debug info is available even when I run tests using selenium 4.12.0

I'm still unable to create webdriver with clientconfig version http 1_1. After optimizing some functions (reduce the number of http operations) test is passing about 6 out of 7 times.
This makes me think that the source of the problem is some nginx limits or something. https://www.reddit.com/r/java/comments/1986h01/comment/ki5gczk

kakliniew commented 1 month ago

I was able to resolve the problem:

Caused by:
            jdk.internal.net.http.websocket.CheckFailedException: Unexpected HTTP response status code 308

By just removing TLS host for selenium-grid-ingress. After that change I was able to initialize driver with HTTP_1_1, and tests pass. It solves the effect but not the root cause I guess.

diemol commented 1 month ago

@VietND96 have you seen this?

kakliniew commented 1 month ago

It would be nice to have flag added like: selenium.httpclient.version=HTTP_1_1

HTTP2 is by default according to docs

[HttpClient.Builder] version
Requests a specific HTTP protocol version where possible.
If this method is not invoked prior to building, then newly built clients will prefer [HTTP/2].

It seems that this problem with jdkhttpclient HTTP/2, Nginx GOAWAY is widely known and you need to handle it or use HTTP/1.1 (as described in the source reddit, but also in other places).

diemol commented 1 month ago

Maybe a system property is not a bad idea.

VietND96 commented 1 month ago

As my understanding, this case has the ingress proxy involves


The error

Caused by: java.net.http.WebSocketHandshakeException ... Caused by: jdk.internal.net.http.websocket.CheckFailedException: Unexpected HTTP response status code 308

and you mentioned that

By just removing TLS host for selenium-grid-ingress. After that change I was able to initialize driver with HTTP_1_1, and tests pass.`

I think this relates to Server-side HTTPS enforcement through redirect - By default the controller redirects (308) to HTTPS if TLS is enabled for that ingress. It means the TLS host for selenium-grid-ingress only set when the grid server back-end is enabled HTTPS, otherwise the redirect will be incorrect. In helm chart README, I also tried to input something for secure connection in K8s, I think if we configure the proxy correctly, it will work

In ingress-nginx docs, they also mentioned Proxy HTTP version that the Nginx reverse proxy will use to communicate with the backend. By default this is set to "1.1". So without redirect issue, HTTP version doesn't matter

kakliniew commented 1 month ago

@VietND96 thanks for your answer. I wanted to set it up, but faced some problems. I will try again and let you know

kakliniew commented 1 month ago

@VietND96 I configured TLS. Hub/Nodes are up and running. I used external secret. During test execution I received once again GOAWAY and there is a message in edge node's logs:

16:16:53.229 WARN [ProxyNodeWebsockets$ForwardingListener.onError] - Error proxying websocket commandjava.net.SocketException: Connection reset 
at java.base/sun.nio.ch.SocketChannelImpl.throwConnectionReset(Unknown Source)  
at java.base/sun.nio.ch.SocketChannelImpl.read(Unknown Source)  
at java.net.http/jdk.internal.net.http.SocketTube.readAvailable(Unknown Source) 
at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$InternalReadSubscription.read(Unknown Source)   
at java.net.http/jdk.internal.net.http.SocketTube$SocketFlowTask.run(Unknown Source)    
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(Unknown Source)   
at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(Unknown Source) 
at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(Unknown Source) 
at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$InternalReadSubscription.signalReadable(Unknown Source) 
at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$ReadEvent.signalEvent(Unknown Source)   
at java.net.http/jdk.internal.net.http.SocketTube$SocketFlowEvent.handle(Unknown Source)    
at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.handleEvent(Unknown Source)   
at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.lambda$run$3(Unknown Source)  
at java.base/java.util.ArrayList.forEach(Unknown Source)    
at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.run(Unknown Source)

Initialized driver with HTTP1_1 works fine.

VietND96 commented 1 month ago

Yes, initialized the driver with HTTP1_1 in client config as I know it was supported via https://github.com/SeleniumHQ/selenium/pull/12919

Besides that, I also tried to find some interesting info for your reference.

In OpenJDK docs Introduction to the Java HTTP Client, I saw these statements

By default the client will send requests using HTTP/2. Requests sent to servers that do not yet support HTTP/2 will automatically be downgraded to HTTP/1.1

As I saw above, a comment also mentioned that the grid server side is not using HTTP/2, and suppose the client side is sending HTTP/2. That statement describes the correct behavior it should be.

I also found https://stackoverflow.com/questions/67018337/is-there-any-way-to-handle-http-2-goaway-received-ioexception-in-httpclient-java Here, I saw the workaround set http version via client was mentioned. However, also have a point mentioned that I can understand something like GOAWAY is an IOException thrown by client HTTP/2, and when switching to client HTTP/1.1, it could go to a situation that would continue to get "random" IOExceptions with a different message from the server using HTTP 1.1

Next, I found this https://stackoverflow.com/questions/55087292/how-to-handle-http-2-goaway-with-java-net-httpclient with these statements

I tried to reproduce this problem with a local nginx server with the default HTTP/2 configuration ... And after approximately 1000 requests, I get a GOAWAY error as expected:

I feel this is close to your setup, ingress nginx involves. I tried to search config key http2_max_requests and saw this docs. In recent versions of nginx, it changed to keepalive_requests with default is 1000. If ingress nginx deployed via helm chart, you can refer that chart values, config key controller.config with list objects can be set. So, can you try this also?

kakliniew commented 1 month ago

I tried setting these limits (keepalive_requests) for the local cluster (minikube) and started my local tests, but it didn't help - I couldn't reproduce the error. I checked the nginx-controller logs locally and in my ‘embedded’ cluster today. The logs show that locally the communication is using http1.1 (even when I try to initialise the driver with version 2.0). On the other hand, in the ‘embedded’ cluster the communication is with 2.0 (therefore it makes sense that this limit is reached and the goaway is thrown).

 "POST /session/<session_id>/element  HTTP/2.0" 200 28 "-" "selenium/4.18.1 (java windows)"  <- <Embedded> 
vs
 "POST /session/<session_id>/element  HTTP/1.1" 200 126 "-" "selenium/4.18.1 (java windows)"  <- <local> 

I can't really modify the nginx configuration in the ‘embedded cluster’, because it handles not only selenium grid and I can cause problems. In my local cluster I have not configured http2 for nginx, and its by default off, hence the problem in reproducing.

HttpClient.Builder version​(HttpClient.Version version)
If this method is not invoked prior to building, then newly built clients will prefer HTTP/2.

JdkHttpClient by default sets http2, hence my problems when I don't explicitly specify it to use http1.1. Therefore system property would be a good solution. selenium.httpclient.version=HTTP_1_1

kakliniew commented 1 month ago

@VietND96 today I accidentally discovered that if I use the url for selenium grid, but protocol http:// instead of https://, the communication is via http1.1 and GOAWAY does not occur.

This makes a lot of sense, but I didn't think of it before.

VietND96 commented 1 month ago

Not sure my understanding is correct, based on your setup mentioned before with HTTPS (it requires TLS version as well as HTTP version, which is based on the ingress controller). Probably, your setup exposes http:// with HTTP/1.1 and https:// with TLS+HTTP/2 With ingress use HTTP/2, I think these configs upstream-keepalive-... here needs to be consider. For example upstream-keepalive-connections - When this number is exceeded, the least recently used connections are closed. default: 320. With this description, it seems the message GOAWAY close to this.

JDKHttpClient also mentioned something if H2 (needs to try and see how it helps)

The JDK 11 ConnectionPool checks the jdk.httpclient.keepalive.timeout system property when initializing and defaults to 1200 seconds (20 minutes). Since 20 minutes is a long time in modern architecture, JDK20, build 26 reduces the default to 30 seconds. It also allow setting keepalive system property via a JVM argument e.g -Djdk.httpclient.keepalive.timeout=2 Since JDK20, build 25, some additional system properties were introduced specifically for H2 pools: jdk.httpclient.keepalivetimeout.h2 – set this property to control the keepalive timeout for H2 connections jdk.httpclient.maxstreams – set this property to control the maximum number of H2 streams that are permitted per HTTP connection (defaults to 100).

kakliniew commented 1 month ago

At this point, setting an use-http2 to "false" looks like a good solution. I did it today and the tests work in the default config (without forcing in the code http.version http_1_1 and using https://). Now HTTP/1.1 is used for http:// and https://. Previously for https:// HTTP/2 and TLS was the default (that's how it works in nginx ingress) and in combination with JdkHttpClient caused problems. I don't know exactly what limit was being exceeded, but that was the cause. If it helps you, I can try to configure HTTP/2 + TLS locally and test.

Thank you so much for all your replies, they have been very helpful and motivating, as at one point I was losing hope a bit!

github-actions[bot] commented 1 week ago

This issue has been automatically locked since there has not been any recent activity since it was closed. Please open a new issue for related bugs.