aerokube / ggr

A lightweight load balancer used to create big Selenium clusters
https://aerokube.com/ggr/latest/
Apache License 2.0
314 stars 74 forks source link

Issue with selenium-java 4.14.0+ and the new jdk httpclient #386

Closed benzman81 closed 9 months ago

benzman81 commented 9 months ago

Hi, we use your great tool ggr and selenoid in a grid. Now we switched from selenium-java 4.8.2 to 4.15.0 and get a lot of these errors: org.openqa.selenium.remote.UnreachableBrowserException: Error communicating with the remote browser. It may have died. with the cause:

Caused by: 
at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:634)
at org.openqa.selenium.remote.RemoteWebDriver.executeAsyncScript(RemoteWebDriver.java:480)
at OurClass
... 21 more
Caused by: java.io.UncheckedIOException: java.io.IOException: HTTP/1.1 header parser received no bytes
at org.openqa.selenium.remote.http.jdk.JdkHttpClient.execute0(JdkHttpClient.java:450)
at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:55)
at org.openqa.selenium.remote.http.jdk.JdkHttpClient.execute(JdkHttpClient.java:366)
at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:188)
at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:607)
... 23 more
Caused by: java.io.IOException: HTTP/1.1 header parser received no bytes
at java.net.http/jdk.internal.net.http.common.Utils.wrapWithExtraDetail(Utils.java:327)
at java.net.http/jdk.internal.net.http.Http1Response$HeadersReader.onReadError(Http1Response.java:673)
at java.net.http/jdk.internal.net.http.Http1AsyncReceiver.checkForErrors(Http1AsyncReceiver.java:297)
at java.net.http/jdk.internal.net.http.Http1AsyncReceiver.flush(Http1AsyncReceiver.java:263)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SynchronizedRestartableTask.run(SequentialScheduler.java:175)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$CompleteRestartableTask.run(SequentialScheduler.java:147)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:198)
... 3 more
Caused by: java.io.IOException: connection closed locally
at java.net.http/jdk.internal.net.http.SocketTube.signalClosed(SocketTube.java:157)
at java.net.http/jdk.internal.net.http.PlainHttpConnection.close(PlainHttpConnection.java:277)
at java.net.http/jdk.internal.net.http.Http1Exchange.cancelImpl(Http1Exchange.java:490)
at java.net.http/jdk.internal.net.http.Http1Exchange.cancel(Http1Exchange.java:426)
at java.net.http/jdk.internal.net.http.Exchange.cancel(Exchange.java:228)
at java.net.http/jdk.internal.net.http.MultiExchange.cancel(MultiExchange.java:240)
at java.net.http/jdk.internal.net.http.ResponseTimerEvent.handle(ResponseTimerEvent.java:71)
at java.net.http/jdk.internal.net.http.HttpClientImpl.purgeTimeoutsAndReturnNextDeadline(HttpClientImpl.java:1248)
at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.run(HttpClientImpl.java:877)

The log of our GGR shows:

2023/11/23 15:42:05 [6840246] [-] [PROXY_ERROR] [test] [10.42.3.0] [http://ecmrdgrid02.de.cenit-group.com:4444/wd/hub/session/9e8960f15342ec02db3ca6b12eed685d/elements] [-] [-] [-] [context canceled]
2023/11/23 15:42:05 [6840244] [-] [PROXY_ERROR] [test] [10.42.3.0] [http://ecmrdgrid02.de.cenit-group.com:4444/wd/hub/session/a4e1c9311e17674774c5337125cfb107/execute/async] [-] [-] [-] [context canceled]
2023/11/23 15:42:06 [6840276] [-] [SESSION_DELETED] [-] [10.42.3.0] [-] [ecmrdgrid02.de.cenit-group.com:4444] [a4e1c9311e17674774c5337125cfb107] [-] [-]

And on the VMs running Selenoid:

2023/11/23 15:42:05 [60323] [CLIENT_DISCONNECTED] [test] [10.42.3.0, 10.0.44.36] [Error: context canceled]
2023/11/23 15:42:05 [60322] [CLIENT_DISCONNECTED] [test] [10.42.3.0, 10.0.44.36] [Error: context canceled]
2023/11/23 15:42:06 [60337] [SESSION_DELETED] [a4e1c9311e17674774c5337125cfb107]

The tests work for up to selenium-java 4.13.0 where they are still using netty as a http client. The issue also occurs in 4.13.0 when you switch to the new jdk http client via system property. See here: https://www.selenium.dev/blog/2022/using-java11-httpclient/

I already looked at their code, and it seems to have something to do with readTimeout. We set here 10 minutes, which was fine before and was only supposed to be a timeout for waiting in the queue of the grid. At least, thats what we document in out git commit ;-) We set the timeout like this:

ClientConfig config = ClientConfig.defaultConfig().baseUrl(new URL(getHubURL())).readTimeout(Duration.ofMinutes(10));
return new RemoteWebDriver(new HttpCommandExecutor(config), capabilities);

The error does not always happen, but aber 50 times in about 6000 parallel tests. I have one test, where I can reproduce it in 2 out of four times. Thats how I tracked it down to this readTimeout and to ggr. I suspect ggr and opened the bug here, because if we use the selenoid url directly, which is usally behind ggr, then the issue does not occur anymore.

Sadly, I could not strip down some sample code.

Any idea?

benzman81 commented 9 months ago

oh, forgot. We use aerokube/ggr:1.7.1 and aerokube/selenoid:1.11.0.

aandryashin commented 9 months ago

Hello, the message client disconnected means that some timeout occurs on client side, also timeout may occur on some balancer or nginx behind ggr. Please check and increase timeouts. Usually in big clusters there are many nginx servers are placed behind ggr and misconfiguration can be on one of them.

Alexander Andryashin.

чт, 23 нояб. 2023 г., 19:25 Markus Krüger @.***>:

oh, forgot. We use aerokube/ggr:1.7.1 and aerokube/selenoid:1.11.0.

— Reply to this email directly, view it on GitHub https://github.com/aerokube/ggr/issues/386#issuecomment-1824690884, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAKY23JRUXSRLOHSVSXW2P3YF52F7AVCNFSM6AAAAAA7YALKPKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMRUGY4TAOBYGQ . You are receiving this because you are subscribed to this thread.Message ID: @.***>

benzman81 commented 9 months ago

@aandryashin there are no load balancers or so involved. I increased the timeout now and we will see.

benzman81 commented 9 months ago

Ok, so now we dont have the issue anymore, but we needed to set the readTimeout to 60 minutes for our tests. Without the new jdk httpclient or without using ggr a readTimeout of 10 minutes was sufficient.