SeleniumHQ / selenium

A browser automation framework and ecosystem.
https://selenium.dev
Apache License 2.0

java.util.concurrent.TimeoutException thrown at random netty read timeouts with RemoteWebDriver #9528

Closed rcesarlumis closed 2 years ago

rcesarlumis commented 3 years ago

🐛 Bug Report

Netty gets a read timeout at random times. It happens on different Selenium commands (for example: WebDriver.switchTo().defaultContent, WebElement.click, WebDriver.switchTo().window, WebElement.sendKeys, WebDriver.get, Alert.accept) and affects only a small fraction of test cases (<1%).

To Reproduce

I don't have specific steps to reproduce. When our CI runs our suite of thousands of tests, about 10 fail at random due to this timeout. I could not reproduce it by running a simple long loop with a few commands on my development workstation.

Timeout details

This timeout always occurs at:

Caused by: java.util.concurrent.TimeoutException
    at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
    at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
    at org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)
    at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:65)

I confirmed that the call blocked there for 3 minutes, which matches the default 3-minute read timeout that Selenium configures for its Netty client. But the commands that time out normally run very fast, in much less than one second.
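The innermost frames of that trace are a timed CompletableFuture.get call. As a standalone illustration using only the JDK (this is not Selenium's or AsyncHttpClient's code, and the class and method names are made up for the example), a future that is never completed produces exactly this java.util.concurrent.TimeoutException once the wait expires:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo class: shows when a timed get() on a future times out.
public class TimedGetDemo {

    // Waits for the response for at most timeoutMillis.
    // Returns true if the wait ended with a TimeoutException, false if a value arrived.
    public static boolean timesOut(CompletableFuture<String> response, long timeoutMillis) {
        try {
            response.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException e) {
            // Same exception type as the "Caused by" in the trace above.
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Stands in for a request whose response never arrives.
        CompletableFuture<String> neverAnswered = new CompletableFuture<>();
        System.out.println("never answered times out: " + timesOut(neverAnswered, 200));

        // Stands in for a request that was answered right away.
        CompletableFuture<String> answered = CompletableFuture.completedFuture("ok");
        System.out.println("answered times out: " + timesOut(answered, 200));
    }
}
```

The point of the sketch is that the exception only says that no response was delivered to the waiting thread in time. It does not say whether the request was sent, lost, or answered late.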

I put the code below in a method that my test suite calls probably thousands of times, and it did enter the catch block. But when it called driver.switchTo().defaultContent() again at the end, the call worked. So it seems that although the read timeout happens in Netty, the session still works normally afterwards.

try
{
    driver.switchTo().defaultContent();
}
catch (TimeoutException e)
{
    // this should never happen, but started happening at random after updating to selenium 4
    // output information to help troubleshoot
    System.err.println("TimeoutException thrown while trying to go to defaultContent (stack below). Trying again...");
    e.printStackTrace();

    try
    {
        Thread.sleep(5000);
    }
    catch (InterruptedException e1)
    {
    }

    driver.switchTo().defaultContent();
}
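The retry in the snippet above could be factored into a small helper. The following is only a sketch using the JDK: RetryOnce and callWithOneRetry are hypothetical names, not part of Selenium, and the simulated failure in main stands in for a timed-out WebDriver command.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: runs an action and retries it once if it fails with a given exception type.
public class RetryOnce {

    public static <T> T callWithOneRetry(Callable<T> action,
                                         Class<? extends Exception> retryOn,
                                         long pauseMillis) throws Exception {
        try {
            return action.call();
        } catch (Exception e) {
            if (!retryOn.isInstance(e)) {
                throw e; // not the failure we want to retry
            }
            System.err.println("Retrying after: " + e);
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException interrupted) {
                // Preserve the interrupt instead of swallowing it.
                Thread.currentThread().interrupt();
                throw interrupted;
            }
            return action.call(); // second and last attempt
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated command: times out on the first call, succeeds on the second.
        String result = callWithOneRetry(() -> {
            if (calls[0]++ == 0) {
                throw new TimeoutException("simulated read timeout");
            }
            return "switched";
        }, TimeoutException.class, 10L);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

With Selenium on the classpath, the action would presumably wrap a call such as driver.switchTo().defaultContent() and retryOn would be org.openqa.selenium.TimeoutException.class. A retry like this only hides the symptom; it does not explain why the first request got no response.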

In this case, the stack trace printed by e.printStackTrace() above was:

org.openqa.selenium.TimeoutException: java.util.concurrent.TimeoutException
Build info: version: '4.0.0-beta-3', revision: '5d108f9a67'
System info: host: '51e5404d333b', ip: '172.18.0.7', os.name: 'Linux', os.arch: 'amd64', os.version: '3.10.0-1127.19.1.el7.x86_64', java.version: '11.0.1'
Driver info: org.openqa.selenium.remote.RemoteWebDriver
Command: [a5e3bf25-ba72-4023-b219-76406cf58660, switchToFrame {id=null}]
Capabilities {acceptInsecureCerts: true, browserName: firefox, browserVersion: 88.0, javascriptEnabled: true, moz:accessibilityChecks: false, moz:buildID: 20210415204500, moz:debuggerAddress: localhost:46562, moz:geckodriverVersion: 0.29.0, moz:headless: false, moz:processID: 9286, moz:profile: /tmp/rust_mozprofileQJRwQP, moz:shutdownTimeout: 60000, moz:useNonSpecCompliantPointerOrigin: false, moz:webdriverClick: true, pageLoadStrategy: normal, platform: LINUX, platformName: LINUX, platformVersion: 3.10.0-1127.19.1.el7.x86_64, rotatable: false, se:cdp: ws://172.18.0.3:4444/sessio..., se:cdpVersion: 85, setWindowRect: true, strictFileInteractability: false, timeouts: {implicit: 0, pageLoad: 300000, script: 30000}, unhandledPromptBehavior: dismiss and notify}
Session ID: a5e3bf25-ba72-4023-b219-76406cf58660
    at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:71)
    at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
    at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
    at org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:51)
    at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
    at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
    at org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:103)
    at org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)
    at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:181)
    at org.openqa.selenium.remote.TracedCommandExecutor.execute(TracedCommandExecutor.java:39)
    at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:619)
    at org.openqa.selenium.remote.RemoteWebDriver$RemoteTargetLocator.defaultContent(RemoteWebDriver.java:1097)
    (...)
Caused by: java.util.concurrent.TimeoutException
    at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
    at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
    at org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)
    at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:65)
    ... 38 more

Environment

OS: Docker containers on a CentOS host
Browser: RemoteWebDriver using Firefox in the selenium/standalone-firefox:4.0.0-beta-3-20210426 docker image. I also tried the selenium/standalone-firefox:4.0.0-beta-4-prerelease-20210527 docker image, but the same thing happened.
Browser Driver version: RemoteWebDriver from selenium-java 4.0.0-beta-3
Language Bindings version: Java 4.0.0-beta-3

The RemoteWebDriver runs in a container on the same docker host as the browser container, so all network traffic between them stays on the same machine. Previously we were using Selenium 2.52 on the same docker host and never saw anything like this timeout.

Do you have any tips about what I can try to fix it or investigate more about this?

diemol commented 3 years ago

It is very likely that old Selenium versions had a (much) longer timeout. You can configure the timeout if you use the RemoteWebDriverBuilder class.

It can also be something related to the browser... Were you using the same browser version and browser driver version in old Selenium versions?

All in all, we need help to reproduce this... You can have a look at the Node logs and enable more verbose logging in GeckoDriver.

rcesarlumis commented 3 years ago

I don't think it worked before just because old Selenium had a longer timeout. The timeout happens on commands that execute in a fraction of a second when it does not occur, and on commands that have no reason to take long: they should either succeed or return an error immediately, for example Alert.accept() or WebDriver.switchTo().defaultContent(). But if old Selenium had some kind of automatic retry on timeout, then that could explain it, because, as detailed in the description, when I retried a timed-out command it executed fine. If I make no progress, I will run a test with an increased timeout just to be sure.

With the old Selenium, the browser came from the docker image selenium/standalone-firefox-debug:2.52.0; with the new Selenium I tried the docker images selenium/standalone-firefox:4.0.0-beta-3-20210426 and selenium/standalone-firefox:4.0.0-beta-4-prerelease-20210527. So the browser and driver versions are very different.

I will look for relevant entries in the node logs and report back if I find any.

rcesarlumis commented 3 years ago

I found the following entries in the browser container log that seem related to one of the java.util.concurrent.TimeoutException errors I get in the Selenium RemoteWebDriver:

2021-05-28T04:27:49.028002225Z 04:27:49.027 WARN [SpanWrappedHttpHandler.execute] - Unable to execute request: java.util.concurrent.TimeoutException
2021-05-28T04:27:49.028038633Z Build info: version: '4.0.0-beta-4', revision: 'a51085a604'
2021-05-28T04:27:49.028045075Z System info: host: 'dc6d38043afe', ip: '172.18.0.3', os.name: 'Linux', os.arch: 'amd64', os.version: '3.10.0-1127.19.1.el7.x86_64', java.version: '11.0.11'
2021-05-28T04:27:49.028054734Z Driver info: driver.version: unknown
2021-05-28T04:27:49.028064151Z org.openqa.selenium.TimeoutException: java.util.concurrent.TimeoutException
2021-05-28T04:27:49.028068670Z Build info: version: '4.0.0-beta-4', revision: 'a51085a604'
2021-05-28T04:27:49.028078839Z System info: host: 'dc6d38043afe', ip: '172.18.0.3', os.name: 'Linux', os.arch: 'amd64', os.version: '3.10.0-1127.19.1.el7.x86_64', java.version: '11.0.11'
2021-05-28T04:27:49.028088096Z Driver info: driver.version: unknown
2021-05-28T04:27:49.028092825Z       at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:71)
2021-05-28T04:27:49.028097474Z       at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
2021-05-28T04:27:49.028101742Z       at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
2021-05-28T04:27:49.028105689Z       at org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:51)
2021-05-28T04:27:49.028110047Z       at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
2021-05-28T04:27:49.028123262Z       at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
2021-05-28T04:27:49.028128762Z       at org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:105)
2021-05-28T04:27:49.028133181Z       at org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)
2021-05-28T04:27:49.028138801Z       at org.openqa.selenium.grid.web.ReverseProxyHandler.execute(ReverseProxyHandler.java:100)
2021-05-28T04:27:49.028143791Z       at org.openqa.selenium.grid.node.ProtocolConvertingSession.execute(ProtocolConvertingSession.java:75)
2021-05-28T04:27:49.028148369Z       at org.openqa.selenium.grid.node.local.SessionSlot.execute(SessionSlot.java:123)
2021-05-28T04:27:49.028153389Z       at org.openqa.selenium.grid.node.local.LocalNode.executeWebDriverCommand(LocalNode.java:399)
2021-05-28T04:27:49.028157747Z       at org.openqa.selenium.grid.node.ForwardWebDriverCommand.execute(ForwardWebDriverCommand.java:35)
2021-05-28T04:27:49.028171703Z       at org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:373)
2021-05-28T04:27:49.028176792Z       at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
2021-05-28T04:27:49.028182454Z       at org.openqa.selenium.remote.tracing.SpanWrappedHttpHandler.execute(SpanWrappedHttpHandler.java:86)
2021-05-28T04:27:49.028186993Z       at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
2021-05-28T04:27:49.028191241Z       at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
2021-05-28T04:27:49.028215927Z       at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
2021-05-28T04:27:49.028220085Z       at org.openqa.selenium.grid.node.Node.execute(Node.java:240)
2021-05-28T04:27:49.028224713Z       at org.openqa.selenium.grid.web.CombinedHandler.execute(CombinedHandler.java:59)
2021-05-28T04:27:49.028229843Z       at org.openqa.selenium.grid.web.RoutableHttpClientFactory$1.execute(RoutableHttpClientFactory.java:66)
2021-05-28T04:27:49.028234722Z       at org.openqa.selenium.grid.web.ReverseProxyHandler.execute(ReverseProxyHandler.java:100)
2021-05-28T04:27:49.028239241Z       at org.openqa.selenium.grid.router.HandleSession.execute(HandleSession.java:110)
2021-05-28T04:27:49.028244571Z       at org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:373)
2021-05-28T04:27:49.028249209Z       at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
2021-05-28T04:27:49.028253978Z       at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
2021-05-28T04:27:49.028258407Z       at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
2021-05-28T04:27:49.028262795Z       at org.openqa.selenium.grid.router.Router.execute(Router.java:91)
2021-05-28T04:27:49.028267664Z       at org.openqa.selenium.grid.web.EnsureSpecCompliantResponseHeaders.lambda$apply$0(EnsureSpecCompliantResponseHeaders.java:34)
2021-05-28T04:27:49.028272293Z       at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
2021-05-28T04:27:49.028277102Z       at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
2021-05-28T04:27:49.028281810Z       at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
2021-05-28T04:27:49.028286459Z       at org.openqa.selenium.remote.http.Route$NestedRoute.handle(Route.java:270)
2021-05-28T04:27:49.028291108Z       at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
2021-05-28T04:27:49.028295416Z       at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
2021-05-28T04:27:49.028299494Z       at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
2021-05-28T04:27:49.028303842Z       at org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)
2021-05-28T04:27:49.028316726Z       at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
2021-05-28T04:27:49.028322737Z       at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
2021-05-28T04:27:49.028332185Z       at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
2021-05-28T04:27:49.028337805Z       at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
2021-05-28T04:27:49.028342093Z       at org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)
2021-05-28T04:27:49.028346131Z       at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
2021-05-28T04:27:49.028350820Z       at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
2021-05-28T04:27:49.028355599Z       at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
2021-05-28T04:27:49.028360658Z       at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
2021-05-28T04:27:49.028365998Z       at java.base/java.lang.Thread.run(Thread.java:829)
2021-05-28T04:27:49.028369967Z Caused by: java.util.concurrent.TimeoutException
2021-05-28T04:27:49.028373874Z       at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
2021-05-28T04:27:49.028377852Z       at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
2021-05-28T04:27:49.028381719Z       at org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)
2021-05-28T04:27:49.028390024Z       at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:65)
2021-05-28T04:27:49.028399482Z       ... 47 more
2021-05-28T04:27:49.028448714Z 04:27:49.027 WARN [SeleniumSpanExporter$1.lambda$export$0] - {"traceId": "29cfa26fac6fff967616ca4945924213","eventTime": 1622176069026085041,"eventName": "exception","attributes": {"exception.message": "Unable to execute request: java.util.concurrent.TimeoutException\nBuild info: version: '4.0.0-beta-4', revision: 'a51085a604'\nSystem info: host: 'dc6d38043afe', ip: '172.18.0.3', os.name: 'Linux', os.arch: 'amd64', os.version: '3.10.0-1127.19.1.el7.x86_64', java.version: '11.0.11'\nDriver info: driver.version: unknown","exception.stacktrace": "org.openqa.selenium.TimeoutException: java.util.concurrent.TimeoutException\nBuild info: version: '4.0.0-beta-4', revision: 'a51085a604'\nSystem info: host: 'dc6d38043afe', ip: '172.18.0.3', os.name: 'Linux', os.arch: 'amd64', os.version: '3.10.0-1127.19.1.el7.x86_64', java.version: '11.0.11'\nDriver info: driver.version: unknown\n\tat org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:71)\n\tat org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)\n\tat org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)\n\tat org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:51)\n\tat org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)\n\tat org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)\n\tat org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:105)\n\tat org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)\n\tat org.openqa.selenium.grid.web.ReverseProxyHandler.execute(ReverseProxyHandler.java:100)\n\tat org.openqa.selenium.grid.node.ProtocolConvertingSession.execute(ProtocolConvertingSession.java:75)\n\tat org.openqa.selenium.grid.node.local.SessionSlot.execute(SessionSlot.java:123)\n\tat 
org.openqa.selenium.grid.node.local.LocalNode.executeWebDriverCommand(LocalNode.java:399)\n\tat org.openqa.selenium.grid.node.ForwardWebDriverCommand.execute(ForwardWebDriverCommand.java:35)\n\tat org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:373)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.tracing.SpanWrappedHttpHandler.execute(SpanWrappedHttpHandler.java:86)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.grid.node.Node.execute(Node.java:240)\n\tat org.openqa.selenium.grid.web.CombinedHandler.execute(CombinedHandler.java:59)\n\tat org.openqa.selenium.grid.web.RoutableHttpClientFactory$1.execute(RoutableHttpClientFactory.java:66)\n\tat org.openqa.selenium.grid.web.ReverseProxyHandler.execute(ReverseProxyHandler.java:100)\n\tat org.openqa.selenium.grid.router.HandleSession.execute(HandleSession.java:110)\n\tat org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:373)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.grid.router.Router.execute(Router.java:91)\n\tat org.openqa.selenium.grid.web.EnsureSpecCompliantResponseHeaders.lambda$apply$0(EnsureSpecCompliantResponseHeaders.java:34)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.http.Route$NestedRoute.handle(Route.java:270)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat 
org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)\n\tat org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)\n\tat java.base\u002fjava.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base\u002fjava.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base\u002fjava.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base\u002fjava.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base\u002fjava.lang.Thread.run(Thread.java:829)\nCaused by: java.util.concurrent.TimeoutException\n\tat java.base\u002fjava.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)\n\tat java.base\u002fjava.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)\n\tat org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)\n\tat org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:65)\n\t... 47 more\n","exception.type": "org.openqa.selenium.TimeoutException","http.flavor": 1,"http.handler_class": "org.openqa.selenium.remote.http.Route$PredicatedRoute","http.host": "selenium:4444","http.method": "POST","http.request_content_length": "16","http.scheme": "HTTP","http.target": "\u002fsession\u002f5f7b6d0b-fa5a-4c19-9a64-f521764b4b6c\u002fframe","http.user_agent": "selenium\u002f4.0.0-beta-3 (java unix)"}}
2021-05-28T04:27:49.028504950Z
2021-05-28T04:27:49.029534186Z 04:27:49.029 WARN [SeleniumSpanExporter$1.lambda$export$0] - {"traceId": "29cfa26fac6fff967616ca4945924213","eventTime": 1622176069028013648,"eventName": "exception","attributes": {"exception.message": "Unable to execute request for an existing session: java.util.concurrent.TimeoutException\nBuild info: version: '4.0.0-beta-4', revision: 'a51085a604'\nSystem info: host: 'dc6d38043afe', ip: '172.18.0.3', os.name: 'Linux', os.arch: 'amd64', os.version: '3.10.0-1127.19.1.el7.x86_64', java.version: '11.0.11'\nDriver info: driver.version: unknown","exception.stacktrace": "org.openqa.selenium.TimeoutException: java.util.concurrent.TimeoutException\nBuild info: version: '4.0.0-beta-4', revision: 'a51085a604'\nSystem info: host: 'dc6d38043afe', ip: '172.18.0.3', os.name: 'Linux', os.arch: 'amd64', os.version: '3.10.0-1127.19.1.el7.x86_64', java.version: '11.0.11'\nDriver info: driver.version: unknown\n\tat org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:71)\n\tat org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)\n\tat org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)\n\tat org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:51)\n\tat org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)\n\tat org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)\n\tat org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:105)\n\tat org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)\n\tat org.openqa.selenium.grid.web.ReverseProxyHandler.execute(ReverseProxyHandler.java:100)\n\tat org.openqa.selenium.grid.node.ProtocolConvertingSession.execute(ProtocolConvertingSession.java:75)\n\tat org.openqa.selenium.grid.node.local.SessionSlot.execute(SessionSlot.java:123)\n\tat 
org.openqa.selenium.grid.node.local.LocalNode.executeWebDriverCommand(LocalNode.java:399)\n\tat org.openqa.selenium.grid.node.ForwardWebDriverCommand.execute(ForwardWebDriverCommand.java:35)\n\tat org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:373)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.tracing.SpanWrappedHttpHandler.execute(SpanWrappedHttpHandler.java:86)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.grid.node.Node.execute(Node.java:240)\n\tat org.openqa.selenium.grid.web.CombinedHandler.execute(CombinedHandler.java:59)\n\tat org.openqa.selenium.grid.web.RoutableHttpClientFactory$1.execute(RoutableHttpClientFactory.java:66)\n\tat org.openqa.selenium.grid.web.ReverseProxyHandler.execute(ReverseProxyHandler.java:100)\n\tat org.openqa.selenium.grid.router.HandleSession.execute(HandleSession.java:110)\n\tat org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:373)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.grid.router.Router.execute(Router.java:91)\n\tat org.openqa.selenium.grid.web.EnsureSpecCompliantResponseHeaders.lambda$apply$0(EnsureSpecCompliantResponseHeaders.java:34)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.http.Route$NestedRoute.handle(Route.java:270)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat 
org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)\n\tat org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)\n\tat java.base\u002fjava.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base\u002fjava.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base\u002fjava.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base\u002fjava.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base\u002fjava.lang.Thread.run(Thread.java:829)\nCaused by: java.util.concurrent.TimeoutException\n\tat java.base\u002fjava.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)\n\tat java.base\u002fjava.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)\n\tat org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)\n\tat org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:65)\n\t... 47 more\n","exception.type": "org.openqa.selenium.TimeoutException","http.flavor": 1,"http.handler_class": "org.openqa.selenium.grid.router.HandleSession","http.host": "selenium:4444","http.method": "POST","http.request_content_length": "16","http.scheme": "HTTP","http.target": "\u002fsession\u002f5f7b6d0b-fa5a-4c19-9a64-f521764b4b6c\u002fframe","http.user_agent": "selenium\u002f4.0.0-beta-3 (java unix)","session.id": "5f7b6d0b-fa5a-4c19-9a64-f521764b4b6c"}}

There is also a message, [GFX1-]: RenderCompositorSWGL failed mapping default framebuffer, no dt, repeating constantly, multiple times per second, but I suppose it is not related, because it is also written when no problem is happening.

rcesarlumis commented 3 years ago

I tried running with trace logging, but there were no extra entries for the timed-out request, only for the requests executed successfully before and after it. I'm attaching part of the log to this comment to show it. This part of the log corresponds to the code in the description: it tried to switch to defaultContent, threw a TimeoutException after 3 minutes, waited 5 seconds, tried to switch to defaultContent again, and that worked in less than 20 milliseconds. selenium.log

diemol commented 3 years ago

I understand what you mean, it is not clear why the timeout happens, but what the Grid is doing is simply relaying the command to GeckoDriver.

The log shows that GeckoDriver never logs the /frame command before the timeout occurs; when you retry, the /frame command does appear in the log. So I don't have an explanation for why this is happening with GeckoDriver.

I would try to run the same tests with Chrome or Edge and see what happens, to understand if the issue is GeckoDriver, something with the tests or something with the Grid.

Also, what type of load does the machine have when this happens? How many tests are executed in parallel? Does this stop happening when you run them sequentially? It is quite likely that new versions of Firefox need more resources, plus there is a new process in the middle (GeckoDriver).

rcesarlumis commented 3 years ago

The tests that use Selenium run sequentially, but there were other tests running simultaneously in other containers. So I tried running the Selenium tests alone to make sure the others weren't affecting performance, but the result was the same. I also tried running the same tests using the standalone-chrome image instead of standalone-firefox, but the same thing happened.

diemol commented 3 years ago

Ok, I understand. At this point I am out of ideas for things to suggest. I think we would need some sort of way to reproduce the issue, otherwise this will only turn into a conversation which we can have in the Slack channel.

rcesarlumis commented 3 years ago

I have some information that might be useful. Using the docker image selenium/standalone-firefox:3.141.59-20210422, the timeouts did not happen. I only changed the image used for the browser container; the test code still used the RemoteWebDriver from selenium-java:4.0.0-beta-3.

diemol commented 3 years ago

I see, we would appreciate a test that can be used to reproduce the issue, even if that means we need to run the test 100 times until we are able to reproduce it. Maybe you can set up a GitHub repo that helps us to see all dependencies and commands for us to use that and work on it.
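One way to structure such a "run it until it breaks" test is a loop that reports the first failing iteration and how long that iteration took. This is only a JDK-only sketch: ReproLoop and firstFailingIteration are hypothetical names, and the step passed in main simulates a failure instead of driving a real browser.

```java
import java.util.function.IntConsumer;

// Hypothetical harness: repeats a step and reports the first iteration that throws.
public class ReproLoop {

    // Runs step for i = 0 .. maxIterations - 1.
    // Returns the index of the first iteration that threw, or -1 if all passed.
    public static int firstFailingIteration(int maxIterations, IntConsumer step) {
        for (int i = 0; i < maxIterations; i++) {
            long start = System.nanoTime();
            try {
                step.accept(i);
            } catch (RuntimeException e) {
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                System.err.println("Iteration " + i + " failed after " + elapsedMs + " ms: " + e);
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Simulated step: fails on iteration 485. A real step would issue WebDriver commands.
        int failedAt = firstFailingIteration(2000, i -> {
            if (i == 485) {
                throw new IllegalStateException("simulated timeout");
            }
        });
        System.out.println("first failing iteration: " + failedAt);
    }
}
```

Logging the elapsed time of the failing iteration makes it easy to tell a 3-minute read timeout apart from an immediate error.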

rcesarlumis commented 3 years ago

I was able to reproduce with the following code:

WebDriver remoteWebDriver = null; // TODO create a RemoteWebDriver that uses docker container

Dimension defaultWindowSize = remoteWebDriver.manage().window().getSize();
for (int i = 0; i < 2000; i++)
{
    System.out.println(i);

    remoteWebDriver.get("http://192.168.99.1:8080/portal/test.html");
    assertEquals("Test", remoteWebDriver.getTitle());

    // uncommenting these lines below sometimes causes: org.openqa.selenium.WebDriverException: null value in entry: message=null
    // remoteWebDriver.findElement(By.tagName("input")).sendKeys("some text");
    // assertEquals("some text", ((JavascriptExecutor)remoteWebDriver).executeScript("return document.getElementById('name').value"));

    ((JavascriptExecutor) remoteWebDriver).executeScript("return document.getElementById('name').value");
    ((JavascriptExecutor) remoteWebDriver).executeScript("alert(1)");
    remoteWebDriver.switchTo().alert().accept();
    remoteWebDriver.switchTo().defaultContent();

    remoteWebDriver.navigate().refresh();

    remoteWebDriver.switchTo().window(remoteWebDriver.getWindowHandles().iterator().next());

    remoteWebDriver.manage().deleteAllCookies();

    remoteWebDriver.navigate().to("about:blank");

    remoteWebDriver.manage().window().setSize(defaultWindowSize);
}

The test.html code can be found here: https://jsfiddle.net/vmtpf35o/

This causes the TimeoutException copied in the attached file stack1.txt.

If the 2 commented lines are uncommented, sometimes it fails with the TimeoutException of stack1.txt, and sometimes it fails with an org.openqa.selenium.WebDriverException: null value in entry: message=null, as copied in stack2.txt.

In my test suite I found random errors like the one in stack2.txt, and they also stopped happening when I switched to the 3.141.59-20210422 container. I don't know if they are somehow related to the TimeoutException.

KevinLinSL commented 3 years ago

Hi @diemol, the get_window_handles problem we have been discussing actually has the same stack trace as this issue: it's a TimeoutException that happens in Response response = whenResponse.get, which leads me to believe that whenResponse is at fault, since .get() is just waiting on the Future response.

The response calls NettyMessages.toNettyRequest, and in that method we see special treatment for POSTs, which usually work for me, but not GETs, which fail.

GETs not having the proper info passed could be the source of "random" failures for @rcesarlumis 's issue, and maybe why all these GET related requests are reported to hang: https://github.com/teodesian/Selenium-Remote-Driver/issues/452#issuecomment-763072448

The git blame there also mentions that it's a draft, so maybe GET was supposed to be added, but forgotten?
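
To illustrate the point about whenResponse.get: a timed get() on a future that nobody ever completes blocks until the deadline and then throws java.util.concurrent.TimeoutException, which matches the bottom of the stack trace in this issue. The sketch below uses only the JDK. The class name, the neverCompleted future and the 200 ms timeout are made up for the demo and merely stand in for the Netty response future and Selenium's 3 minute read timeout. It does not explain why the response goes missing.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class FutureTimeoutDemo {

    // Waits on the future with a deadline and reports how the wait ended.
    static String await(CompletableFuture<String> future, long timeoutMillis)
            throws InterruptedException, ExecutionException {
        try {
            return "completed: " + future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // get() gave up waiting; the future itself is still pending
            return "timed out, done=" + future.isDone();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for a response that never arrives
        CompletableFuture<String> neverCompleted = new CompletableFuture<>();
        System.out.println(await(neverCompleted, 200));

        // A future that was completed in time returns normally
        CompletableFuture<String> answered = CompletableFuture.completedFuture("200 OK");
        System.out.println(await(answered, 200));
    }
}
```

If this reading is right, the exception only says that nothing completed the future in time, so the interesting question is whether the request was ever sent or the response was ever delivered.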


diemol commented 3 years ago

@rcesarlumis, is there something special I need to set up to reproduce the issue?

I started a docker container, standalone, like this:

docker run -p 4444:4444 -v /dev/shm:/dev/shm selenium/standalone-chrome:4.0.0-beta-4-20210608

and then I used your code to create this maven project, which I have executed 5 times and I have not bumped into the issue. The only difference is the Grid version, beta-4 in this case. I believe I am missing something.

EDIT: I have been trying with Chrome, I will try with Firefox and report back. (But I believe the browser made no difference).

rcesarlumis commented 3 years ago

I have now tried running your project with the same docker run command you gave in your comment, and it reproduced. I ran it three times; it reproduced when i was 485, 509 and 847. The only things I changed in the project were adding a URL to the RemoteWebDriver constructor and changing the IP in remoteWebDriver.get, because my docker is not running at localhost. I don't think this would affect the result. I also used the Python 3 HTTP server as described in your project code, only adding -b 0.0.0.0 so it binds to all my IPs, again because my docker is not at localhost.

I suppose this may be a race condition that is related to CPU speed. I don't know if this helps, but my CPU has 4 cores (8 logical) and runs at around 35-50% usage at around 4 GHz during the execution. The other thing I can think of is that it is memory related. If your container currently has access to a lot of memory, maybe you can try adding -e "JAVA_OPTS=-Xmx512m" --memory="2000m" --memory-swap="2000m" to the docker run command to see if anything changes.

diemol commented 3 years ago

What is your host OS? Are these the resources Docker has in your setup?

-e "JAVA_OPTS=-Xmx512m" --memory="2000m" --memory-swap="2000m"

diemol commented 3 years ago

I completely missed that the options above were Docker options. Trying again.

rcesarlumis commented 3 years ago

My local OS, where I ran your project, is Windows 10 running Docker Toolbox (https://docs.docker.com/toolbox/), which runs the docker in a virtualbox machine running its default boot2docker Linux. That virtualbox machine is configured with 4GB RAM.

It reproduced both with and without the parameters -e "JAVA_OPTS=-Xmx512m" --memory="2000m" --memory-swap="2000m". My reason for suggesting them is that, if you do not set them, the defaults will probably adjust to your maximum memory. So if you have a lot of memory available, these parameters would apply some memory pressure, making your setup look more like mine.

My full CI test suite that also was getting the timeout runs in CentOS 7.8.

pujagani commented 3 years ago

Thank you for sharing the details and providing a very quick response/feedback for this issue. Based on the code snippet shared, I was able to reproduce the issue using the script below:

public class MainHttpTimeout {

  public static void main(String[] args) throws MalformedURLException, InterruptedException {

    EdgeOptions options = new EdgeOptions();

    WebDriver remoteWebDriver = new RemoteWebDriver(new URL("http://localhost:4444"), options);

    for (int i = 0; i < 2000; i++)
    {
      System.out.println(i);

      remoteWebDriver.get("http://www.google.com");

      remoteWebDriver.getWindowHandle();
      remoteWebDriver.getCurrentUrl();
      remoteWebDriver.getTitle();
      remoteWebDriver.manage().timeouts().getScriptTimeout();
    }

    remoteWebDriver.quit();
  }

}

I set up the Grid in standalone mode using Beta-4 jar. I was able to see the timeout. After digging for a while, I suspected the issue was lying in the code that creates the single instance of AsyncHttpClient. The AsyncHttpClient uses a Netty Timer. Due to issues raised by the users in the past relating to "Too many HashedWheelTimer instances", we created a single instance of timer and passed it. The exact issue was starting that timer in the static block. I think it's ideal to allow AsyncHttpClient to manage the timer lifecycle. I have made the fix as part of https://github.com/SeleniumHQ/selenium/commit/91e313a49862e18959b23b2136b3452aea636139#. The fix is currently in the trunk.
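
As a rough illustration of what "let the client manage the timer lifecycle" means, here is a minimal sketch. It is not the code from the commit above and it does not use AsyncHttpClient or Netty: a JDK ScheduledExecutorService stands in for the Netty timer, and OwnedTimerClient and withReadTimeout are hypothetical names. The only point is that the timer is created together with the client instance and stopped in close(), instead of being a shared instance started from a static block.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class OwnedTimerClient implements AutoCloseable {

    // The timer belongs to this client instance: created with it, stopped in close().
    // (Stand-in for the Netty timer; an assumption made for this sketch.)
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "owned-timer");
            t.setDaemon(true);
            return t;
        });

    // Fails the pending response with a TimeoutException unless it completes first.
    CompletableFuture<String> withReadTimeout(CompletableFuture<String> response, long millis) {
        ScheduledFuture<?> timeoutTask = timer.schedule(() -> {
            response.completeExceptionally(
                new TimeoutException("read timeout after " + millis + " ms"));
        }, millis, TimeUnit.MILLISECONDS);
        // Cancel the timeout task as soon as the response arrives (or fails)
        response.whenComplete((value, error) -> timeoutTask.cancel(false));
        return response;
    }

    @Override
    public void close() {
        timer.shutdownNow();
    }

    public static void main(String[] args) throws Exception {
        try (OwnedTimerClient client = new OwnedTimerClient()) {
            CompletableFuture<String> pending =
                client.withReadTimeout(new CompletableFuture<>(), 100);
            try {
                pending.get();
            } catch (ExecutionException e) {
                // The cause is the TimeoutException raised by the timer task
                System.out.println(e.getCause());
            }
        }
    }
}
```

The trade-off is the one mentioned above: one timer per client is simpler to reason about, but creating many clients also creates many timers.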

diemol commented 3 years ago

I was able to reproduce a few times with the code shared by @pujagani above. However, I noticed that I was able to reproduce it while my laptop was running low on resources (Xcode was being upgraded). So resource constraints might be a reason for this to happen.

However, I will generate a jar in a few moments and then you all can grab it from a url I will share here, so we can get your feedback.

gitrust commented 3 years ago

I had a similar issue, running into a TimeoutException.

I used the docker-based standalone-chrome image with one session on this node. Using standalone-firefox led to the same exceptions.

In my case I did not clean up the created RemoteWebDriver instances properly (webDriver.close() or webDriver.quit()), so a session was still alive on the node (default timeout 300 secs). This in turn led to timeouts when using methods of the next RemoteWebDriver instance created.
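
A small sketch of the cleanup pattern, so that a session is always released even when the test body throws. It is written against plain JDK types so it runs without Selenium; SessionCleanup and withSession are hypothetical names. With Selenium, the assumption is that you would pass something like () -> new RemoteWebDriver(url, options) as create and WebDriver::quit as quit.

```java
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

public class SessionCleanup {

    // Creates a session, runs the body, and always releases the session afterwards.
    static <S, R> R withSession(Supplier<S> create, Consumer<S> quit, Function<S, R> body) {
        S session = create.get();
        try {
            return body.apply(session);
        } finally {
            // Runs on success and on failure, so no session is left alive on the node
            quit.accept(session);
        }
    }

    public static void main(String[] args) {
        // Fake session and fake test body, used only for this demo
        StringBuilder log = new StringBuilder();
        Function<String, String> failingTest = session -> {
            throw new IllegalStateException("test failed");
        };
        try {
            withSession(
                () -> { log.append("created;"); return "fake-session"; },
                session -> log.append("quit;"),
                failingTest);
        } catch (IllegalStateException e) {
            log.append("caught;");
        }
        System.out.println(log); // prints created;quit;caught;
    }
}
```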

diemol commented 3 years ago

@rcesarlumis could you please try with the most recent pre-release? We are not able to reproduce it anymore after making a couple of fixes, so we'd like to get your feedback https://github.com/SeleniumHQ/docker-selenium/releases/tag/4.0.0-rc-1-prerelease-20210713

rcesarlumis commented 3 years ago

@diemol Hi, I have a feeling it is rarer now, requiring more iterations to reproduce, but it is still happening. Maybe you fixed something that caused it or contributed to it, but there are still other things that cause it.

This is the error on the selenium client - client.log (it happened on the remoteWebDriver.navigate().to("about:blank"); right after the remoteWebDriver.manage().deleteAllCookies();).

And this is the selenium server docker log with trace enabled - docker.log. The trace shows the call to delete cookies, but does not show the call to navigate().to("about:blank") after that, where the timeout happened.

I used the docker image selenium/standalone-firefox:4.0.0-rc-1-prerelease-20210713.

Wadyasafslsa commented 3 years ago

🐛 Bug Report

Netty gets a read timeout at random times. This happens with different Selenium commands (for example: WebDriver.switchTo().defaultContent, WebElement.click, WebDriver.switchTo().window, WebElement.sendKeys, WebDriver.get, Alert.accept) and at random, with a very small probability (<1% of test cases).

To Reproduce

I don't have specific steps to reproduce. When our CI runs our test suite of thousands of tests, about 10 fail at random due to this timeout. I could not reproduce it by running a simple long loop with a few commands on my development workstation.

Timeout details

This timeout always occurs at:

Caused by: java.util.concurrent.TimeoutException
  at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
  at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
  at org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)
  at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:65)

I could confirm that it took 3 minutes there, which confirms that it is due to the default 3 minute read timeout that Selenium configures Netty with. But the commands that time out would normally run very fast, in much less than one second.

Trying the code below in a method that is probably called thousands of times by my test suite, it failed and entered the catch block. But after it called driver.switchTo().defaultContent() again at the end of the code below, it worked. So it seems that although the read timeout happens in Netty, it still works normally afterwards.

try
{
    driver.switchTo().defaultContent();
}
catch (TimeoutException e)
{
    // this should never happen, but started happening at random after updating to selenium 4
    // output information to help troubleshoot
    System.err.println("TimeoutException thrown while trying to go to defaultContent (stack below). Trying again...");
    e.printStackTrace();

    try
    {
        Thread.sleep(5000);
    }
    catch (InterruptedException e1)
    {
        // restore the interrupt flag instead of silently swallowing the interruption
        Thread.currentThread().interrupt();
    }

    // second attempt; in the run described here it succeeded
    driver.switchTo().defaultContent();
}

In this case, the stack trace printed by the e.printStackTrace() above was:

org.openqa.selenium.TimeoutException: java.util.concurrent.TimeoutException
Build info: version: '4.0.0-beta-3', revision: '5d108f9a67'
System info: host: '51e5404d333b', ip: '172.18.0.7', os.name: 'Linux', os.arch: 'amd64', os.version: '3.10.0-1127.19.1.el7.x86_64', java.version: '11.0.1'
Driver info: org.openqa.selenium.remote.RemoteWebDriver
Command: [a5e3bf25-ba72-4023-b219-76406cf58660, switchToFrame {id=null}]
Capabilities {acceptInsecureCerts: true, browserName: firefox, browserVersion: 88.0, javascriptEnabled: true, moz:accessibilityChecks: false, moz:buildID: 20210415204500, moz:debuggerAddress: localhost:46562, moz:geckodriverVersion: 0.29.0, moz:headless: false, moz:processID: 9286, moz:profile: /tmp/rust_mozprofileQJRwQP, moz:shutdownTimeout: 60000, moz:useNonSpecCompliantPointerOrigin: false, moz:webdriverClick: true, pageLoadStrategy: normal, platform: LINUX, platformName: LINUX, platformVersion: 3.10.0-1127.19.1.el7.x86_64, rotatable: false, se:cdp: ws://172.18.0.3:4444/sessio..., se:cdpVersion: 85, setWindowRect: true, strictFileInteractability: false, timeouts: {implicit: 0, pageLoad: 300000, script: 30000}, unhandledPromptBehavior: dismiss and notify}
Session ID: a5e3bf25-ba72-4023-b219-76406cf58660
  at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:71)
  at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
  at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
  at org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:51)
  at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
  at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
  at org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:103)
  at org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)
  at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:181)
  at org.openqa.selenium.remote.TracedCommandExecutor.execute(TracedCommandExecutor.java:39)
  at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:619)
  at org.openqa.selenium.remote.RemoteWebDriver$RemoteTargetLocator.defaultContent(RemoteWebDriver.java:1097)
  (...)
Caused by: java.util.concurrent.TimeoutException
  at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
  at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
  at org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)
  at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:65)
  ... 38 more

Environment

OS: Docker containers inside CentOS
Browser: RemoteWebDriver using Firefox in the selenium/standalone-firefox:4.0.0-beta-3-20210426 docker image. Also tried the selenium/standalone-firefox:4.0.0-beta-4-prerelease-20210527 docker image, but the same thing happened.
Browser Driver version: RemoteWebDriver from selenium-java 4.0.0-beta-3
Language Bindings version: Java 4.0.0-beta-3

The RemoteWebDriver runs in a container that is running on the same docker host as the browser container, so all network traffic between them is only logical, within the same machine. Previously we were using Selenium 2.52 on the same docker host, and nothing similar to this timeout ever happened.

Do you have any tips about what I can try to fix it or investigate more about this?
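
The catch-and-retry workaround shown above can be generalized into a small helper. This is only a sketch of a workaround, not a fix for the underlying timeout. RetryOnTimeout and call are hypothetical names, and the helper catches java.util.concurrent.TimeoutException so that it runs with the JDK alone; with Selenium the assumption is that you would catch org.openqa.selenium.TimeoutException instead.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

public class RetryOnTimeout {

    // Runs the command; on TimeoutException it pauses and tries again, up to maxAttempts times.
    static <T> T call(Callable<T> command, int maxAttempts, long pauseMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        TimeoutException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return command.call();
            } catch (TimeoutException e) {
                last = e;
                System.err.println("Timeout on attempt " + attempt + " of " + maxAttempts);
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis);
                }
            }
        }
        // Every attempt timed out: rethrow the last timeout
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Fake command for the demo: times out twice, then succeeds
        int[] calls = {0};
        String result = call(() -> {
            if (++calls[0] < 3) {
                throw new TimeoutException("simulated read timeout");
            }
            return "ok after " + calls[0] + " calls";
        }, 5, 10);
        System.out.println(result); // prints ok after 3 calls
    }
}
```

Retrying is only safe for commands that can be repeated without side effects, such as switching to the default content. A click or sendKeys may already have been executed by the browser before the timeout was raised.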

JulienBreton commented 3 years ago

logs_docker_selenium.txt

I have tested the code from @pujagani (https://github.com/SeleniumHQ/selenium/issues/9528#issuecomment-868360126) several times and I regularly hit the java.util.concurrent.TimeoutException. I launched the test 10 times and it failed 6 times with java.util.concurrent.TimeoutException.

Environment

docker-selenium: 4.0.0-rc-2-prerelease-20210916
selenium: Selenium 4.0.0 RC 1 (from the Maven repository)
Browsers: Firefox 92.0 / Chrome 93.0.4577.82

Steps to reproduce

1. I put the code of @pujagani in a TestNG test.
2. I launched the test with Maven (mvn test).
3. I launched the test 10 times to confirm the issue.

Results

Test 1 failed at i=811, Test 2 OK, Test 3 OK, Test 4 failed at i=826, Test 5 failed at i=716, Test 6 OK, Test 7 failed at i=359, Test 8 failed at i=1400, Test 9 OK, Test 10 failed at i=1549.

[ERROR] timeoutTest(julien.selenium.SeleniumTest)  Time elapsed: 352.736 s  <<< FAILURE!
org.openqa.selenium.TimeoutException: 
java.util.concurrent.TimeoutException
Build info: version: '4.0.0-rc-1', revision: 'bc5511cbda'
System info: host: 'julien-ThinkPad-W540', ip: '127.0.0.1', os.name: 'Linux', os.arch: 'amd64', os.version: '5.4.0-84-generic', java.version: '11.0.11'
Driver info: org.openqa.selenium.remote.RemoteWebDriver
Command: [a2d2290d-8c05-4440-ae67-01a2b5dbbef5, get {url=http://www.google.com}]
Capabilities {acceptInsecureCerts: true, browserName: firefox, browserVersion: 92.0, javascriptEnabled: true, moz:accessibilityChecks: false, moz:buildID: 20210903235534, moz:debuggerAddress: localhost:40991, moz:geckodriverVersion: 0.29.1, moz:headless: false, moz:processID: 528, moz:profile: /tmp/rust_mozprofileqF9hcv, moz:shutdownTimeout: 60000, moz:useNonSpecCompliantPointerOrigin: false, moz:webdriverClick: true, pageLoadStrategy: normal, platform: LINUX, platformName: LINUX, platformVersion: 5.4.0-84-generic, proxy: Proxy(), se:cdp: ws://172.22.0.8:5555/sessio..., se:cdpVersion: 85, se:vnc: ws://172.22.0.8:5555/sessio..., se:vncEnabled: true, se:vncLocalAddress: ws://172.22.0.8:7900, setWindowRect: true, strictFileInteractability: false, timeouts: {implicit: 0, pageLoad: 300000, script: 30000}, unhandledPromptBehavior: dismiss and notify}
Session ID: a2d2290d-8c05-4440-ae67-01a2b5dbbef5
    at julien.selenium.SeleniumTest.timeoutTest(SeleniumTest.java:59)
Caused by: java.util.concurrent.TimeoutException
    at julien.selenium.SeleniumTest.timeoutTest(SeleniumTest.java:59)

docker-compose.yaml

version: '2'
services:
    firefox:
        image: selenium/node-firefox:4.0.0-rc-2-prerelease-20210916
        shm_size: 2gb
        restart: on-failure
        volumes:
            - /dev/shm:/dev/shm
            - /home/julien/test:/home/seluser/upload
        depends_on:
            - selenium-hub
        environment:
            - SE_EVENT_BUS_HOST=selenium-hub
            - SE_EVENT_BUS_PUBLISH_PORT=4442
            - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
        ports:
            - 5900

    chrome:
        image: selenium/node-chrome:4.0.0-rc-2-prerelease-20210916
        shm_size: 2gb
        restart: on-failure
        volumes:
            - /dev/shm:/dev/shm
            - /home/julien/test:/home/seluser/upload
        depends_on:
            - selenium-hub
        environment:
            - SE_EVENT_BUS_HOST=selenium-hub
            - SE_EVENT_BUS_PUBLISH_PORT=4442
            - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
        ports:
            - 5900            

    edge:
        image: selenium/node-edge:4.0.0-rc-2-prerelease-20210916
        shm_size: 2gb
        depends_on:
            - selenium-hub
        environment:
            - SE_EVENT_BUS_HOST=selenium-hub
            - SE_EVENT_BUS_PUBLISH_PORT=4442
            - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
        ports:
            - 5900            

    selenium-hub:
        image: selenium/hub:4.0.0-rc-2-prerelease-20210916
        restart: on-failure
        ports:
            - "4442:4442"
            - "4443:4443"
            - "4444:4444"

rkkreddy commented 3 years ago

We are facing the same issue as well. Any update on this?

szamacz commented 3 years ago

Issue reproducible on stable selenium 4 version with selenium/standalone-chrome:94.0-chromedriver-94.0-grid-4.0.0-20211013 docker image.

diemol commented 3 years ago

Could you share the details on how to reproduce this with docker?

JulienBreton commented 3 years ago

@diemol you can use this repo to reproduce the bug : https://github.com/JulienBreton/selenium_4_bug_java.util.concurrent.TimeoutException

szamacz commented 3 years ago

@diemol In our case, it is only reproducible on Docker when running tests on Azure DevOps build agents. Locally (both on a browser and on the docker image) I wasn't able to reproduce it, unfortunately.

evertones commented 3 years ago

I can see it every time I run the suite of tests that I have. Out of ~320 specs, usually there are random 32 or 33 specs that fail with the org.openqa.selenium.TimeoutException. Every time I run, different specs fail with this exception.

Environment
Logs on the Hub
04:35:07.240 WARN [SeleniumSpanExporter$1.lambda$export$0] - {"traceId": "b005219d41e121bd53f61caa9e9ded41","eventTime": 1634751307239903954,"eventName": "exception","attributes": {"exception.message": "Unable to execute request for an existing session: java.util.concurrent.TimeoutException
Build info: version: '4.0.0', revision: '3a21814679'
System info: host: 'selenium-hub-7f774c84d4-v5rh5', ip: '10.38.0.1', os.name: 'Linux', os.arch: 'amd64', os.version: '5.4.0-80-generic', java.version: '11.0.11'
Driver info: driver.version: unknown","exception.stacktrace": "org.openqa.selenium.TimeoutException: java.util.concurrent.TimeoutException
Build info: version: '4.0.0', revision: '3a21814679'
System info: host: 'selenium-hub-7f774c84d4-v5rh5', ip: '10.38.0.1', os.name: 'Linux', os.arch: 'amd64', os.version: '5.4.0-80-generic', java.version: '11.0.11'
Driver info: driver.version: unknown
     org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:72)
     org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
     org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
     org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:51)
     org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
     org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
     org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:119)
     org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)
     org.openqa.selenium.grid.web.ReverseProxyHandler.execute(ReverseProxyHandler.java:92)
     org.openqa.selenium.grid.router.HandleSession.execute(HandleSession.java:110)
     org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:373)
     org.openqa.selenium.remote.http.Route.execute(Route.java:68)
     org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
     org.openqa.selenium.remote.http.Route.execute(Route.java:68)
     org.openqa.selenium.grid.router.Router.execute(Router.java:91)
     org.openqa.selenium.grid.web.EnsureSpecCompliantResponseHeaders.lambda$apply$0(EnsureSpecCompliantResponseHeaders.java:34)
     org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
     org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
     org.openqa.selenium.remote.http.Route.execute(Route.java:68)
     org.openqa.selenium.remote.http.Route$NestedRoute.handle(Route.java:270)
     org.openqa.selenium.remote.http.Route.execute(Route.java:68)
     org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
     org.openqa.selenium.remote.http.Route.execute(Route.java:68)
     org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
     org.openqa.selenium.remote.http.Route.execute(Route.java:68)
     org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)
     org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
     org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
     org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
     org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
     org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)
     java.base\u002fjava.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
     java.base\u002fjava.util.concurrent.FutureTask.run(FutureTask.java:264)
     java.base\u002fjava.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
     java.base\u002fjava.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
     java.base\u002fjava.lang.Thread.run(Thread.java:829)
Caused by: java.util.concurrent.TimeoutException
     java.base\u002fjava.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
     java.base\u002fjava.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
     org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)
     org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:66)
     ... 35 more
","exception.type": "org.openqa.selenium.TimeoutException","http.flavor": 1,"http.handler_class": "org.openqa.selenium.grid.router.HandleSession","http.host": "seleniumserver.mycompany:4444","http.method": "POST","http.request_content_length": "50","http.scheme": "HTTP","http.target": "\u002fsession\u002f6c1e5340d84ae80f95225328a382ceca\u002felement\u002fecc9bb9c-537a-4f18-ba55-1bffee18a02f\u002fclick","http.user_agent": "selenium\u002f4.0.0 (java unix)","session.id": "6c1e5340d84ae80f95225328a382ceca"}}
Logs on the Node
Starting ChromeDriver 94.0.4606.61 (418b78f5838ed0b1c69bb4e51ea0252171854915-refs/branch-heads/4606@{#1204}) on port 53852
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.
[1634751054.077][SEVERE]: bind() failed: Cannot assign requested address (99)
04:30:54.478 INFO [ProtocolHandshake.createSession] - Detected dialect: W3C
04:31:10.602 WARN [SeleniumSpanExporter$1.lambda$export$0] - {"traceId": "feacaa143077624f51266c887176de8d","eventTime": 1634751070601890876,"eventName": "HTTP request execution complete","attributes": {"http.flavor": 1,"http.handler_class": "org.openqa.selenium.remote.http.Route$PredicatedRoute","http.host": "seleniumserver.mycompany:4444","http.method": "POST","http.request_content_length": "48780","http.scheme": "HTTP","http.status_code": 404,"http.target": "\u002fsession\u002f39cbd903608b78739dfc34d4104913cb\u002fexecute\u002fsync","http.user_agent": "selenium\u002f4.0.0 (java unix)"}}

04:31:14.690 WARN [SeleniumSpanExporter$1.lambda$export$0] - {"traceId": "a9e66e1dac5fed0e440bf94a23083755","eventTime": 1634751074689648608,"eventName": "HTTP request execution complete","attributes": {"http.flavor": 1,"http.handler_class": "org.openqa.selenium.remote.http.Route$PredicatedRoute","http.host": "seleniumserver.mycompany:4444","http.method": "GET","http.scheme": "HTTP","http.status_code": 404,"http.target": "\u002fsession\u002f39cbd903608b78739dfc34d4104913cb\u002felement\u002fa997d7b6-0ab5-4a32-874d-b35b14033edf\u002fenabled","http.user_agent": "selenium\u002f4.0.0 (java unix)"}}

04:31:24.777 WARN [SeleniumSpanExporter$1.lambda$export$0] - {"traceId": "3f383b3e195d52bbae075aec229d14e9","eventTime": 1634751084776014440,"eventName": "HTTP request execution complete","attributes": {"http.flavor": 1,"http.handler_class": "org.openqa.selenium.remote.http.Route$PredicatedRoute","http.host": "seleniumserver.mycompany:4444","http.method": "GET","http.scheme": "HTTP","http.status_code": 404,"http.target": "\u002fsession\u002f39cbd903608b78739dfc34d4104913cb\u002felement\u002f3c73b7fa-7b1a-40c8-adad-549fa2871438\u002fenabled","http.user_agent": "selenium\u002f4.0.0 (java unix)"}}

mrkgoh commented 3 years ago

I am facing the same issue here.

Environment: Running selenium/standalone-chrome:latest via Synology DS220+ Python 3

The docker container is launched via Python SDK:

container = client.containers.run(
    'selenium/standalone-chrome',
    user='root',
    detach=True,
    auto_remove=True,
    environment=["TZ=Asia/Kuala_Lumpur"],
    ports={'4444/tcp': ('127.0.0.1', 4444)},
    # source: https://stackoverflow.com/a/69732302/15339732
    # note: a dict keeps only the last value for a repeated key,
    # so only the 211.24.18.7 entry actually takes effect here
    extra_hosts={
        'www.stockbroking.com.my': '211.24.18.135',
        'www.stockbroking.com.my': '211.24.18.7',
    },
    volumes={
        '/dev/shm': {'bind': '/dev/shm', 'mode': 'rw'},
        '/volume1/docker/chrome/downloads/': {'bind': '/home/seluser/downloads/', 'mode': 'rw'},
    },
    privileged=True,
)

The traceback is as follows

Traceback (most recent call last):
  File "/volume1/homes/admin/Drive/stock/bursa/bursaqr_announcer.py", line 385, in <module>
    main()
  File "/volume1/homes/admin/Drive/stock/bursa/bursaqr_announcer.py", line 379, in main
    BursaQR().application(sc, px, eg)
  File "/volume1/homes/admin/Drive/stock/bursa/bursaqr_announcer.py", line 338, in application
    sc.producer()
  File "/volume1/homes/admin/Drive/stock/bursa/bursaqr_announcer.py", line 73, in producer
    df = screener(
  File "/volume1/homes/admin/Drive/stock/filehandler/file_handler2.py", line 121, in wrapper
    df = func(*args, **kwargs)
  File "/volume1/homes/admin/Drive/stock/filter/screener.py", line 23, in screener
    acc, pri = getet()
  File "/volume1/homes/admin/Drive/stock/filter/get_ETfilter.py", line 45, in get_ETfilter
    driver.get("https://trade.stockbroking.com.my/stockbroking/research/researchvalidate.aspx?StkCode")
  File "/volume1/homes/admin/venv/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 333, in get
    self.execute(Command.GET, {'url': url})
  File "/volume1/homes/admin/venv/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/volume1/homes/admin/venv/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: Unable to execute request for an existing session: java.util.concurrent.TimeoutException
Build info: version: '4.0.0', revision: '3a21814679'
System info: host: '7a534966c65d', ip: '172.17.0.2', os.name: 'Linux', os.arch: 'amd64', os.version: '4.4.59+', java.version: '11.0.11'
Driver info: driver.version: unknown

diemol commented 3 years ago

Seems we could use this repo to reproduce the issue. https://github.com/kasperSiteimprove/GridTimeoutRepro

schurik commented 3 years ago

Hi, I am facing the same issue. My setup:

  1. Hub and Node setup: a Hub with just one Node on the same machine, using selenium-server.jar (v4.0.0):
     java -jar selenium-server.jar hub --session-request-timeout 7000
     java -jar selenium-server.jar node --max-sessions 1
  2. A Selenium test which takes longer than 180 seconds to execute:

@Test
public void test() throws MalformedURLException, InterruptedException {

    final WebDriver driver = new RemoteWebDriver(new URL("http://localhost:4444"), new EdgeOptions());
    final WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(30));
    driver.get("https://www.google.de");

    Thread.sleep(2_000);
    wait.until(ExpectedConditions.elementToBeClickable(By.id("L2AGLb"))).click();

    Thread.sleep(1_000);

    int repeatWaitingLoop = 20;
    int keepWaiting = 0;
    while (keepWaiting++ < repeatWaitingLoop) {
        final WebElement element = wait.until(ExpectedConditions.presenceOfElementLocated(By.name("q")));
        element.clear();
        element.sendKeys("do a barrel roll" + Keys.ENTER);
        Thread.sleep(10_000);
    }
    driver.close();
    driver.quit();
}
  3. Start the test twice. The first run is executed and the second is queued.
  4. After 180 seconds the second test (which is queued) just fails. See the error log at the end.
  5. As soon as the first test is finished, the queued test is picked up from the queue.
  6. The browser starts and nothing happens because the test is already gone.
  7. Debug output of the second, timed-out test:

18:01:42.909 [Forwarding newSession on session null to remote] DEBUG io.netty.channel.DefaultChannelId - -Dio.netty.processId: 34760 (auto-detected)
18:01:42.925 [Forwarding newSession on session null to remote] DEBUG io.netty.util.NetUtil - -Djava.net.preferIPv4Stack: false
18:01:42.925 [Forwarding newSession on session null to remote] DEBUG io.netty.util.NetUtil - -Djava.net.preferIPv6Addresses: false
18:01:44.311 [Forwarding newSession on session null to remote] DEBUG io.netty.util.NetUtilInitializations - Loopback interface: lo (Software Loopback Interface 1, 127.0.0.1)
18:01:44.311 [Forwarding newSession on session null to remote] DEBUG io.netty.util.NetUtil - Failed to get SOMAXCONN from sysctl and file \proc\sys\net\core\somaxconn. Default: 200
18:01:45.679 [Forwarding newSession on session null to remote] DEBUG io.netty.channel.DefaultChannelId - -Dio.netty.machineId: 20:79:18:ff:fe:28:a6:7c (auto-detected)
18:01:45.834 [AsyncHttpClient-1-2] DEBUG io.netty.buffer.AbstractByteBuf - -Dio.netty.buffer.checkAccessible: true
18:01:45.835 [AsyncHttpClient-1-2] DEBUG io.netty.buffer.AbstractByteBuf - -Dio.netty.buffer.checkBounds: true
18:01:45.835 [AsyncHttpClient-1-2] DEBUG io.netty.util.ResourceLeakDetectorFactory - Loaded default ResourceLeakDetector: io.netty.util.ResourceLeakDetector@aacca11
18:01:45.861 [AsyncHttpClient-1-2] DEBUG org.asynchttpclient.netty.channel.NettyConnectListener - Using new Channel '[id: 0x48cf3810, L:/127.0.0.1:4279 - R:localhost/127.0.0.1:4444]' for 'POST' to '/session'
18:01:45.909 [AsyncHttpClient-1-2] DEBUG io.netty.util.Recycler - -Dio.netty.recycler.maxCapacityPerThread: 4096
18:01:45.909 [AsyncHttpClient-1-2] DEBUG io.netty.util.Recycler - -Dio.netty.recycler.maxSharedCapacityFactor: 2
18:01:45.909 [AsyncHttpClient-1-2] DEBUG io.netty.util.Recycler - -Dio.netty.recycler.linkCapacity: 16
18:01:45.909 [AsyncHttpClient-1-2] DEBUG io.netty.util.Recycler - -Dio.netty.recycler.ratio: 8
18:01:45.909 [AsyncHttpClient-1-2] DEBUG io.netty.util.Recycler - -Dio.netty.recycler.delayedQueue.ratio: 8
18:04:42.862 [AsyncHttpClient-1-1] DEBUG org.asynchttpclient.netty.timeout.TimeoutTimerTask - Request timeout to localhost/127.0.0.1:4444 after 180000 ms for NettyResponseFuture{currentRetry=0, isDone=0, isCancelled=0, asyncHandler=org.asynchttpclient.AsyncCompletionHandlerBase@6d797d3a, nettyRequest=org.asynchttpclient.netty.request.NettyRequest@2b8e1b13, future=java.util.concurrent.CompletableFuture@15c398ef[Not completed, 1 dependents], uri=http://localhost:4444/session, keepAlive=true, redirectCount=0, timeoutsHolder=org.asynchttpclient.netty.timeout.TimeoutsHolder@7af15f38, inAuth=0, touch=1637082105973} after 180053 ms
18:04:42.862 [AsyncHttpClient-1-1] DEBUG org.asynchttpclient.netty.channel.ChannelManager - Closing Channel [id: 0x48cf3810, L:/127.0.0.1:4279 - R:localhost/127.0.0.1:4444]
18:04:42.862 [AsyncHttpClient-1-1] DEBUG org.asynchttpclient.netty.request.NettyRequestSender - Aborting Future NettyResponseFuture{currentRetry=0, isDone=0, isCancelled=0, asyncHandler=org.asynchttpclient.AsyncCompletionHandlerBase@6d797d3a, nettyRequest=org.asynchttpclient.netty.request.NettyRequest@2b8e1b13, future=java.util.concurrent.CompletableFuture@15c398ef[Not completed, 1 dependents], uri=http://localhost:4444/session, keepAlive=true, redirectCount=0, timeoutsHolder=org.asynchttpclient.netty.timeout.TimeoutsHolder@7af15f38, inAuth=0, touch=1637082105973}

18:04:42.862 [AsyncHttpClient-1-1] DEBUG org.asynchttpclient.netty.request.NettyRequestSender - Request timeout to localhost/127.0.0.1:4444 after 180000 ms
java.util.concurrent.TimeoutException: Request timeout to localhost/127.0.0.1:4444 after 180000 ms
    at org.asynchttpclient.netty.timeout.TimeoutTimerTask.expire(TimeoutTimerTask.java:43)
    at org.asynchttpclient.netty.timeout.RequestTimeoutTimerTask.run(RequestTimeoutTimerTask.java:50)
    at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:672)
    at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:747)
    at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:472)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:834)
18:04:42.862 [AsyncHttpClient-1-1] DEBUG org.asynchttpclient.AsyncCompletionHandler - Request timeout to localhost/127.0.0.1:4444 after 180000 ms
java.util.concurrent.TimeoutException: Request timeout to localhost/127.0.0.1:4444 after 180000 ms
    at org.asynchttpclient.netty.timeout.TimeoutTimerTask.expire(TimeoutTimerTask.java:43)
    at org.asynchttpclient.netty.timeout.RequestTimeoutTimerTask.run(RequestTimeoutTimerTask.java:50)
    at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:672)
    at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:747)
    at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:472)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:834)
18:04:42.862 [AsyncHttpClient-1-2] DEBUG org.asynchttpclient.netty.handler.HttpHandler - Channel Closed: [id: 0x48cf3810, L:/127.0.0.1:4279 ! R:localhost/127.0.0.1:4444] with attribute DISCARD

org.openqa.selenium.SessionNotCreatedException: Could not start a new session. Possible causes are invalid address of the remote server or browser start-up failure.
Build info: version: '4.0.0', revision: '3a21814679'
System info: host: 'K57244', ip: 'mymaschine', os.name: 'Windows 10', os.arch: 'amd64', os.version: '10.0', java.version: '11.0.2'
Driver info: org.openqa.selenium.remote.RemoteWebDriver
Command: [null, newSession {capabilities=[Capabilities {browserName: MicrosoftEdge, ms:edgeOptions: {args: [], extensions: []}}], desiredCapabilities=Capabilities {browserName: MicrosoftEdge, ms:edgeOptions: {args: [], extensions: []}}}]
Capabilities {}

    at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:577)
    at org.openqa.selenium.remote.RemoteWebDriver.startSession(RemoteWebDriver.java:246)
    at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:168)
    at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:146)
    at ch.sbb.zvs.testhub.testrunner.DemoTest.test(DemoTest.java:26)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688)
    at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
    at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
    at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
    at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
    at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84)
    at org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
    at org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
    at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
    at org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
    at org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
    at org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
    at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
    at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
    at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$6(TestMethodTestDescriptor.java:210)
    at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
    at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:206)
    at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:131)
    at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:65)
    at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:139)
    at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
    at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:129)
    at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
    at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:127)
    at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
    at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:126)
    at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:84)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1540)
    at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:38)
    at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:143)
    at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
    at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:129)
    at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
    at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:127)
    at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
    at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:126)
    at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:84)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1540)
    at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:38)
    at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:143)
    at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
    at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:129)
    at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
    at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:127)
    at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
    at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:126)
    at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:84)
    at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.submit(SameThreadHierarchicalTestExecutorService.java:32)
    at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor.execute(HierarchicalTestExecutor.java:57)
    at org.junit.platform.engine.support.hierarchical.HierarchicalTestEngine.execute(HierarchicalTestEngine.java:51)
    at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:108)
    at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:88)
    at org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:54)
    at org.junit.platform.launcher.core.EngineExecutionOrchestrator.withInterceptedStreams(EngineExecutionOrchestrator.java:67)
    at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:52)
    at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:96)
    at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:75)
    at com.intellij.junit5.JUnit5IdeaTestRunner.startRunnerWithArgs(JUnit5IdeaTestRunner.java:71)
    at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
    at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:235)
    at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:54)
Caused by: java.lang.RuntimeException: NettyHttpHandler request execution error
    at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:83)
    at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
    at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
    at org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:51)
    at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
    at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
    at org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:119)
    at org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)
    at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:102)
    at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:84)
    at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:62)
    at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:156)
    at org.openqa.selenium.remote.TracedCommandExecutor.execute(TracedCommandExecutor.java:39)
    at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:559)
    ... 69 more
Caused by: java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: Request timeout to localhost/127.0.0.1:4444 after 180000 ms
    at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
    at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2022)
    at org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)
    at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:66)
    ... 82 more
Caused by: java.util.concurrent.TimeoutException: Request timeout to localhost/127.0.0.1:4444 after 180000 ms
    at org.asynchttpclient.netty.timeout.TimeoutTimerTask.expire(TimeoutTimerTask.java:43)
    at org.asynchttpclient.netty.timeout.RequestTimeoutTimerTask.run(RequestTimeoutTimerTask.java:50)
    at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:672)
    at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:747)
    at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:472)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:834)

Hope this helps to solve this issue.

Regards Alex

webtoman commented 3 years ago

We also see the same thing randomly (but also quite often). One of our tests is a crawler which simply records this TimeoutException and keeps going. When it does go on, the session works just fine.

2021-11-19 18:43:37.223 CET: [urllib3.connectionpool] {DEBUG} http://<hostname>:4445 "POST /wd/hub/session/e64e6ad7e9ec2d7439d749ddb5de8849/element HTTP/1.1" 500 4313
...
selenium.common.exceptions.TimeoutException: Message: Unable to execute request for an existing session: java.util.concurrent.TimeoutException
...
 2021-11-19 18:43:37.334 CET: [urllib3.connectionpool] {DEBUG} http://<hostname>:4445 "GET /wd/hub/session/e64e6ad7e9ec2d7439d749ddb5de8849/screenshot HTTP/1.1" 200 176872

This never happens with Selenium 3.141; we only see it with version '4.0.0', revision '3a21814679'.
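The "record the timeout and keep going" approach can be sketched as a small retry helper. This is a minimal, hypothetical sketch in plain Java, not part of Selenium: the names `TimeoutRetry` and `callWithRetry` are invented for illustration, and `java.util.concurrent.TimeoutException` stands in for whatever exception type your client actually surfaces (the Python log above shows `selenium.common.exceptions.TimeoutException`).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not a Selenium API: retries an action only when it times out.
final class TimeoutRetry {

    /** Runs the action up to maxAttempts times, retrying only on TimeoutException. */
    static <T> T callWithRetry(Callable<T> action, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        TimeoutException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (TimeoutException e) {
                // Record the timeout and try again; any other exception propagates immediately.
                last = e;
                System.err.println("Attempt " + attempt + " timed out: " + e.getMessage());
            }
        }
        throw last; // every attempt timed out
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated command: the first call times out, the second succeeds.
        String result = callWithRetry(() -> {
            if (++calls[0] == 1) {
                throw new TimeoutException("simulated read timeout");
            }
            return "ok";
        }, 3);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

Retrying like this is only safe when repeating the command is harmless. A command such as `click` that did reach the browser before the timeout may end up being executed twice.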

amyreit commented 2 years ago

Just to add to the witness list - I also hit this very intermittently using the selenium/standalone-firefox:4.0.0-20211102 on a driver.switchTo().defaultContent() call.

JulienBreton commented 2 years ago

Do you know if there is a workaround for this issue? Selenium 4 makes the tests flaky, so we can't use it.

phoenix384 commented 2 years ago

Yeah, this happens way too often on click, defaultContent, addCookie, ... calls to seriously use v4.

rcesarlumis commented 2 years ago

Do you know if there is a workaround for this issue? Selenium 4 makes the tests flaky, so we can't use it.

@JulienBreton, in my case I use the Java Selenium client. The workaround I found is to use the Selenium 4 client (I am still on a beta and have not upgraded to the final release yet, but I guess that will work too), connecting via RemoteWebDriver to the Selenium Server version 3 Docker image (example: selenium/standalone-firefox:3.141.59-20210422).

diemol commented 2 years ago

@phoenix384 can you please help me to reproduce this issue? What test can I use and how are you starting the Grid? What operating system, available RAM and CPU?

diemol commented 2 years ago

I've been trying to reproduce the issue with the code and GitHub projects provided above, and so far I have not been able to.

I am happy to jump on a call with anyone who is having the issue to better understand the environment where this is happening. If you are interested, please head to https://www.selenium.dev/support/ and join our Slack channel.

matclayton commented 2 years ago

We believe we are seeing this issue running the following setup.

Running selenium/standalone-firefox:4.1.0 on c5.4xlarge AWS instances with 16 vCPUs and 32 GB RAM.

varunmukka-okta commented 2 years ago

We are also seeing more than 2500 Selenium tests failing on average in our CI. It started happening with Chrome browser/driver 95 and is repeating on 96.

2021-12-15 00:48:59 result: FAILURE
2021-12-15 00:48:59 errorMessage: org.openqa.selenium.TimeoutException:
2021-12-15 00:48:59 Expected condition failed: waiting for visibility of element located by By.cssSelector: [data-se="user-details-link--sign-out"] o-link (tried for 30 second(s) with 500 milliseconds interval)
2021-12-15 00:48:59 Build info: version: '3.141.59', revision: 'e82be7d358', time: '2018-11-14T08:17:03'
2021-12-15 00:48:59 System info: host: '89024559e4f0', ip: '172.17.0.2', os.name: 'Linux', os.arch: 'amd64', os.version: '4.14.173-137.229.amzn2.x86_64', java.version: '1.8.0_265'
2021-12-15 00:48:59 Driver info: org.openqa.selenium.remote.RemoteWebDriver
2021-12-15 00:48:59 Capabilities {acceptInsecureCerts: true, browserName: chrome, browserVersion: 96.0.4664.45, chrome: {chromedriverVersion: 96.0.4664.45 (76e4c1bb2ab46..., userDataDir: /tmp/.com.google.Chrome.A4uDsZ}, goog:chromeOptions: {debuggerAddress: localhost:46343}, javascriptEnabled: true, networkConnectionEnabled: false, pageLoadStrategy: normal, platform: LINUX, platformName: LINUX, proxy: Proxy(), setWindowRect: true, strictFileInteractability: false, timeouts: {implicit: 0, pageLoad: 300000, script: 30000}, unhandledPromptBehavior: dismiss and notify, webauthn:extension:credBlob: true, webauthn:extension:largeBlob: true, webauthn:virtualAuthenticators: true, webdriver.remote.sessionid: 154ecd0d09345bd6a585c56dd77...}
2021-12-15 00:48:59 Session ID: 154ecd0d09345bd6a585c56dd77f8924
2021-12-15 00:48:59 at org.openqa.selenium.support.ui.WebDriverWait.timeoutException(WebDriverWait.java:95)
2021-12-15 00:48:59 at org.openqa.selenium.support.ui.FluentWait.until(FluentWait.java:272)
2021-12-15 00:48:59 at com.okta.selenium3.webdriver.ui.pages.selector.Selector.waitUntilVisible(Selector.java:114)
2021-12-15 00:48:59 at com.okta.selenium3.webdriver.ui.pages.selector.Selector.click(Selector.java:442)
2021-12-15 00:48:59 at com.okta.selenium3.webdriver.ui.pages.selector.Selector.click(Selector.java:4CENSORED)
2021-12-15 00:48:59 at com.okta.selenium3.webdriver.ui.pages.admin.AdminHeaderV2Component.logout(AdminHeaderV2Component.java:247)

diemol commented 2 years ago

@varunmukka-okta I fail to see how that stack trace is similar to the ones posted above. It seems like a completely different issue.

Cybermaxke commented 2 years ago

We're also seeing the same issue: when running about 700 tests, 20 will randomly fail. We have a Grid setup (version 4.1.0) with 10 Chrome nodes on AKS.

As a workaround, I was looking into a retry mechanism which caused me to stumble upon RetryRequest#readTimeoutPolicy, which never gets to retry due to its max duration. So I adjusted the read timeout of the remote driver and maximum duration policy to trigger some retries. Up to now, no action has failed twice in a row due to this timeout.

The workaround if anyone is interested:

var logger = LoggerFactory.getLogger("mylogger");
// the original read timeout of the remote driver was 300 seconds, but the default max duration of the policy
// is 10 seconds, so the retry doesn't do anything
// so decrease the read timeout to 90 seconds and increase the max duration of the policy to 300 seconds, this
// should give selenium 3 retries in the same timeframe as before
var field = RetryRequest.class.getDeclaredField("readTimeoutPolicy");
field.setAccessible(true);
var policy = (RetryPolicy<HttpResponse>) field.get(null);
policy.withMaxRetries(3).withMaxDuration(Duration.ofSeconds(300))
  .onRetry(event -> logger.info("Read timeout #{}. Retrying.", event.getAttemptCount()));

var capabilities = new DesiredCapabilities();
var tracer = OpenTelemetryTracer.getInstance();
var httpClientFactory = HttpClient.Factory.createDefault();
httpClientFactory = new TracedHttpClient.Factory(tracer, httpClientFactory);
var clientConfig = ClientConfig.defaultConfig()
        .readTimeout(Duration.ofSeconds(90)) // decrease read timeout to workaround timeout issue, see above
        .baseUrl(new URL("http://localhost:4444/wd/hub"));
CommandExecutor executor = new HttpCommandExecutor(Collections.emptyMap(), clientConfig, httpClientFactory);
executor = new TracedCommandExecutor(executor, tracer);
var driver = new RemoteWebDriver(executor, capabilities);
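
The timing argument behind this workaround can be checked with a small model. This is a simplified sketch, not Failsafe's actual implementation: it assumes every attempt runs into the full read timeout, ignores backoff delays, and assumes no further retry is scheduled once the elapsed time exceeds the policy's max duration. `RetryBudget` and `retriesWithinBudget` are hypothetical names, and the `maxRetries` value of 3 used for the default case is a placeholder, since the default is not stated in this thread.

```java
import java.time.Duration;

// Simplified model of "how many retries fit into the max duration"; not Failsafe code.
final class RetryBudget {

    /**
     * Counts the retries that can start when every attempt lasts exactly readTimeout
     * and retrying stops once elapsed time exceeds maxDuration or maxRetries is reached.
     */
    static int retriesWithinBudget(Duration readTimeout, Duration maxDuration, int maxRetries) {
        Duration elapsed = Duration.ZERO;
        int retries = 0;
        while (true) {
            // Assumption: the attempt hangs until the read timeout fires.
            elapsed = elapsed.plus(readTimeout);
            boolean budgetSpent = elapsed.compareTo(maxDuration) > 0;
            if (budgetSpent || retries >= maxRetries) {
                return retries;
            }
            retries++;
        }
    }

    public static void main(String[] args) {
        // Settings described above: 300 s read timeout against a 10 s max duration.
        System.out.println(retriesWithinBudget(Duration.ofSeconds(300), Duration.ofSeconds(10), 3));
        // Workaround: 90 s read timeout against a 300 s max duration, at most 3 retries.
        System.out.println(retriesWithinBudget(Duration.ofSeconds(90), Duration.ofSeconds(300), 3));
    }
}
```

Under these assumptions the first call yields 0 retries, because the first attempt alone outlasts the 10-second budget, and the second yields 3, with attempts ending at 90, 180 and 270 seconds.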

wieben commented 2 years ago

+1 for this issue, while running a test suite of ~1600 tests which do a lot of switchTo()'s, flakiness is around 1% after upgrading to selenium/standalone-chrome:4.1.0-20211123 from 3.141.59-20210713. Timeout is always after 3 minutes at org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206) (used through framework Serenity BDD which includes org.seleniumhq.selenium:selenium-java:jar:4.0.0) after a call to remoteDriver.switchTo().window(someHandle). Could not reliably isolate so far...

UPDATE 28-12-2021

After further upgrade from org.seleniumhq.selenium:selenium-java:jar:4.0.0 to org.seleniumhq.selenium:selenium-java:jar:4.1.1 and from selenium/standalone-chrome:4.1.0-20211123 to selenium/standalone-chrome:4.1.1-20211217 in our test suite the number of failures of tests timing out after more than 180 seconds dropped from 16 to 5. And the stacktrace changed from

Caused by: java.util.concurrent.TimeoutException
org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)
<method calling switchTo()>

to

Caused by: java.util.concurrent.TimeoutException
org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)
net.jodah.failsafe.Functions.lambda$get$0(Functions.java:48)
net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)
net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)
net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)
net.jodah.failsafe.Execution.executeSync(Execution.java:128)
net.jodah.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:379)
net.jodah.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:68)
net.jodah.failsafe.Functions.lambda$get$0(Functions.java:48)
net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)
net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)
net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)
net.jodah.failsafe.Execution.executeSync(Execution.java:128)
net.jodah.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:379)
net.jodah.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:68)
<method calling switchTo()>
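
The traces in this thread show two different exception shapes. These traces carry a bare `java.util.concurrent.TimeoutException` with no message, while the session-creation trace earlier in the thread shows an `ExecutionException` wrapping a `TimeoutException` with the message "Request timeout to localhost/127.0.0.1:4444 after 180000 ms". A plain `CompletableFuture` produces both shapes, which the minimal sketch below shows using only the JDK. The class and method names are hypothetical, and how AsyncHttpClient and Selenium actually drive their futures is not something this sketch establishes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// JDK-only illustration of two ways a timed wait on a future can fail.
final class FutureTimeoutShapes {

    /** Waits on the future and reports which kind of failure, if any, was observed. */
    static String describe(CompletableFuture<String> response, long waitMillis)
            throws InterruptedException {
        try {
            return "completed: " + response.get(waitMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // The caller's own wait expired; the future itself was never completed.
            return "bare TimeoutException, message=" + e.getMessage();
        } catch (ExecutionException e) {
            // Someone else completed the future exceptionally before the wait expired.
            Throwable cause = e.getCause();
            return "ExecutionException wrapping " + cause.getClass().getSimpleName()
                    + ": " + cause.getMessage();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Case 1: nobody ever completes the future, so the timed get gives up.
        CompletableFuture<String> neverAnswered = new CompletableFuture<>();
        System.out.println(describe(neverAnswered, 50));

        // Case 2: a timer-like component fails the future with its own exception.
        CompletableFuture<String> failedByTimer = new CompletableFuture<>();
        failedByTimer.completeExceptionally(
                new TimeoutException("Request timeout after 180000 ms"));
        System.out.println(describe(failedByTimer, 50));
    }
}
```

One plausible reading, offered as an assumption only, is that the bare exception comes from the caller's timed wait expiring, while the wrapped one comes from the HTTP client's own request timer failing the future.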

asan127 commented 2 years ago

We're also seeing 5-10% failures due to this issue.

Here's @Cybermaxke's spectacular workaround verbosely written in Java. Would really prefer this to just be configurable or fixed.

RetryRequest retryRequest = new RetryRequest();

Field readTimeoutPolicyField = retryRequest.getClass().getDeclaredField("readTimeoutPolicy");
readTimeoutPolicyField.setAccessible(true);

RetryPolicy<HttpResponse> readTimeoutPolicy =
        new RetryPolicy<HttpResponse>()
                .handle(TimeoutException.class)
                .withBackoff(1, 4, ChronoUnit.SECONDS)
                .withMaxRetries(3)
                .withMaxDuration(Duration.ofSeconds(300))
                .onRetry(e -> CustomLog.info(String.format(
                        "Read timeout #%s. Retrying.",
                        e.getAttemptCount())));

FieldUtils.removeFinalModifier(readTimeoutPolicyField);
readTimeoutPolicyField.set(retryRequest, readTimeoutPolicy);

Filter filter = new AddSeleniumUserAgent().andThen(retryRequest);
ClientConfig config = ClientConfig
        .defaultConfig()
        .baseUrl(new URL(seleniumGridUrl))
        .readTimeout(Duration.ofSeconds(90))
        .withFilter(filter);
OpenTelemetryTracer tracer = OpenTelemetryTracer.getInstance();
HttpClient.Factory httpClientFactory = HttpClient.Factory.createDefault();
TracedHttpClient.Factory tracedHttpClientFactory = new TracedHttpClient.Factory(
        tracer,
        httpClientFactory);
CommandExecutor executor = new HttpCommandExecutor(Collections.emptyMap(), config, tracedHttpClientFactory);
TracedCommandExecutor tracedCommandExecutor = new TracedCommandExecutor(executor, tracer);
remoteWebDriver = new RemoteWebDriver(tracedCommandExecutor, getChromeOptions(runTimeProps));

bgrgincic commented 2 years ago

@asan127, @Cybermaxke thank you so much for sharing this!

We've been having exactly the same issue as everyone here has described. We have been running a distributed Selenium Grid in Kubernetes with 16 parallel nodes (with on-demand CPU and RAM, never reaching more than 50% of the cluster's capacity), and java.util.concurrent.TimeoutException was happening randomly about 3% of the time.

With this workaround this flakiness has been resolved. Unfortunately I do not have a reproducible example to give as this was happening on our internal infrastructure and internal apps but the timeouts were so consistent it seemed to me that anyone working with some parallelism must be running into this issue.

fortinj66 commented 2 years ago

Not sure if this is the same issue, but it smells similar. We are getting timeouts creating sessions 'inside' the hub. It looks like a timeout on the POST back to the hub.

10:33:15.735 INFO [LocalDistributor.newSession] - Session created by the distributor. Id: f66a87aaba3a997d45c0d95e41177224, Caps: Capabilities {acceptInsecureCerts: true, browserName: chrome, browserVersion: 96.0.4664.110, chrome: {chromedriverVersion: 96.0.4664.45 (76e4c1bb2ab46..., userDataDir: /tmp/.com.google.Chrome.OzMb6Y}, goog:chromeOptions: {debuggerAddress: localhost:45465}, networkConnectionEnabled: false, pageLoadStrategy: normal, platformName: linux, proxy: {}, se:cdp: ws://10.128.6.90:4444/sessi..., se:cdpVersion: 96.0.4664.110, se:vnc: ws://10.128.6.90:4444/sessi..., se:vncEnabled: true, se:vncLocalAddress: ws://10.128.6.90:7900, setWindowRect: true, strictFileInteractability: false, timeouts: {implicit: 0, pageLoad: 300000, script: 30000}, unhandledPromptBehavior: dismiss and notify, webauthn:extension:credBlob: true, webauthn:extension:largeBlob: true, webauthn:virtualAuthenticators: true}

10:37:05.381 WARN [SeleniumSpanExporter$1.lambda$export$0] - {"traceId": "b364f407758c0a78a141403de4d96b66","eventTime": 1641569825380490172,
"eventName": "exception","attributes": 
{"exception.message": "Unable to execute request for an existing session: java.util.concurrent.TimeoutException\n
Build info: version: '4.1.1', revision: 'e8fcc2cecf'\n
System info: host: 'selenium-hub-55d857fc88-b5rjw', ip: '10.129.3.219', os.name: 'Linux', os.arch: 'amd64', os.version: '5.14.14-200.fc34.x86_64', java.version: '11.0.13'\n
Driver info: driver.version: unknown","exception.stacktrace": "org.openqa.selenium.TimeoutException: java.util.concurrent.TimeoutException\n
Build info: version: '4.1.1', revision: 'e8fcc2cecf'\n
System info: host: 'selenium-hub-55d857fc88-b5rjw', ip: '10.129.3.219', os.name: 'Linux', os.arch: 'amd64', os.version: '5.14.14-200.fc34.x86_64', java.version: '11.0.13'\n
Driver info: driver.version: unknown\n\tat org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:72)\n\t
at org.openqa.selenium.remote.http.RetryRequest.lambda$apply$6(RetryRequest.java:83)\n\tat net.jodah.failsafe.Functions.lambda$get$0(Functions.java:48)\n\t
at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)\n\tat net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)\n\t
at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)\n\tat net.jodah.failsafe.Execution.executeSync(Execution.java:128)\n\t
at net.jodah.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:379)\n\t
at net.jodah.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:68)\n\tat org.openqa.selenium.remote.http.RetryRequest.lambda$apply$7(RetryRequest.java:83)\n\t
at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)\n\tat org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)\n\t
at org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:51)\n\tat org.openqa.selenium.remote.http.RetryRequest.lambda$apply$6(RetryRequest.java:83)\n\t
at net.jodah.failsafe.Functions.lambda$get$0(Functions.java:48)\n\tat net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)\n\t
at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)\n\t
at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)\n\t
at net.jodah.failsafe.Execution.executeSync(Execution.java:128)\n\t
at net.jodah.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:379)\n\t
at net.jodah.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:68)\n\t
at org.openqa.selenium.remote.http.RetryRequest.lambda$apply$7(RetryRequest.java:83)\n\t
at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)\n\t
at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)\n\t
at org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:110)\n\t
at org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)\n\t
at org.openqa.selenium.grid.web.ReverseProxyHandler.execute(ReverseProxyHandler.java:92)\n\t
at org.openqa.selenium.grid.router.HandleSession.execute(HandleSession.java:110)\n\t
at org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:373)\n\t
at org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\t
at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\t
at org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\t
at org.openqa.selenium.grid.router.Router.execute(Router.java:91)\n\t
at org.openqa.selenium.grid.web.EnsureSpecCompliantResponseHeaders.lambda$apply$0(EnsureSpecCompliantResponseHeaders.java:34)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\t
at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\t
at org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\t
at org.openqa.selenium.remote.http.Route$NestedRoute.handle(Route.java:270)\n\t
at org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\t
at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\t
at org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\t
at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\t
at org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)\n\t
at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\n\t
at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\t
at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\n\t
at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\t
at org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)\n\t
at java.base\u002fjava.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\t
at java.base\u002fjava.util.concurrent.FutureTask.run(FutureTask.java:264)\n\t
at java.base\u002fjava.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\t
at java.base\u002fjava.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\t
at java.base\u002fjava.lang.Thread.run(Thread.java:829)\nCaused by: java.util.concurrent.TimeoutException\n\t
at java.base\u002fjava.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)\n\t
at java.base\u002fjava.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)\n\t
at org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)\n\t
at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:66)\n\t... 53 more\n",

"exception.type": "org.openqa.selenium.TimeoutException",
"http.flavor": 1,
"http.handler_class": "org.openqa.selenium.grid.router.HandleSession",
"http.host": "selenium-hub.selenium-4.svc.cluster.local:4444",
"http.method": "POST",
"http.request_content_length": "23",
"http.scheme": "HTTP",
"http.target": "\u002fsession\u002ff66a87aaba3a997d45c0d95e41177224\u002fse\u002flog",
"http.user_agent": "selenium\u002f4.1.1 (java unix)",
"session.id": "f66a87aaba3a997d45c0d95e41177224"}}

10:37:09.445 INFO [LocalSessionMap.lambda$new$0] - Deleted session from local session map, Id: f66a87aaba3a997d45c0d95e41177224
fortinj66 commented 2 years ago

On the client side, we get logs similar to this:

org.openqa.selenium.SessionNotCreatedException: Could not start a new session. Possible causes are invalid address of the remote server or browser start-up failure.
Build info: version: '4.1.1', revision: 'e8fcc2cecf'
System info: host: 'jenkins-qa-new-w3c60', ip: '10.128.5.115', os.name: 'Linux', os.arch: 'amd64', os.version: '5.14.14-200.fc34.x86_64', java.version: '11.0.9.1'
Driver info: org.openqa.selenium.remote.RemoteWebDriver
Command: [null, newSession {capabilities=[Capabilities {acceptInsecureCerts: true, browserName: chrome, browserVersion: 96.0, goog:chromeOptions: {args: [test-type, --disable-impl-side-painting, chrome.switches, --disable-extensions], extensions: []}, ma:applicationName: shop-live}], desiredCapabilities=Capabilities {acceptInsecureCerts: true, browserName: chrome, browserVersion: 96.0, goog:chromeOptions: {args: [test-type, --disable-impl-side-painting, chrome.switches, --disable-extensions], extensions: []}, ma:applicationName: shop-live}}]
Capabilities {}
    at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:561)
    at org.openqa.selenium.remote.RemoteWebDriver.startSession(RemoteWebDriver.java:230)
    at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:151)
    at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:138)
    at com.marketamerica.automationframework.tools.testmanagement.grid.DesktopGrid.getRemoteWebBrowser(DesktopGrid.java:61)
    at com.marketamerica.automationframework.tools.testmanagement.TestBuilder.getWebDriver(TestBuilder.java:202)
    at com.marketamerica.automationframework.tools.testmanagement.TestCases.setupMethod(TestCases.java:268)
    at com.shop.automation.selenium.shop.tests.ShopBaseTests.setupMethod(ShopBaseTests.java:34)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:133)
    at org.testng.internal.MethodInvocationHelper.invokeMethodConsideringTimeout(MethodInvocationHelper.java:62)
    at org.testng.internal.ConfigInvoker.invokeConfigurationMethod(ConfigInvoker.java:385)
    at org.testng.internal.ConfigInvoker.invokeConfigurations(ConfigInvoker.java:321)
    at org.testng.internal.TestInvoker.runConfigMethods(TestInvoker.java:700)
    at org.testng.internal.TestInvoker.invokeMethod(TestInvoker.java:527)
    at org.testng.internal.TestInvoker.invokeTestMethod(TestInvoker.java:173)
    at org.testng.internal.MethodRunner.runInSequence(MethodRunner.java:46)
    at org.testng.internal.TestInvoker$MethodInvocationAgent.invoke(TestInvoker.java:824)
    at org.testng.internal.TestInvoker.invokeTestMethods(TestInvoker.java:146)
    at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:146)
    at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:128)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
    at org.testng.TestRunner.privateRun(TestRunner.java:794)
    at org.testng.TestRunner.run(TestRunner.java:596)
    at org.testng.SuiteRunner.runTest(SuiteRunner.java:377)
    at org.testng.SuiteRunner.access$000(SuiteRunner.java:28)
    at org.testng.SuiteRunner$SuiteWorker.run(SuiteRunner.java:418)
    at org.testng.internal.thread.ThreadUtil.lambda$execute$0(ThreadUtil.java:64)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.RuntimeException: NettyHttpHandler request execution error
    at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:83)
    at org.openqa.selenium.remote.http.RetryRequest.lambda$apply$6(RetryRequest.java:83)
    at net.jodah.failsafe.Functions.lambda$get$0(Functions.java:48)
    at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)
    at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)
    at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)
    at net.jodah.failsafe.Execution.executeSync(Execution.java:128)
    at net.jodah.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:379)
    at net.jodah.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:68)
    at org.openqa.selenium.remote.http.RetryRequest.lambda$apply$7(RetryRequest.java:83)
    at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
    at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
    at org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:51)
    at org.openqa.selenium.remote.http.RetryRequest.lambda$apply$6(RetryRequest.java:83)
    at net.jodah.failsafe.Functions.lambda$get$0(Functions.java:48)
    at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)
    at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)
    at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)
    at net.jodah.failsafe.Execution.executeSync(Execution.java:128)
    at net.jodah.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:379)
    at net.jodah.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:68)
    at org.openqa.selenium.remote.http.RetryRequest.lambda$apply$7(RetryRequest.java:83)
    at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
    at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
    at org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:110)
    at org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)
    at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:102)
    at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:84)
    at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:62)
    at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:156)
    at org.openqa.selenium.remote.TracedCommandExecutor.execute(TracedCommandExecutor.java:51)
    at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:543)
    ... 34 more
Caused by: java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: Request timeout to selenium-hub-2.selenium-4.svc.cluster.local/172.30.230.181:4444 after 180000 ms
    at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
    at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2022)
    at org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)
    at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:66)
    ... 65 more
Caused by: java.util.concurrent.TimeoutException: Request timeout to selenium-hub-2.selenium-4.svc.cluster.local/172.30.230.181:4444 after 180000 ms
    at org.asynchttpclient.netty.timeout.TimeoutTimerTask.expire(TimeoutTimerTask.java:43)
    at org.asynchttpclient.netty.timeout.RequestTimeoutTimerTask.run(RequestTimeoutTimerTask.java:50)
    at io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715)
    at io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34)
    at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703)
    at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790)
    at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    ... 1 more
GgStormer commented 2 years ago

We're also seeing 5-10% failures due to this issue.

Here's @Cybermaxke's spectacular workaround verbosely written in Java. Would really prefer this to just be configurable or fixed.

RetryRequest retryRequest = new RetryRequest();

// RetryRequest keeps its read-timeout policy in a private final field, so reach it via reflection.
Field readTimeoutPolicyField = retryRequest.getClass().getDeclaredField("readTimeoutPolicy");
readTimeoutPolicyField.setAccessible(true);

// Replacement policy: retry on TimeoutException up to 3 times, backing off from 1 to 4 seconds.
// CustomLog is a project-specific logger; substitute your own.
RetryPolicy<HttpResponse> readTimeoutPolicy =
      new RetryPolicy<HttpResponse>()
              .handle(TimeoutException.class)
              .withBackoff(1, 4, ChronoUnit.SECONDS)
              .withMaxRetries(3)
              .withMaxDuration(Duration.ofSeconds(300))
              .onRetry(e -> CustomLog.info(String.format(
                      "Read timeout #%s. Retrying.",
                      e.getAttemptCount())));

// Swap the custom policy into the RetryRequest instance.
FieldUtils.removeFinalModifier(readTimeoutPolicyField);
readTimeoutPolicyField.set(retryRequest, readTimeoutPolicy);

// Build a client config that uses the patched filter and a 90-second read timeout
// instead of the default 180 seconds.
Filter filter = new AddSeleniumUserAgent().andThen(retryRequest);
ClientConfig config = ClientConfig
      .defaultConfig()
      .baseUrl(new URL(seleniumGridUrl))
      .readTimeout(Duration.ofSeconds(90))
      .withFilter(filter);

// Wire the config into a traced command executor and create the driver with it.
// seleniumGridUrl, getChromeOptions and runTimeProps come from the surrounding test framework.
OpenTelemetryTracer tracer = OpenTelemetryTracer.getInstance();
HttpClient.Factory httpClientFactory = HttpClient.Factory.createDefault();
TracedHttpClient.Factory tracedHttpClientFactory = new TracedHttpClient.Factory(
      tracer,
      httpClientFactory);
CommandExecutor executor = new HttpCommandExecutor(Collections.emptyMap(), config, tracedHttpClientFactory);
TracedCommandExecutor tracedCommandExecutor = new TracedCommandExecutor(executor, tracer);
remoteWebDriver = new RemoteWebDriver(tracedCommandExecutor, getChromeOptions(runTimeProps));
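
For anyone not familiar with Failsafe, here is roughly what that retry policy does, written as a plain-JDK sketch. This is not Selenium or Failsafe code: `ReadTimeoutRetry` and `callWithRetry` are hypothetical names, it catches `java.util.concurrent.TimeoutException` as a stand-in (swap in whatever exception your driver call actually throws), it assumes the backoff doubles on each retry up to the cap, and it leaves out the 300-second overall limit.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper for illustration only; not part of Selenium or Failsafe. */
final class ReadTimeoutRetry {

  private ReadTimeoutRetry() {}

  /**
   * Runs the action and retries it when it throws TimeoutException.
   * The wait before each retry starts at initialDelay and doubles, capped at maxDelay.
   * After maxRetries failed retries the last TimeoutException is rethrown.
   */
  static <T> T callWithRetry(Callable<T> action, int maxRetries,
                             Duration initialDelay, Duration maxDelay) throws Exception {
    Duration delay = initialDelay;
    for (int attempt = 0; ; attempt++) {
      try {
        return action.call();
      } catch (TimeoutException e) {
        if (attempt >= maxRetries) {
          throw e; // retries exhausted, give up
        }
        System.err.printf("Read timeout #%d. Retrying.%n", attempt + 1);
        Thread.sleep(delay.toMillis());
        // Double the delay for the next round, but never exceed maxDelay.
        Duration doubled = delay.multipliedBy(2);
        delay = doubled.compareTo(maxDelay) > 0 ? maxDelay : doubled;
      }
    }
  }
}
```

With values matching the policy above, a call would look like `ReadTimeoutRetry.callWithRetry(() -> doCommand(), 3, Duration.ofSeconds(1), Duration.ofSeconds(4))`, where `doCommand` is a placeholder for the driver call. Keep in mind that retrying a command that is not idempotent (a click, for example) may run it twice if the first attempt actually reached the browser.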

Could you please tell me where the RetryRequest class comes from?

bgrgincic commented 2 years ago

import org.openqa.selenium.remote.http.RetryRequest;

    <dependency>
        <groupId>org.seleniumhq.selenium</groupId>
        <artifactId>selenium-java</artifactId>
        <version>4.1.1</version>
    </dependency>