SeleniumHQ / selenium

A browser automation framework and ecosystem.
https://selenium.dev
Apache License 2.0
30.15k stars 8.1k forks source link

java.util.concurrent.TimeoutException thrown at random netty read timeouts with RemoteWebDriver #9528

Closed rcesarlumis closed 1 year ago

rcesarlumis commented 3 years ago

🐛 Bug Report

Netty at random times gets a read timeout at. This happens at different selenium commands ( for example: WebDriver.switchTo().defaultContent, WebElement.click, WebDriver.switchTo().window, WebElement.sendKeys, WebDriver.get, Alert.accept ) and at random in a quite small percentage chance (<1% test cases).

To Reproduce

I don't have specific steps to reproduce. When our CI runs our test suite of thousands of tests run, about 10 fails at random due to this timeout. I could not reproduce by doing a simple long loop with a few commands on my development workstation.

Timeout details

This timeout always occurs at:

Caused by: java.util.concurrent.TimeoutException
    at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
    at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
    at org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)
    at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:65)

I could confirm that it took 3 minutes there, confirming that it is due to the default 3 minutes read timeout the selenium configures the netty with. But the commands that are timing outs would normally run very fast, much less than one second.

Trying the code below in a method called probably thousands times by my test suite, it failed entering the catch. But after it called again driver.switchTo().defaultContent() at the end of the code below it worked. So it seems that although the read timeout happens in netty, it still works normally afterwards.

try
{
driver.switchTo().defaultContent();
}
catch (TimeoutException e)
{
// this should never happen, but started happening at random after updating to selenium 4
// output information to help troubleshoot
System.err.println("TimeoutException thrown while trying to go to defaultContent (stack below). Trying again...");
e.printStackTrace();

try
{
Thread.sleep(5000);
}
catch (InterruptedException e1)
{
}

driver.switchTo().defaultContent();
}

In this case, the stack trace got by the e.printStackTrace() above was:

org.openqa.selenium.TimeoutException: java.util.concurrent.TimeoutException
Build info: version: '4.0.0-beta-3', revision: '5d108f9a67'
System info: host: '51e5404d333b', ip: '172.18.0.7', os.name: 'Linux', os.arch: 'amd64', os.version: '3.10.0-1127.19.1.el7.x86_64', java.version: '11.0.1'
Driver info: org.openqa.selenium.remote.RemoteWebDriver
Command: [a5e3bf25-ba72-4023-b219-76406cf58660, switchToFrame {id=null}]
Capabilities {acceptInsecureCerts: true, browserName: firefox, browserVersion: 88.0, javascriptEnabled: true, moz:accessibilityChecks: false, moz:buildID: 20210415204500, moz:debuggerAddress: localhost:46562, moz:geckodriverVersion: 0.29.0, moz:headless: false, moz:processID: 9286, moz:profile: /tmp/rust_mozprofileQJRwQP, moz:shutdownTimeout: 60000, moz:useNonSpecCompliantPointerOrigin: false, moz:webdriverClick: true, pageLoadStrategy: normal, platform: LINUX, platformName: LINUX, platformVersion: 3.10.0-1127.19.1.el7.x86_64, rotatable: false, se:cdp: ws://172.18.0.3:4444/sessio..., se:cdpVersion: 85, setWindowRect: true, strictFileInteractability: false, timeouts: {implicit: 0, pageLoad: 300000, script: 30000}, unhandledPromptBehavior: dismiss and notify}
Session ID: a5e3bf25-ba72-4023-b219-76406cf58660
    at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:71)
    at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
    at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
    at org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:51)
    at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
    at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
    at org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:103)
    at org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)
    at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:181)
    at org.openqa.selenium.remote.TracedCommandExecutor.execute(TracedCommandExecutor.java:39)
    at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:619)
    at org.openqa.selenium.remote.RemoteWebDriver$RemoteTargetLocator.defaultContent(RemoteWebDriver.java:1097)
    (...)
Caused by: java.util.concurrent.TimeoutException
    at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
    at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
    at org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)
    at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:65)
    ... 38 more

Environment

OS: Docker containers inside a CentOS Browser: RemoteWebDriver using Firefox in selenium/standalone-firefox:4.0.0-beta-3-20210426 docker image. Also tried the selenium/standalone-firefox:4.0.0-beta-4-prerelease-20210527 docker image, but the same thing happened. Browser Driver version: RemoteWebDriver from selenium-java 4.0.0-beta-3 Language Bindings version: Java 4.0.0-beta-3 The RemoteWebDriver runs in a container that is running in the same docker host as the browser container. So all network between them is only logical in the same machine. Previously we were using Selenium 2.52, in the same docker host, and never happened anything similar to such timeout.

Do you have any tips about what I can try to fix it or investigate more about this?

therealdjryan commented 2 years ago

As a horrible hack I implemented a TestNG IRetryAnalyzer which retries test once when they fail with any of these exceptions.

NoSuchSessionException 
SessionNotCreatedExcept
ScriptTimeoutException

It reduces the failure count, but not completely

stassilla commented 2 years ago

any luck with workaround on C#?

rcesarlumis commented 2 years ago

any luck with workaround on C#?

I guess my workaround of using 3.x selenium server with 4.x selenium client would be language independent. I don't know what limitations this would cause, but currently I have thousands of tests with this setup on a Java environment working fine.

diemol commented 2 years ago

Thank you @Cybermaxke for the workaround. This has been included already by @pujagani through #10220.

We are planning a 4.1.2 release for next week (which includes this fix).

diemol commented 2 years ago

4.1.2 is out with this change, https://github.com/SeleniumHQ/selenium/releases/download/selenium-4.1.0/selenium-server-4.1.2.jar, any feedback is appreciatted.

rcesarlumis commented 2 years ago

4.1.2 is out with this change, https://github.com/SeleniumHQ/selenium/releases/download/selenium-4.1.0/selenium-server-4.1.2.jar, any feedback is appreciatted.

@diemol , I repeated the test done previously here. Updating only the selenium docker image to 4.1.2 (and still using at the client the selenium-java 4.0.0-beta-4) the error reproduced. After updating the pom.xml to use the selenium-java 4.1.2, the error did not reproduce anymore.

bdux commented 2 years ago

4.1.2 is out with this change, https://github.com/SeleniumHQ/selenium/releases/download/selenium-4.1.0/selenium-server-4.1.2.jar, any feedback is appreciatted.

@diemol , I repeated the test done previously here. Updating only the selenium docker image to 4.1.2 (and still using at the client the selenium-java 4.0.0-beta-4) the error reproduced. After updating the pom.xml to use the selenium-java 4.1.2, the error did not reproduce anymore.

Unfortunately, I am still struggling with this error on 4.1.2.

diemol commented 2 years ago

@bdux what are you using on the client side?

bdux commented 2 years ago

Updated everything to 4.1.2, containers as well as maven dependencies. I am seeing a lot of orphaned sessions. Concurrency within Containers (more than 1 session per container) seems to be a big problem.

Some more context: I am running a stack configuration, with 1 hub and chrome/edge nodes. After a while the Hub Frontend becomes unresponsive, which coincides with the session timeouts (when orphaned sessions are supposed to be killed).

diemol commented 2 years ago

Updated everything to 4.1.2, containers as well as maven dependencies. I am seeing a lot of orphaned sessions. Concurrency within Containers (more than 1 session per container) seems to be a big problem.

Would love to see more concrete information. That is very ambiguous. How does the setup look like? How is the Grid started? Can you share the logs please? What test are you running to see the issues?

bdux commented 2 years ago

This is my composefile for the stack deploy:


version: '3.1'

services:

  selenium-hub:
    image: selenium/hub:latest
    environment:
      - SE_SESSION_RETRY_TIMEOUT=60
      - SE_SESSION_RETRY_INTERVAL=10
    ports:
      - "4442:4442"
      - "4443:4443"
      - "4446:4444"
    deploy:
      restart_policy:
        condition: on-failure
      placement:
        constraints:
          - node.labels.function == selenium-web
          - node.labels.selenium_env == web-testing
      resources:
        limits:
          memory: 4G
    networks:
      - main

  chrome:
    image: selenium/node-chrome:latest
    environment:
      - SE_EVENT_BUS_HOST=selenium-hub
      - SE_EVENT_BUS_PUBLISH_PORT=4442
      - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
      - SE_NODE_SESSION_TIMEOUT=600
      - SCREEN_WIDTH=1920
      - SCREEN_HEIGHT=1080  
      - VNC_VIEW_ONLY=1
      - VNC_NO_PASSWORD=1
    volumes:
      - /dev/shm:/dev/shm
      - /usr/share/zoneinfo/Europe/Berlin:/etc/localtime:ro
      - shared-nfs:/ta_shared
    deploy:
      replicas: 25
      restart_policy:
        condition: on-failure
      placement:
        constraints:
          - node.labels.function == selenium-web
          - node.labels.selenium_env == web-testing
      resources:
        limits:
          memory: 1G
    depends_on:
      - selenium-hub
    extra_hosts:
      - *******
    entrypoint: bash -c 'SE_OPTS="--host $$HOSTNAME" /opt/bin/entry_point.sh'
    networks:
      - main

  edge:
    image: selenium/node-edge:latest
    environment:
      - SE_EVENT_BUS_HOST=selenium-hub
      - SE_EVENT_BUS_PUBLISH_PORT=4442
      - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
      - SE_NODE_SESSION_TIMEOUT=600
      - SCREEN_WIDTH=1920
      - SCREEN_HEIGHT=1080
      - VNC_VIEW_ONLY=1
      - VNC_NO_PASSWORD=1
    volumes:
      - /dev/shm:/dev/shm
      - /usr/share/zoneinfo/Europe/Berlin:/etc/localtime:ro
      - shared-nfs:/ta_shared
    deploy:
      replicas: 25
      restart_policy:
        condition: on-failure
      placement:
        constraints:
          - node.labels.function == selenium-web
          - node.labels.selenium_env == web-testing
      resources:
        limits:
          memory: 1G
    depends_on:
      - selenium-hub
    extra_hosts:
      - ********
    entrypoint: bash -c 'SE_OPTS="--host $$HOSTNAME" /opt/bin/entry_point.sh'
    networks:
      - main

networks:
  main:
   external:
    name: webtesting_selenium_net

volumes:
  shared-nfs:
    driver: local
    driver_opts:
      type: nfs
      o: nfsvers=3,addr=172.29.98.186,rw
      device: ":*******"

We run this on a Docker Swarm environment, as a stack.

The hub log shows as last entry before dying:

12:14:02.290 WARN [SeleniumSpanExporter$1.lambda$export$0] - {"traceId": "b5dc2c44f68bf5e7adade626a210404e","eventTime": 1643631242290131000,"eventName": "HTTP request execution complete","attributes": {"http.flavor": 1,"http.handler_class": "org.openqa.selenium.grid.sessionqueue.local.LocalNewSessionQueue","http.host": "172.29.98.166:4446","http.method": "POST","http.request_content_length": "1197","http.scheme": "HTTP","http.status_code": 500,"http.target": "\u002fsession","http.user_agent": "selenium\u002f4.1.2 (java unix)"}}

12:14:08.684 ERROR [NettyRequestSender.writeRequest] - Can't write request

java.io.IOException: Stream closed

    at java.base/java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:176)

    at java.base/java.io.BufferedInputStream.reset(BufferedInputStream.java:452)

    at org.asynchttpclient.netty.request.body.NettyInputStreamBody.write(NettyInputStreamBody.java:61)

    at org.asynchttpclient.netty.request.NettyRequestSender.writeRequest(NettyRequestSender.java:433)

    at org.asynchttpclient.netty.channel.NettyConnectListener.writeRequest(NettyConnectListener.java:80)

    at org.asynchttpclient.netty.channel.NettyConnectListener.onSuccess(NettyConnectListener.java:156)

    at org.asynchttpclient.netty.channel.NettyChannelConnector$1.onSuccess(NettyChannelConnector.java:92)

    at org.asynchttpclient.netty.SimpleChannelFutureListener.operationComplete(SimpleChannelFutureListener.java:26)

    at org.asynchttpclient.netty.SimpleChannelFutureListener.operationComplete(SimpleChannelFutureListener.java:20)

    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578)

    at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571)

    at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550)

    at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491)

    at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616)

    at io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:605)

    at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)

    at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:84)

    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:300)

    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:335)

    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:710)

    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)

    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)

    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)

    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)

    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)

    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)

    at java.base/java.lang.Thread.run(Thread.java:829)

12:14:10.689 ERROR [NettyRequestSender.writeRequest] - Can't write request

java.io.IOException: Stream closed

We are running a suite of about 500 GUI tests, with up to 40 in parallel at once. On the v3 Grid, he have 1 hub, with 4 nodes and 10 browsers per node, and run 40 tests in parallel without any problems. v4 Grid does not seem to be able to handle anything above 10 concurrent sessions, on identical hardware. It's really weird.

diemol commented 2 years ago

@bdux but that is not the same error that is being discussed in this issue. I don't understand how they are related, could you please clarify?

bdux commented 2 years ago

@bdux but that is not the same error that is being discussed in this issue. I don't understand how they are related, could you please clarify?

Sorry, my bad. the error that is referenced in the issue, is the first ones that show up, and from then on, it goes down hill towards the errors I pasted. I currently don't have direct access to the logs, as our system kills the containers after each run, so I am trying to fish out more information to complete the chain of errors. Apologies for the confusion.

Edit: I might indeed have mixed up the error messages. it looks like it might go more into the direction of #1439

omkarkhatavkar commented 2 years ago

https://github.com/SeleniumHQ/selenium/releases/download/selenium-4.1.0/selenium-server-4.1.2.jar are kept in the 4.1.0 directory is this pattern where '0' is redundant? earlier it used to store in http://selenium-release.storage.googleapis.com/${SELENIUM_MAJOR_VERSION}.${SELENIUM_MINOR_VERSION} pattern. I know it is not related to bug though

diemol commented 2 years ago

https://github.com/SeleniumHQ/selenium/releases/download/selenium-4.1.0/selenium-server-4.1.2.jar are kept in the 4.1.0 directory is this pattern where '0' is redundant? earlier it used to store in http://selenium-release.storage.googleapis.com/${SELENIUM_MAJOR_VERSION}.${SELENIUM_MINOR_VERSION}

Yes, now all releases are in GitHub releases and patches get uploaded to the nearest major/minor release.

omkarkhatavkar commented 2 years ago

@diemol I'm not able to see the issue related to java.util.concurrent.TimeoutExceptionafter applying the jar, but after a certain time or execution the seeing error as "eventName": "No slot matched the requested cap". looking at the /dev/shm is also not filled that much

SE_SESSION_REQUEST_TIMEOUT=300 SE_NODE_MAX_SESSION=10

# df -h /dev/shm
Filesystem      Size  Used Avail Use% Mounted on
tmpfs            15G   35M   15G   1% /dev/shm   

We are running with the below configuration with 20 core and 30 GB RAM VM

diemol commented 2 years ago

@diemol I'm not able to see the issue related to java.util.concurrent.TimeoutExceptionafter applying the jar, but after a certain time or execution the seeing error as "eventName": "No slot matched the requested cap". looking at the /dev/shm is also not filled that much

SE_SESSION_REQUEST_TIMEOUT=300 SE_NODE_MAX_SESSION=10

# df -h /dev/shm
Filesystem      Size  Used Avail Use% Mounted on
tmpfs            15G   35M   15G   1% /dev/shm   

We are running with the below configuration with 20 core and 30 GB RAM VM

But that is a complete different issue. That means that no slot was found to fulfill the requested capabilities. /dev/shm is not related to this at all. If you have a way to reproduce that, please open a new issue.

vet0188 commented 2 years ago

Hi there! I'm still throwing the TimeoutException after upgrading to v4.1.2, I see that the stack trace is different but it might be related to the fix here is the output:

org.openqa.selenium.TimeoutException: java.util.concurrent.TimeoutException
Build info: version: '4.1.2', revision: '9a5a329c5a'
System info: host: 'USE2DEVP5704', ip: 'xxx.xxx.xxx.xxx', os.name: 'Windows 10', os.arch: 'amd64', os.version: '10.0', java.version: '1.8.0_311'
Driver info: driver.version: Driver
    at org.openqa.selenium.devtools.Connection.sendAndWait(Connection.java:161)
    at org.openqa.selenium.devtools.DevTools.send(DevTools.java:70)
    at org.openqa.selenium.devtools.idealized.Network.prepareToInterceptTraffic(Network.java:143)
    at org.openqa.selenium.devtools.idealized.Network.addAuthHandler(Network.java:127)
    at org.openqa.selenium.chromium.ChromiumDriver.register(ChromiumDriver.java:176)
    at org.openqa.selenium.HasAuthentication.register(HasAuthentication.java:49)
    at com.tr.emcm.cucumber.utils.Driver.buildChrome(Driver.java:106)
    at com.tr.emcm.cucumber.utils.Driver.initialiseLocalDriver(Driver.java:196)
    at com.tr.emcm.cucumber.utils.Driver.initialiseDriver(Driver.java:184)
    at com.tr.emcm.cucumber.utils.Hooks.beforeScenario(Hooks.java:29)
    at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at cucumber.runtime.Utils$1.call(Utils.java:26)
    at cucumber.runtime.Timeout.timeout(Timeout.java:16)
    at cucumber.runtime.Utils.invoke(Utils.java:20)
    at cucumber.runtime.java.JavaHookDefinition.execute(JavaHookDefinition.java:60)
    at cucumber.runner.HookDefinitionMatch.runStep(HookDefinitionMatch.java:16)
    at cucumber.runner.TestStep.executeStep(TestStep.java:63)
    at cucumber.runner.TestStep.run(TestStep.java:49)
    at cucumber.runner.TestCase.run(TestCase.java:40)
    at cucumber.runner.Runner.runPickle(Runner.java:40)
    at cucumber.runtime.junit.PickleRunners$NoStepDescriptions.run(PickleRunners.java:146)
    at cucumber.runtime.junit.FeatureRunner.runChild(FeatureRunner.java:67)
    at cucumber.runtime.junit.FeatureRunner.runChild(FeatureRunner.java:22)
    at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
    at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
    at cucumber.runtime.junit.FeatureRunner.run(FeatureRunner.java:72)
    at cucumber.api.junit.Cucumber.runChild(Cucumber.java:122)
    at cucumber.api.junit.Cucumber.runChild(Cucumber.java:64)
    at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
    at org.apache.maven.surefire.junitcore.pc.Scheduler$1.run(Scheduler.java:410)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
    at org.openqa.selenium.devtools.Connection.sendAndWait(Connection.java:150)
    ... 41 more

I don't use neither Grid nor Docker for running tests. It is reproduced on my machine that is not powerful, and when I start tests running in parallel (I use 2 threads for local), however it is not happening when I run tests remotely (unlimited threads) in SauceLabs or in jenkins (maybe it depends on test count).

diemol commented 2 years ago

@vet0188 can you share more details on what resources your machine has and what test do you run to reproduce the issue locally?

kasperSiteimprove commented 2 years ago

any luck with workaround on C#?

@stassilla, the C# binding is easily extensible in the right places so for a quick and dirty workaround you can try this: https://gist.github.com/kasperSiteimprove/d4a6f79e95a53828c0020de44c66d0be

The above creates a custom http command executor that just retries all requests that times out once. You'd then instantiate your remote webdriver instance along these lines:

var remoteWebDriver = new RemoteWebDriver(new RemoteCommandExecutor(new Uri("http://myGridUrl"), TimeSpan.FromSeconds(90)), capabilities);

A word of caution, though, as blindly retrying requests like this may have unforeseen consequences. So test it out and see if it works in your context and maybe improve upon the retrying mechanism, as it's rather, umm, artless...

vet0188 commented 2 years ago

@diemol here are specs of my machine: 2 cores (4 threads) 2.1 GHz 16GB of memory HDD 90mb/s I have to mention that this is actually an aws dedicated virtual machine. My tests start from setting up the driver with the new authentication feature, then land on a page, then do some verification, then take screenshot, then close the driver. Usually I run 50 tests in parallel mode with maven threadCount=2, perCoreThreadCount=false. If a simple project to reproduce the issue required, then I need some time to prepare it.

diemol commented 2 years ago

I'd be nice to have a reproduction scenario, @vet0188, given that you have the issue without Grid.

diemol commented 2 years ago

any luck with workaround on C#?

@stassilla, the C# binding is easily extensible in the right places so for a quick and dirty workaround you can try this: https://gist.github.com/kasperSiteimprove/d4a6f79e95a53828c0020de44c66d0be

The above creates a custom http command executor that just retries all requests that times out once. You'd then instantiate your remote webdriver instance along these lines:

var remoteWebDriver = new RemoteWebDriver(new RemoteCommandExecutor(new Uri("http://myGridUrl"), TimeSpan.FromSeconds(90)), capabilities);

A word of caution, though, as blindly retrying requests like this may have unforeseen consequences. So test it out and see if it works in your context and maybe improve upon the retrying mechanism, as it's rather, umm, artless...

Are you still facing the issue when using 4.1.2 and using C# on the client side?

stassilla commented 2 years ago

Hi Diego, we've decided to roll back to grid 3 for now.

On Fri, Feb 4, 2022 at 3:38 AM Diego Molina @.***> wrote:

any luck with workaround on C#?

@stassilla https://github.com/stassilla, the C# binding is easily extensible in the right places so for a quick and dirty workaround you can try this: https://gist.github.com/kasperSiteimprove/d4a6f79e95a53828c0020de44c66d0be

The above creates a custom http command executor that just retries all requests that times out once. You'd then instantiate your remote webdriver instance along these lines:

var remoteWebDriver = new RemoteWebDriver(new RemoteCommandExecutor(new Uri("http://myGridUrl"), TimeSpan.FromSeconds(90)), capabilities);

A word of caution, though, as blindly retrying requests like this may have unforeseen consequences. So test it out and see if it works in your context and maybe improve upon the retrying mechanism, as it's rather, umm, artless...

Are you still facing the issue when using 4.1.2 and using C# on the client side?

— Reply to this email directly, view it on GitHub https://github.com/SeleniumHQ/selenium/issues/9528#issuecomment-1029909336, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACETYZ64ZFQQ7XSNF2YZW5DUZO3CJANCNFSM45VQSCWQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you were mentioned.Message ID: @.***>

kasperSiteimprove commented 2 years ago

Are you still facing the issue when using 4.1.2 and using C# on the client side?

Yes, unfortunately. The admittedly very basic workaround I posted above seems to have mitigated the problem entirely, though. To me, the fact that it does work, seem to indicate that on rare occasions a request made to the grid isn't acknowledged and/or possibly not even processed.

In my testing so far with the workaround in place, I haven't seen any weird failures that can be attributed to, say, a click event being processed twice because of the retry. Precisely under what circumstances the grid might not process a request is proving to be really tricky to answer. Earlier I thought I had a bulletproof repro using that simple program I posted in the Selenium slack channel, but it seems that this doesn't always trigger the issue.

or1k commented 2 years ago

Hello everyone. I have a similar problem with expectations. I am using Selenium + Selenoid. For the test, I made a limit of 1 container. I am running two tests at the same time. The first one starts fine. The second one gets in QUEUED and waits for a free container. But he does not wait and the connection is terminated after 180 seconds. Actually how can I increase the waiting time?

февр. 09, 2022 3:06:44 PM org.openqa.selenium.remote.tracing.opentelemetry.OpenTelemetryTracer createTracer
INFO: Using OpenTelemetry for tracing
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

org.openqa.selenium.SessionNotCreatedException: Could not start a new session. Possible causes are invalid address of the remote server or browser start-up failure.
Build info: version: '4.1.2', revision: '9a5a329c5a'
System info: host: 'DESKTOP-NMMTF1R', ip: '26.190.126.197', os.name: 'Windows 10', os.arch: 'amd64', os.version: '10.0', java.version: '11.0.6'
Driver info: org.openqa.selenium.remote.RemoteWebDriver
Command: [null, newSession {capabilities=[Capabilities {browserName: chrome}], desiredCapabilities=Capabilities {browserName: chrome, enableVNC: true, enableVideo: false, name: Landings_AllBots_Flow, version: 98.0}}]
Capabilities {}

    at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:576)
    at org.openqa.selenium.remote.RemoteWebDriver.startSession(RemoteWebDriver.java:245)
    at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:161)
    at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:143)
    at BrowserConfig.MyChromeBrowserClass.initDriverInstance(MyChromeBrowserClass.java:33)
    at Parents.BaseTest.setup(BaseTest.java:75)
    at ReDirectFlow.AllBots_Flow.setup(AllBots_Flow.java:28)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:124)
    at org.testng.internal.MethodInvocationHelper.invokeMethodConsideringTimeout(MethodInvocationHelper.java:59)
    at org.testng.internal.Invoker.invokeConfigurationMethod(Invoker.java:458)
    at org.testng.internal.Invoker.invokeConfigurations(Invoker.java:222)
    at org.testng.internal.Invoker.invokeConfigurations(Invoker.java:142)
    at org.testng.internal.TestMethodWorker.invokeBeforeClassMethods(TestMethodWorker.java:168)
    at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:105)
    at org.testng.TestRunner.privateRun(TestRunner.java:648)
    at org.testng.TestRunner.run(TestRunner.java:505)
    at org.testng.SuiteRunner.runTest(SuiteRunner.java:455)
    at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:450)
    at org.testng.SuiteRunner.privateRun(SuiteRunner.java:415)
    at org.testng.SuiteRunner.run(SuiteRunner.java:364)
    at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
    at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:84)
    at org.testng.TestNG.runSuitesSequentially(TestNG.java:1208)
    at org.testng.TestNG.runSuitesLocally(TestNG.java:1137)
    at org.testng.TestNG.runSuites(TestNG.java:1049)
    at org.testng.TestNG.run(TestNG.java:1017)
    at com.intellij.rt.testng.IDEARemoteTestNG.run(IDEARemoteTestNG.java:66)
    at com.intellij.rt.testng.RemoteTestNGStarter.main(RemoteTestNGStarter.java:109)
Caused by: java.io.UncheckedIOException: java.io.IOException: Stream closed
    at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:80)
    at org.openqa.selenium.remote.http.RetryRequest.lambda$apply$6(RetryRequest.java:80)
    at net.jodah.failsafe.Functions.lambda$get$0(Functions.java:48)
    at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)
    at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)
    at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)
    at net.jodah.failsafe.Execution.executeSync(Execution.java:128)
    at net.jodah.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:379)
    at net.jodah.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:68)
    at org.openqa.selenium.remote.http.RetryRequest.lambda$apply$7(RetryRequest.java:80)
    at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
    at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
    at org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:51)
    at org.openqa.selenium.remote.http.RetryRequest.lambda$apply$6(RetryRequest.java:80)
    at net.jodah.failsafe.Functions.lambda$get$0(Functions.java:48)
    at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)
    at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)
    at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)
    at net.jodah.failsafe.Execution.executeSync(Execution.java:128)
    at net.jodah.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:379)
    at net.jodah.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:68)
    at org.openqa.selenium.remote.http.RetryRequest.lambda$apply$7(RetryRequest.java:80)
    at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
    at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
    at org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:110)
    at org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)
    at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:102)
    at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:84)
    at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:62)
    at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:156)
    at org.openqa.selenium.remote.TracedCommandExecutor.execute(TracedCommandExecutor.java:51)
    at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:558)
    ... 31 more
Caused by: java.io.IOException: Stream closed
    at java.base/java.io.BufferedInputStream.getInIfOpen(BufferedInputStream.java:165)
    at java.base/java.io.BufferedInputStream.available(BufferedInputStream.java:416)
    at java.base/java.io.FilterInputStream.available(FilterInputStream.java:167)
    at java.base/java.io.PushbackInputStream.available(PushbackInputStream.java:274)
    at io.netty.handler.stream.ChunkedStream.isEndOfInput(ChunkedStream.java:82)
    at io.netty.handler.stream.ChunkedStream.readChunk(ChunkedStream.java:109)
    at io.netty.handler.stream.ChunkedStream.readChunk(ChunkedStream.java:36)
    at io.netty.handler.stream.ChunkedWriteHandler.doFlush(ChunkedWriteHandler.java:229)
    at io.netty.handler.stream.ChunkedWriteHandler.flush(ChunkedWriteHandler.java:131)
    at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750)
    at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:765)
    at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:790)
    at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:758)
    at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1020)
    at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:311)
    at org.asynchttpclient.netty.request.body.NettyInputStreamBody.write(NettyInputStreamBody.java:77)
    at org.asynchttpclient.netty.request.NettyRequestSender.writeRequest(NettyRequestSender.java:433)
    at org.asynchttpclient.netty.channel.NettyConnectListener.writeRequest(NettyConnectListener.java:80)
    at org.asynchttpclient.netty.channel.NettyConnectListener.onSuccess(NettyConnectListener.java:156)
    at org.asynchttpclient.netty.channel.NettyChannelConnector$1.onSuccess(NettyChannelConnector.java:92)
    at org.asynchttpclient.netty.SimpleChannelFutureListener.operationComplete(SimpleChannelFutureListener.java:26)
    at org.asynchttpclient.netty.SimpleChannelFutureListener.operationComplete(SimpleChannelFutureListener.java:20)
    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578)
    at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571)
    at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550)
    at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491)
    at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616)
    at io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:605)
    at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)
    at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:84)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:300)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:335)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:710)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:834)

Screenshot_2

JulienBreton commented 2 years ago

Hi, I can't reproduce the issue with the java.util.concurrent.TimeoutException. My tests work well now. Thanks for your hard work and this awesome tool !

webtoman commented 2 years ago

I tried selenium 4 again now after hitting this selenium.common.exceptions.TimeoutException: Message: Unable to execute request for an existing session: java.util.concurrent.TimeoutException in 4.0.0. Decided to test again now, and upgraded the grid to 4.1.2, and the python client to 4.1.0. Now I get straight hangs instead of timeouts, so our test framework times out after tens of minutes instead.

Using selenium hub/node configuration with 3 (also 4) nodes. java-8-openjdk-amd64. Hub started like this: java -jar selenium-server-4.1.2.jar hub --port 4445 Nodes started like this: java -jar selenium-server-4.1.2.jar node --port <increasing-number> --hub http://<server-name>:4445

diemol commented 2 years ago

Thanks for the feedback, @webtoman.

Can you please share a test that we can use to reproduce the issue?

webtoman commented 2 years ago

@diemol I'm sorry, I'm not able to reproduce it on demand. We run ~50 concurrent sessions, and only some of them will randomly exhibit this problem. Two nodes are running on one VM, and one node and the hub is running on another VM. I think it gets worse with more nodes, but it is hard to test this.

The driver is set up like this

    driver = webdriver.Remote(
        keep_alive=True,
        options=options,
        browser_profile=profile,
        command_executor=f'http://{host}/wd/hub',
        desired_capabilities=desired_caps)

The hanging commands are all different too, looks like this on the python side: 2022-03-01 13:18:51.042 CET: [selenium.webdriver.remote.remote_connection] {DEBUG} POST http://<hostname>:4445/wd/hub/session/d7ec165a5a7db331aba0c9cb92b1da8d/element {"using": "xpath", "value": "//*[contains(@class, \"alert-error\") or contains(@class, \"alert-danger\") or contains(@class, \"toast-error\") or contains(@class, \"status-bar-error\")]"} or 2022-03-01 13:20:51.954 CET: [selenium.webdriver.remote.remote_connection] {DEBUG} POST http://<hostname>:4445/wd/hub/session/7ae3a34efedab23be6bf7b64123b26fa/window {"handle": "CDwindow-07AA63DEE56E59CDD60EA9F64C7034DB"} or 2022-03-01 12:50:21.745 CET: [selenium.webdriver.remote.remote_connection] {DEBUG} POST http://<hostname>:4445/wd/hub/session/8b3c4024763aef0ff946a48f05dfdeee/execute/sync {"script": "return arguments[0][arguments[1]]", "args": [{"element-6066-11e4-a52e-4f735466cecf": "9939aaf7-1faa-4a3e-ad62-9220378e9ec9"}, "notAnAttribute"]}

evertones commented 2 years ago

@diemol: I've been testing the upgrade for Selenium 4.x and I thought this issue was not happening anymore. I used to see it in the build log from the CI server, which I don't see anymore. However, tests have been very unstable to me in Selenium 4.x and there are many suites aborted randomly. I just checked the hub logs and found the exception that seems to be the same problem reported in this issue:

10:51:04.832 WARN [SeleniumSpanExporter$1.lambda$export$0] - 
{
  "traceId": "e84555dd7a75b7ee7c41d89a4e8c83cb",
  "eventTime": 1646351464831430584,
  "eventName": "exception",
  "attributes": {
    "exception.message": "Unable to execute request for an existing session: java.util.concurrent.TimeoutException
      Build info: version: '4.1.2', revision: '9a5a329c5a'
      System info: host: 'selenium-hub-f99c5486d-nj8xk', ip: '10.32.128.0', os.name: 'Linux', os.arch: 'amd64', os.version: '5.4.0-100-generic', java.version: '11.0.13'
      Driver info: driver.version: unknown","exception.stacktrace": "org.openqa.selenium.TimeoutException: java.util.concurrent.TimeoutException
      Build info: version: '4.1.2', revision: '9a5a329c5a'
      System info: host: 'selenium-hub-f99c5486d-nj8xk', ip: '10.32.128.0', os.name: 'Linux', os.arch: 'amd64', os.version: '5.4.0-100-generic', java.version: '11.0.13'
      Driver info: driver.version: unknown
        at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:72)
        at org.openqa.selenium.remote.http.RetryRequest.lambda$apply$6(RetryRequest.java:80)
        at net.jodah.failsafe.Functions.lambda$get$0(Functions.java:48)
        at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)
        at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)
        at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)
        at net.jodah.failsafe.Execution.executeSync(Execution.java:128)
        at net.jodah.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:379)
        at net.jodah.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:68)
        at org.openqa.selenium.remote.http.RetryRequest.lambda$apply$7(RetryRequest.java:80)
        at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
        at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
        at org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:51)
        at org.openqa.selenium.remote.http.RetryRequest.lambda$apply$6(RetryRequest.java:80)
        at net.jodah.failsafe.Functions.lambda$get$0(Functions.java:48)
        at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)
        at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)
        at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)
        at net.jodah.failsafe.Execution.executeSync(Execution.java:128)
        at net.jodah.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:379)
        at net.jodah.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:68)
        at org.openqa.selenium.remote.http.RetryRequest.lambda$apply$7(RetryRequest.java:80)
        at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
        at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
        at org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:110)
        at org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)
        at org.openqa.selenium.grid.web.ReverseProxyHandler.execute(ReverseProxyHandler.java:92)
        at org.openqa.selenium.grid.router.HandleSession.execute(HandleSession.java:110)
        at org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:373)
        at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
        at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
        at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
        at org.openqa.selenium.grid.router.Router.execute(Router.java:91)
        at org.openqa.selenium.grid.web.EnsureSpecCompliantResponseHeaders.lambda$apply$0(EnsureSpecCompliantResponseHeaders.java:34)
        at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
        at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
        at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
        at org.openqa.selenium.remote.http.Route$NestedRoute.handle(Route.java:270)
        at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
        at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
        at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
        at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
        at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
        at org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)
        at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
        at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
        at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
        at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
        at org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)
        at java.base\u002fjava.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base\u002fjava.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base\u002fjava.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base\u002fjava.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base\u002fjava.lang.Thread.run(Thread.java:829)
      Caused by: java.util.concurrent.TimeoutException
        at java.base\u002fjava.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
        at java.base\u002fjava.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
        at org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)
        at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:66)
        ... 53 more",
    "exception.type": "org.openqa.selenium.TimeoutException",
    "http.flavor": 1,
    "http.handler_class": "org.openqa.selenium.grid.router.HandleSession",
    "http.host": "jetfire.redlounge.io:31002",
    "http.method": "POST",
    "http.request_content_length": "140",
    "http.scheme": "HTTP",
    "http.target": "\u002fsession\u002f146fd4e09638625a9e66cdbf5835bdf4\u002felement\u002f5563c115-2f7a-4cbd-88c0-0245b546d386\u002felements",
    "http.user_agent": "selenium\u002f4.1.2 (java unix)",
    "session.id": "146fd4e09638625a9e66cdbf5835bdf4"
  }
}

On the CI server I've been seeing a lot of issues like the one below, but I have no idea if that's related:

com.mycompany.test.selenium.cases.users.filters.SearchBySuperUserSpec *** ABORTED *** (3 minutes, 17 seconds)
  org.openqa.selenium.SessionNotCreatedException: Could not start a new session. Response code 500. Message: Could not start a new session. Unable to parse: 400 Bad Request
Build info: version: '4.1.2', revision: '9a5a329c5a'
System info: host: 'r2d2', ip: '192.168.16.2', os.name: 'Linux', os.arch: 'amd64', os.version: '5.4.0-96-generic', java.version: '11.0.13'
Driver info: org.openqa.selenium.remote.RemoteWebDriver
Command: [null, newSession {capabilities=[Capabilities {acceptInsecureCerts: true, browserName: chrome, goog:chromeOptions: {args: [], extensions: []}}], desiredCapabilities=Capabilities {acceptInsecureCerts: true, browserName: chrome, goog:chromeOptions: {args: [], extensions: []}}}]
Capabilities {}
  at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:126)
  at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:84)
  at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:62)
  at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:156)
  at org.openqa.selenium.remote.TracedCommandExecutor.execute(TracedCommandExecutor.java:51)
  at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:558)
  at org.openqa.selenium.remote.RemoteWebDriver.startSession(RemoteWebDriver.java:245)
  at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:161)
  at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:143)
  at com.mycompany.selenium.WebDriverBuilder.buildRemote(WebDriverBuilder.scala:21)
Environment

I'm afraid this is the same issue, but not thrown on the client-side.

RavirajsinhZalaTR commented 2 years ago

@diemol I am also running into "org.openqa.selenium.TimeoutException: java.util.concurrent.TimeoutException Build info: version: '4.1.2' error, this happens when running test on 8 core, 16 gb machines, with 8 chrome nodes created though docker compose. we are running around 5 to 6 parallel tests, i see grid has enough capacity,

17:30:37.034 INFO [LocalSessionMap.lambda$new$0] - Deleted session from local session map, Id: 84f3ce1f70810c0878824c830b5fe366 17:45:42.510 WARN [SeleniumSpanExporter$1.lambda$export$0] - {"traceId": "9f35ab34db7873082640f605d68b3340","eventTime": 1647971142509664300,"eventName": "exception","attributes": {"exception.message": "Unable to execute request for an existing session: java.util.concurrent.TimeoutException\nBuild info: version: '4.1.2', revision: '9a5a329c5a'\nSystem info: host: '48c4ac9ca739', ip: '172.18.0.2', os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-240.el8.x86_64', java.version: '11.0.13'\nDriver info: driver.version: unknown","exception.stacktrace": "org.openqa.selenium.TimeoutException: java.util.concurrent.TimeoutException\nBuild info: version: '4.1.2', revision: '9a5a329c5a'\nSystem info: host: '48c4ac9ca739', ip: '172.18.0.2', os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-240.el8.x86_64', java.version: '11.0.13'\nDriver info: driver.version: unknown\n\tat org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:72)\n\tat org.openqa.selenium.remote.http.RetryRequest.lambda$apply$6(RetryRequest.java:80)\n\tat net.jodah.failsafe.Functions.lambda$get$0(Functions.java:48)\n\tat net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)\n\tat net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)\n\tat net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)\n\tat net.jodah.failsafe.Execution.executeSync(Execution.java:128)\n\tat net.jodah.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:379)\n\tat net.jodah.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:68)\n\tat org.openqa.selenium.remote.http.RetryRequest.lambda$apply$7(RetryRequest.java:80)\n\tat org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)\n\tat org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)\n\tat org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:51)\n\tat org.openqa.selenium.remote.http.RetryRequest.lambda$apply$6(RetryRequest.java:80)\n\tat net.jodah.failsafe.Functions.lambda$get$0(Functions.java:48)\n\tat net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)\n\tat net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)\n\tat net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)\n\tat net.jodah.failsafe.Execution.executeSync(Execution.java:128)\n\tat net.jodah.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:379)\n\tat net.jodah.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:68)\n\tat org.openqa.selenium.remote.http.RetryRequest.lambda$apply$7(RetryRequest.java:80)\n\tat org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)\n\tat org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)\n\tat org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:110)\n\tat org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)\n\tat org.openqa.selenium.grid.web.ReverseProxyHandler.execute(ReverseProxyHandler.java:92)\n\tat org.openqa.selenium.grid.router.HandleSession.execute(HandleSession.java:110)\n\tat org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:373)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.grid.router.Router.execute(Router.java:91)\n\tat org.openqa.selenium.grid.web.EnsureSpecCompliantResponseHeaders.lambda$apply$0(EnsureSpecCompliantResponseHeaders.java:34)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)\n\tat org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)\n\tat java.base\u002fjava.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base\u002fjava.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base\u002fjava.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base\u002fjava.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base\u002fjava.lang.Thread.run(Thread.java:829)\nCaused by: java.util.concurrent.TimeoutException\n\tat java.base\u002fjava.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)\n\tat java.base\u002fjava.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)\n\tat org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)\n\tat org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:66)\n\t... 49 more\n","exception.type": "org.openqa.selenium.TimeoutException","http.flavor": 1,"http.handler_class": "org.openqa.selenium.grid.router.HandleSession","http.host": "qaautoregexec02ind1.hqdev.india:4444","http.method": "POST","http.request_content_length": "608","http.scheme": "HTTP","http.target": "\u002fsession\u002f57aba65007b21e8a051339835f11a821\u002fexecute\u002fsync","http.user_agent": "selenium\u002f3.14.0 (java windows)","session.id": "57aba65007b21e8a051339835f11a821"}}

18:05:38.390 INFO [LocalSessionMap.lambda$new$0] - Deleted session from local session map, Id: 3c31be29081bb38ce79e79d37cb18b7f 18:05:39.132 INFO [LocalDistributor.newSession] - Session request received by the distributor: [Capabilities {acceptSslCerts: true, browserName: chrome, goog:chromeOptions: {args: [window-size=1920,1080, --no-sandbox, --ignore-certificate-errors, --disable-features=VizDispl..., --disable-dev-shm-usage, disable-features=NetworkSer...], excludeSwitches: [enable-automation, load-extension], extensions: [], prefs: {credentials_enable_service: false, download.default_directory: D:\jenkins_slave\workspace..., profile.default_content_settings.popups: 0, profile.password_manager_enabled: false, safebrowsing.enabled: true}}, idleTimeout: 500, name: Test Class CD30232}, Capabilities {browserName: chrome, goog:chromeOptions: {args: [window-size=1920,1080, --no-sandbox, --ignore-certificate-errors, --disable-features=VizDispl..., --disable-dev-shm-usage, disable-features=NetworkSer...], excludeSwitches: [enable-automation, load-extension], extensions: [], prefs: {credentials_enable_service: false, download.default_directory: D:\jenkins_slave\workspace..., profile.default_content_settings.popups: 0, profile.password_manager_enabled: false, safebrowsing.enabled: true}}}] 18:05:50.912 INFO [LocalDistributor.newSession] - Session created by the distributor. Id: 3db950f8b367b372d3457d78bb4fe37c, Caps: Capabilities {acceptInsecureCerts: false, acceptSslCerts: true, browserName: chrome, browserVersion: 98.0.4758.102, chrome: {chromedriverVersion: 98.0.4758.102 (273bf7ac8c90..., userDataDir: /tmp/.com.google.Chrome.9jVnD9}, goog:chromeOptions: {debuggerAddress: localhost:41973}, idleTimeout: 500, name: Test Class CD30232, networkConnectionEnabled: false, pageLoadStrategy: normal, platformName: linux, proxy: {}, se:cdp: ws://172.18.0.5:4444/sessio..., se:cdpVersion: 98.0.4758.102, se:vnc: ws://172.18.0.5:4444/sessio..., se:vncEnabled: true, se:vncLocalAddress: ws://172.18.0.5:7900, setWindowRect: true, strictFileInteractability: false, timeouts: {implicit: 0, pageLoad: 300000, script: 30000}, unhandledPromptBehavior: dismiss and notify, webauthn:extension:credBlob: true, webauthn:extension:largeBlob: true, webauthn:virtualAuthenticators: true} 18:53:54.018 INFO [LocalSessionMap.lambda$new$0] - Deleted session from local session map, Id: b1e5850b5060b29ef7210960a77630a0 18:53:54.432 INFO [LocalDistributor.newSession] - Session request received by the distributor: [Capabilities {acceptSslCerts: true, browserName: chrome, goog:chromeOptions: {args: [window-size=1920,1080, --no-sandbox, --ignore-certificate-errors, --disable-features=VizDispl..., --disable-dev-shm-usage, disable-features=NetworkSer...], excludeSwitches: [enable-automation, load-extension], extensions: [], prefs: {credentials_enable_service: false, download.default_directory: D:\jenkins_slave\workspace..., profile.default_content_settings.popups: 0, profile.password_manager_enabled: false, safebrowsing.enabled: true}}, idleTimeout: 500, name: Test Class CD32824}, Capabilities {browserName: chrome, goog:chromeOptions: {args: [window-size=1920,1080, --no-sandbox, --ignore-certificate-errors, --disable-features=VizDispl..., --disable-dev-shm-usage, disable-features=NetworkSer...], excludeSwitches: [enable-automation, load-extension], extensions: [], prefs: {credentials_enable_service: false, download.default_directory: D:\jenkins_slave\workspace..., profile.default_content_settings.popups: 0, profile.password_manager_enabled: false, safebrowsing.enabled: true}}}] 18:53:58.222 INFO [LocalDistributor.newSession] - Session created by the distributor. Id: e10fcf76931a203b6a7707c9e318db59, Caps: Capabilities {acceptInsecureCerts: false, acceptSslCerts: true, browserName: chrome, browserVersion: 98.0.4758.102, chrome: {chromedriverVersion: 98.0.4758.102 (273bf7ac8c90..., userDataDir: /tmp/.com.google.Chrome.JaRVtP}, goog:chromeOptions: {debuggerAddress: localhost:36397}, idleTimeout: 500, name: Test Class CD32824, networkConnectionEnabled: false, pageLoadStrategy: normal, platformName: linux, proxy: {}, se:cdp: ws://172.18.0.8:4444/sessio..., se:cdpVersion: 98.0.4758.102, se:vnc: ws://172.18.0.8:4444/sessio..., se:vncEnabled: true, se:vncLocalAddress: ws://172.18.0.8:7900, setWindowRect: true, strictFileInteractability: false, timeouts: {implicit: 0, pageLoad: 300000, script: 30000}, unhandledPromptBehavior: dismiss and notify, webauthn:extension:credBlob: true, webauthn:extension:largeBlob: true, webauthn:virtualAuthenticators: true} 19:13:06.783 INFO [LocalSessionMap.lambda$new$0] - Deleted session from local session map, Id: e10fcf76931a203b6a7707c9e318db59 19:14:59.917 WARN [SeleniumSpanExporter$1.lambda$export$0] - {"traceId": "c17f52efd3a1d5c8859df226c9d265f5","eventTime": 1647976499916259300,"eventName": "exception","attributes": {"exception.message": "Unable to execute request for an existing session: java.util.concurrent.TimeoutException\nBuild info: version: '4.1.2', revision: '9a5a329c5a'\nSystem info: host: '48c4ac9ca739', ip: '172.18.0.2', os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-240.el8.x86_64', java.version: '11.0.13'\nDriver info: driver.version: unknown","exception.stacktrace": "org.openqa.selenium.TimeoutException: java.util.concurrent.TimeoutException\nBuild info: version: '4.1.2', revision: '9a5a329c5a'\nSystem info: host: '48c4ac9ca739', ip: '172.18.0.2', os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-240.el8.x86_64', java.version: '11.0.13'\nDriver info: driver.version: unknown\n\tat org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:72)\n\tat org.openqa.selenium.remote.http.RetryRequest.lambda$apply$6(RetryRequest.java:80)\n\tat net.jodah.failsafe.Functions.lambda$get$0(Functions.java:48)\n\tat net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)\n\tat net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)\n\tat net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)\n\tat net.jodah.failsafe.Execution.executeSync(Execution.java:128)\n\tat net.jodah.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:379)\n\tat net.jodah.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:68)\n\tat org.openqa.selenium.remote.http.RetryRequest.lambda$apply$7(RetryRequest.java:80)\n\tat org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)\n\tat org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)\n\tat org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:51)\n\tat org.openqa.selenium.remote.http.RetryRequest.lambda$apply$6(RetryRequest.java:80)\n\tat net.jodah.failsafe.Functions.lambda$get$0(Functions.java:48)\n\tat net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)\n\tat net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)\n\tat net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)\n\tat net.jodah.failsafe.Execution.executeSync(Execution.java:128)\n\tat net.jodah.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:379)\n\tat net.jodah.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:68)\n\tat org.openqa.selenium.remote.http.RetryRequest.lambda$apply$7(RetryRequest.java:80)\n\tat org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)\n\tat org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)\n\tat org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:110)\n\tat org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)\n\tat org.openqa.selenium.grid.web.ReverseProxyHandler.execute(ReverseProxyHandler.java:92)\n\tat org.openqa.selenium.grid.router.HandleSession.execute(HandleSession.java:110)\n\tat org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:373)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.grid.router.Router.execute(Router.java:91)\n\tat org.openqa.selenium.grid.web.EnsureSpecCompliantResponseHeaders.lambda$apply$0(EnsureSpecCompliantResponseHeaders.java:34)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)\n\tat org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)\n\tat java.base\u002fjava.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base\u002fjava.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base\u002fjava.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base\u002fjava.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base\u002fjava.lang.Thread.run(Thread.java:829)\nCaused by: java.util.concurrent.TimeoutException\n\tat java.base\u002fjava.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)\n\tat java.base\u002fjava.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)\n\tat org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)\n\tat org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:66)\n\t... 49 more\n","exception.type": "org.openqa.selenium.TimeoutException","http.flavor": 1,"http.handler_class": "org.openqa.selenium.grid.router.HandleSession","http.host": "qaautoregexec02ind1.hqdev.india:4444","http.method": "POST","http.request_content_length": "96","http.scheme": "HTTP","http.target": "\u002fsession\u002f84f3ce1f70810c0878824c830b5fe366\u002felements","http.user_agent": "selenium\u002f3.14.0 (java windows)","session.id": "84f3ce1f70810c0878824c830b5fe366"}}

diemol commented 2 years ago

We are making some changes to the retry mechanisms since it might be hiding some issues. This will be part of 4.1.3

RavirajsinhZalaTR commented 2 years ago

We are making some changes to the retry mechanisms since it might be hiding some issues. This will be part of 4.1.3

Awesome, thank you @diemol , any rough time line for 4.1.3 release?

JulienBreton commented 2 years ago

Hi @diemol I am sorry to say that but with 4.1.3 the issue with the java.util.concurrent.TimeoutException is back. I have run many times my tests with 4.1.2 and I have not encountered the problem. With 4.1.3 I have the problem several times.

My config : Selenium Java 4.1.3 docker-selenium : 4.1.3-20220327

selenium-hub_1  | 09:01:40.184 INFO [LocalDistributor.newSession] - Session request received by the distributor: 
selenium-hub_1  |  [Capabilities {browserName: chrome, goog:chromeOptions: {args: [], extensions: []}}]
chrome_1        | Starting ChromeDriver 99.0.4844.51 (d537ec02474b5afe23684e7963d538896c63ac77-refs/branch-heads/4844@{#875}) on port 50633
chrome_1        | Only local connections are allowed.
chrome_1        | Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
chrome_1        | ChromeDriver was started successf[u1l6l4y8.6
chrome_1        | 30900.194][SEVERE]: bind() failed: Cannot assign requested address (99)
chrome_1        | 09:01:40.466 INFO [ProtocolHandshake.createSession] - Detected dialect: W3C
selenium-hub_1  | 09:01:40.475 INFO [LocalDistributor.newSession] - Session created by the distributor. Id: 92dd11e854c16327d4893c464dde0fb2, Caps: Capabilities {acceptInsecureCerts: false, browserName: chrome, browserVersion: 99.0.4844.84, chrome: {chromedriverVersion: 99.0.4844.51 (d537ec02474b5..., userDataDir: /tmp/.com.google.Chrome.aBn0OB}, goog:chromeOptions: {debuggerAddress: localhost:37185}, networkConnectionEnabled: false, pageLoadStrategy: normal, platformName: linux, proxy: {}, se:cdp: ws://172.18.0.4:4444/sessio..., se:cdpVersion: 99.0.4844.84, se:vnc: ws://172.18.0.4:4444/sessio..., se:vncEnabled: true, se:vncLocalAddress: ws://172.18.0.4:7900, setWindowRect: true, strictFileInteractability: false, timeouts: {implicit: 0, pageLoad: 300000, script: 30000}, unhandledPromptBehavior: dismiss and notify, webauthn:extension:credBlob: true, webauthn:extension:largeBlob: true, webauthn:virtualAuthenticators: true}
selenium-hub_1  | 09:05:27.597 WARN [SeleniumSpanExporter$1.lambda$export$0] - {"traceId": "a1b01ed41d42ddfad0af45f2313d32cc","eventTime": 1648631127596733857,"eventName": "exception","attributes": {"exception.message": "Unable to execute request for an existing session: java.util.concurrent.TimeoutException\nBuild info: version: '4.1.3', revision: '7b1ebf28ef'\nSystem info: host: '4541b7a89eeb', ip: '172.18.0.2', os.name: 'Linux', os.arch: 'amd64', os.version: '5.4.0-105-generic', java.version: '11.0.14'\nDriver info: driver.version: unknown","exception.stacktrace": "org.openqa.selenium.TimeoutException: java.util.concurrent.TimeoutException\nBuild info: version: '4.1.3', revision: '7b1ebf28ef'\nSystem info: host: '4541b7a89eeb', ip: '172.18.0.2', os.name: 'Linux', os.arch: 'amd64', os.version: '5.4.0-105-generic', java.version: '11.0.14'\nDriver info: driver.version: unknown\n\tat org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:72)\n\tat org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)\n\tat org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)\n\tat org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:51)\n\tat org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)\n\tat org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)\n\tat org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:124)\n\tat org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)\n\tat org.openqa.selenium.grid.web.ReverseProxyHandler.execute(ReverseProxyHandler.java:92)\n\tat org.openqa.selenium.grid.router.HandleSession.execute(HandleSession.java:121)\n\tat org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:373)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.grid.router.Router.execute(Router.java:91)\n\tat org.openqa.selenium.grid.web.EnsureSpecCompliantResponseHeaders.lambda$apply$0(EnsureSpecCompliantResponseHeaders.java:34)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)\n\tat org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)\n\tat java.base\u002fjava.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base\u002fjava.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base\u002fjava.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base\u002fjava.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base\u002fjava.lang.Thread.run(Thread.java:829)\nCaused by: java.util.concurrent.TimeoutException\n\tat java.base\u002fjava.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)\n\tat java.base\u002fjava.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)\n\tat org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)\n\tat org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:66)\n\t... 31 more\n","exception.type": "org.openqa.selenium.TimeoutException","http.flavor": 1,"http.handler_class": "org.openqa.selenium.grid.router.HandleSession","http.host": "localhost:4444","http.method": "POST","http.request_content_length": "46","http.scheme": "HTTP","http.target": "\u002fsession\u002f92dd11e854c16327d4893c464dde0fb2\u002furl","http.user_agent": "selenium\u002f4.1.3 (java unix)","session.id": "92dd11e854c16327d4893c464dde0fb2"}}
selenium-hub_1  | 
chrome_1        | [1648631303.333][SEVERE]: Timed out receiving message from renderer: 299.728
chrome_1        | [1648631303.341][SEVERE]: Timed out receiving message from renderer: 299.728
selenium-hub_1  | 09:10:32.096 INFO [LocalSessionMap.lambda$new$0] - Deleted session from local session map, Id: 92dd11e854c16327d4893c464dde0fb2
evertones commented 2 years ago

I can see the same exception mentioned by @JulienBreton. I am using:

Logs in the hub

12:48:51.090 WARN [SeleniumSpanExporter$1.lambda$export$0] - {"traceId": "8738dc89d6201cb04b52f555524193f7","eventTime": 1648691331088857077,"eventName": "exception","attributes": {"exception.message": "Unable to execute request for an existing session: java.util.concurrent.TimeoutException
Build info: version: '4.1.3', revision: '7b1ebf28ef'
System info: host: 'selenium-hub-644bf6c986-5l8kv', ip: '10.32.128.0', os.name: 'Linux', os.arch: 'amd64', os.version: '5.4.0-105-generic', java.version: '11.0.14'
Driver info: driver.version: unknown","exception.stacktrace": "org.openqa.selenium.TimeoutException: java.util.concurrent.TimeoutException
Build info: version: '4.1.3', revision: '7b1ebf28ef'
System info: host: 'selenium-hub-644bf6c986-5l8kv', ip: '10.32.128.0', os.name: 'Linux', os.arch: 'amd64', os.version: '5.4.0-105-generic', java.version: '11.0.14'
Driver info: driver.version: unknown
    at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:72)
    at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
    at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
    at org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:51)
    at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
    at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
    at org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:124)
    at org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)
    at org.openqa.selenium.grid.web.ReverseProxyHandler.execute(ReverseProxyHandler.java:92)
    at org.openqa.selenium.grid.router.HandleSession.execute(HandleSession.java:121)
    at org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:373)
    at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
    at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
    at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
    at org.openqa.selenium.grid.router.Router.execute(Router.java:91)
    at org.openqa.selenium.grid.web.EnsureSpecCompliantResponseHeaders.lambda$apply$0(EnsureSpecCompliantResponseHeaders.java:34)
    at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
    at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
    at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
    at org.openqa.selenium.remote.http.Route$NestedRoute.handle(Route.java:270)
    at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
    at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
    at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
    at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
    at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
    at org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)
    at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
    at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
    at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
    at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
    at org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)
    at java.base\u002fjava.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base\u002fjava.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base\u002fjava.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base\u002fjava.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base\u002fjava.lang.Thread.run(Thread.java:829)
Caused by: java.util.concurrent.TimeoutException
    at java.base\u002fjava.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
    at java.base\u002fjava.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
    at org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)
    at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:66)
    ... 35 more
","exception.type": "org.openqa.selenium.TimeoutException","http.flavor": 1,"http.handler_class": "org.openqa.selenium.grid.router.HandleSession","http.host": "ourserver.io:31002","http.method": "POST","http.request_content_length": "53","http.scheme": "HTTP","http.target": "\u002fsession\u002ff51005fbae4da8700f6635873d778f5c\u002felement","http.user_agent": "selenium\u002f4.1.3 (java unix)","session.id": "f51005fbae4da8700f6635873d778f5c"}}

Logs in the node (node-chrome)

chrome_1        | 07:34:59.520 INFO [ProtocolHandshake.createSession] - Detected dialect: W3C
chrome_1        | Starting ChromeDriver 99.0.4844.51 (d537ec02474b5afe23684e7963d538896c63ac77-refs/branch-heads/4844@{#875}) on port 57734
chrome_1        | Only local connections are allowed.
chrome_1        | Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
chrome_1        | [C1648672499.877][SEVEhromeDriver was started successfulRlEy]: bi.
chrome_1        | nd() failed: Cannot assign requested address (99)

Above I found really strange the messages in the last 2 lines to be messed up. It looks like a concurrency issue printing them out.

diemol commented 2 years ago

@JulienBreton @evertones In 4.1.3 we removed the behavior of having retries everywhere by default. So it is not strange to see the error again, but the advantage now is that with your stacktraces we see where the timeout is exactly happening.

Before I go and apply a fix, would it be possible to get more detailed logs? (with --log-level "fine")

JulienBreton commented 2 years ago

@diemol there are a lot of logs so I tried to select the part where I can find all the java.util.concurrent.TimeoutException I hope this is ok.

[35mselenium-hub_1  |[0m 13:15:38.974 DEBUG [SeleniumSpanExporter$1.lambda$export$1] - SpanData{spanContext=ImmutableSpanContext{traceId=ac490fe8e85123bfc603e2afcac6d1f8, spanId=d4209a4e5c951d1d, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, parentSpanContext=ImmutableSpanContext{traceId=ac490fe8e85123bfc603e2afcac6d1f8, spanId=38957f5daeaaf83a, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=true, valid=true}, resource=Resource{schemaUrl=https://opentelemetry.io/schemas/1.9.0, attributes={service.name="unknown_service:java", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.12.0"}}, instrumentationLibraryInfo=InstrumentationLibraryInfo{name=default, version=null, schemaUrl=null}, name=httpclient.execute, kind=INTERNAL, startEpochNanos=1648732358961000000, endEpochNanos=1648732538973663752, attributes=AttributesMap{data={span.kind=client, http.target=/session/c65b907a21ef351d4bad4ea2b8f36f15/timeouts, http.method=POST}, capacity=128, totalAddedValues=3}, totalAttributeCount=3, events=[], totalRecordedEvents=0, links=[], totalRecordedLinks=0, status=ImmutableStatusData{statusCode=UNSET, description=}, hasEnded=true}
[35mselenium-hub_1  |[0m 13:15:38.974 DEBUG [SeleniumSpanExporter$1.lambda$export$1] - SpanData{spanContext=ImmutableSpanContext{traceId=ac490fe8e85123bfc603e2afcac6d1f8, spanId=d18a33e7986baf6f, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, parentSpanContext=ImmutableSpanContext{traceId=ac490fe8e85123bfc603e2afcac6d1f8, spanId=38957f5daeaaf83a, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=true, valid=true}, resource=Resource{schemaUrl=https://opentelemetry.io/schemas/1.9.0, attributes={service.name="unknown_service:java", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.12.0"}}, instrumentationLibraryInfo=InstrumentationLibraryInfo{name=default, version=null, schemaUrl=null}, name=reverse_proxy, kind=INTERNAL, startEpochNanos=1648732358961000000, endEpochNanos=1648732538973865753, attributes=AttributesMap{data={http.target=/session/c65b907a21ef351d4bad4ea2b8f36f15/timeouts, http.method=POST}, capacity=128, totalAddedValues=2}, totalAttributeCount=2, events=[], totalRecordedEvents=0, links=[], totalRecordedLinks=0, status=ImmutableStatusData{statusCode=UNSET, description=}, hasEnded=true}
[35mselenium-hub_1  |[0m 13:15:38.978 DEBUG [SeleniumSpanExporter$1.lambda$export$1] - SpanData{spanContext=ImmutableSpanContext{traceId=ac490fe8e85123bfc603e2afcac6d1f8, spanId=38957f5daeaaf83a, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, parentSpanContext=ImmutableSpanContext{traceId=ac490fe8e85123bfc603e2afcac6d1f8, spanId=dd7d5d283363aed5, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=true, valid=true}, resource=Resource{schemaUrl=https://opentelemetry.io/schemas/1.9.0, attributes={service.name="unknown_service:java", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.12.0"}}, instrumentationLibraryInfo=InstrumentationLibraryInfo{name=default, version=null, schemaUrl=null}, name=router.handle_session, kind=INTERNAL, startEpochNanos=1648732358961000000, endEpochNanos=1648732538977871635, attributes=AttributesMap{data={error=true, http.target=/session/c65b907a21ef351d4bad4ea2b8f36f15/timeouts, http.method=POST, session.id=c65b907a21ef351d4bad4ea2b8f36f15}, capacity=128, totalAddedValues=4}, totalAttributeCount=4, events=[ImmutableEventData{name=exception, attributes={exception.message="Unable to execute request for an existing session: java.util.concurrent.TimeoutException
[35mselenium-hub_1  |[0m Build info: version: '4.1.3', revision: '7b1ebf28ef'
[35mselenium-hub_1  |[0m System info: host: 'b7c72dbf8cf7', ip: '172.18.0.2', os.name: 'Linux', os.arch: 'amd64', os.version: '5.4.0-105-generic', java.version: '11.0.14'
[35mselenium-hub_1  |[0m Driver info: driver.version: unknown", exception.stacktrace="org.openqa.selenium.TimeoutException: java.util.concurrent.TimeoutException
[35mselenium-hub_1  |[0m Build info: version: '4.1.3', revision: '7b1ebf28ef'
[35mselenium-hub_1  |[0m System info: host: 'b7c72dbf8cf7', ip: '172.18.0.2', os.name: 'Linux', os.arch: 'amd64', os.version: '5.4.0-105-generic', java.version: '11.0.14'
[35mselenium-hub_1  |[0m Driver info: driver.version: unknown
[35mselenium-hub_1  |[0m    at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:72)
[35mselenium-hub_1  |[0m    at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
[35mselenium-hub_1  |[0m    at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
[35mselenium-hub_1  |[0m    at org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:51)
[35mselenium-hub_1  |[0m    at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
[35mselenium-hub_1  |[0m    at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
[35mselenium-hub_1  |[0m    at org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:124)
[35mselenium-hub_1  |[0m    at org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)
[35mselenium-hub_1  |[0m    at org.openqa.selenium.grid.web.ReverseProxyHandler.execute(ReverseProxyHandler.java:92)
[35mselenium-hub_1  |[0m    at org.openqa.selenium.grid.router.HandleSession.execute(HandleSession.java:121)
[35mselenium-hub_1  |[0m    at org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:373)
[35mselenium-hub_1  |[0m    at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
[35mselenium-hub_1  |[0m    at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
[35mselenium-hub_1  |[0m    at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
[35mselenium-hub_1  |[0m    at org.openqa.selenium.grid.router.Router.execute(Router.java:91)
[35mselenium-hub_1  |[0m    at org.openqa.selenium.grid.web.EnsureSpecCompliantResponseHeaders.lambda$apply$0(EnsureSpecCompliantResponseHeaders.java:34)
[35mselenium-hub_1  |[0m    at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
[35mselenium-hub_1  |[0m    at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
[35mselenium-hub_1  |[0m    at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
[35mselenium-hub_1  |[0m    at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
[35mselenium-hub_1  |[0m    at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
[35mselenium-hub_1  |[0m    at org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)
[35mselenium-hub_1  |[0m    at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
[35mselenium-hub_1  |[0m    at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
[35mselenium-hub_1  |[0m    at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
[35mselenium-hub_1  |[0m    at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
[35mselenium-hub_1  |[0m    at org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)
[35mselenium-hub_1  |[0m    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
[35mselenium-hub_1  |[0m    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
[35mselenium-hub_1  |[0m    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
[35mselenium-hub_1  |[0m    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
[35mselenium-hub_1  |[0m    at java.base/java.lang.Thread.run(Thread.java:829)
[35mselenium-hub_1  |[0m Caused by: java.util.concurrent.TimeoutException
[35mselenium-hub_1  |[0m    at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
[35mselenium-hub_1  |[0m    at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
[35mselenium-hub_1  |[0m    at org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)
[35mselenium-hub_1  |[0m    at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:66)
[35mselenium-hub_1  |[0m    ... 31 more
[35mselenium-hub_1  |[0m ", exception.type="org.openqa.selenium.TimeoutException", http.flavor=1, http.handler_class="org.openqa.selenium.grid.router.HandleSession", http.host="localhost:4444", http.method="POST", http.request_content_length="22", http.scheme="HTTP", http.target="/session/c65b907a21ef351d4bad4ea2b8f36f15/timeouts", http.user_agent="selenium/4.1.3 (java unix)", session.id="c65b907a21ef351d4bad4ea2b8f36f15"}, epochNanos=1648732538977750285, totalAttributeCount=12}], totalRecordedEvents=1, links=[], totalRecordedLinks=0, status=ImmutableStatusData{statusCode=ERROR, description=Kind: CANCELLED Description:}, hasEnded=true}
[35mselenium-hub_1  |[0m 13:15:38.978 WARN [SeleniumSpanExporter$1.lambda$export$0] - {"traceId": "ac490fe8e85123bfc603e2afcac6d1f8","eventTime": 1648732538977750285,"eventName": "exception","attributes": {"exception.message": "Unable to execute request for an existing session: java.util.concurrent.TimeoutException\nBuild info: version: '4.1.3', revision: '7b1ebf28ef'\nSystem info: host: 'b7c72dbf8cf7', ip: '172.18.0.2', os.name: 'Linux', os.arch: 'amd64', os.version: '5.4.0-105-generic', java.version: '11.0.14'\nDriver info: driver.version: unknown","exception.stacktrace": "org.openqa.selenium.TimeoutException: java.util.concurrent.TimeoutException\nBuild info: version: '4.1.3', revision: '7b1ebf28ef'\nSystem info: host: 'b7c72dbf8cf7', ip: '172.18.0.2', os.name: 'Linux', os.arch: 'amd64', os.version: '5.4.0-105-generic', java.version: '11.0.14'\nDriver info: driver.version: unknown\n\tat org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:72)\n\tat org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)\n\tat org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)\n\tat org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:51)\n\tat org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)\n\tat org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)\n\tat org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:124)\n\tat org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)\n\tat org.openqa.selenium.grid.web.ReverseProxyHandler.execute(ReverseProxyHandler.java:92)\n\tat org.openqa.selenium.grid.router.HandleSession.execute(HandleSession.java:121)\n\tat org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:373)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.grid.router.Router.execute(Router.java:91)\n\tat org.openqa.selenium.grid.web.EnsureSpecCompliantResponseHeaders.lambda$apply$0(EnsureSpecCompliantResponseHeaders.java:34)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)\n\tat org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)\n\tat java.base\u002fjava.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base\u002fjava.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base\u002fjava.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base\u002fjava.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base\u002fjava.lang.Thread.run(Thread.java:829)\nCaused by: java.util.concurrent.TimeoutException\n\tat java.base\u002fjava.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)\n\tat java.base\u002fjava.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)\n\tat org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)\n\tat org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:66)\n\t... 31 more\n","exception.type": "org.openqa.selenium.TimeoutException","http.flavor": 1,"http.handler_class": "org.openqa.selenium.grid.router.HandleSession","http.host": "localhost:4444","http.method": "POST","http.request_content_length": "22","http.scheme": "HTTP","http.target": "\u002fsession\u002fc65b907a21ef351d4bad4ea2b8f36f15\u002ftimeouts","http.user_agent": "selenium\u002f4.1.3 (java unix)","session.id": "c65b907a21ef351d4bad4ea2b8f36f15"}}
[35mselenium-hub_1  |[0m 
[35mselenium-hub_1  |[0m 13:15:38.979 DEBUG [LoggingHandler.channelRead] - [id: 0x2c505ba0, L:/0.0.0.0:4444] READ: [id: 0x8a385eb8, L:/172.18.0.2:4444 - R:/172.18.0.1:35100]
[35mselenium-hub_1  |[0m 13:15:38.979 DEBUG [LoggingHandler.channelReadComplete] - [id: 0x2c505ba0, L:/0.0.0.0:4444] READ COMPLETE
[35mselenium-hub_1  |[0m 13:15:38.985 DEBUG [RequestConverter.channelRead0] - Incoming message: DefaultHttpRequest(decodeResult: success, version: HTTP/1.1)
[35mselenium-hub_1  |[0m DELETE /session/c65b907a21ef351d4bad4ea2b8f36f15 HTTP/1.1
[35mselenium-hub_1  |[0m User-Agent: selenium/4.1.3 (java unix)
[35mselenium-hub_1  |[0m traceparent: 00-378e999ad612f277dd3607c317d7cfa8-3982771b7bc34b7f-01
[35mselenium-hub_1  |[0m Content-Type: application/json; charset=utf-8
[35mselenium-hub_1  |[0m host: localhost:4444
[35mselenium-hub_1  |[0m accept: */*
[35mselenium-hub_1  |[0m 13:15:38.985 DEBUG [RequestConverter.channelRead0] - Start of http request: DefaultHttpRequest(decodeResult: success, version: HTTP/1.1)
[35mselenium-hub_1  |[0m DELETE /session/c65b907a21ef351d4bad4ea2b8f36f15 HTTP/1.1
[35mselenium-hub_1  |[0m User-Agent: selenium/4.1.3 (java unix)
[35mselenium-hub_1  |[0m traceparent: 00-378e999ad612f277dd3607c317d7cfa8-3982771b7bc34b7f-01
[35mselenium-hub_1  |[0m Content-Type: application/json; charset=utf-8
[35mselenium-hub_1  |[0m host: localhost:4444
[35mselenium-hub_1  |[0m accept: */*
[35mselenium-hub_1  |[0m 13:15:38.985 DEBUG [HttpTracing.inject] - Injecting (DELETE) /session/c65b907a21ef351d4bad4ea2b8f36f15 into OpenTelemetrySpan{traceId=378e999ad612f277dd3607c317d7cfa8,spanId=72f896bb1c388e8f} at org.openqa.selenium.grid.router.HandleSession:120
[35mselenium-hub_1  |[0m 13:15:38.986 DEBUG [RequestConverter.channelRead0] - Incoming message: EmptyLastHttpContent
[35mselenium-hub_1  |[0m 13:15:38.987 DEBUG [RequestConverter.channelRead0] - Closing input pipe.
[35mselenium-hub_1  |[0m 13:15:38.988 DEBUG [NettyConnectListener.writeRequest] - Using new Channel '[id: 0x54733b25, L:/172.18.0.2:48696 - R:/172.18.0.5:5555]' for 'DELETE' to '/session/c65b907a21ef351d4bad4ea2b8f36f15'
[36mchrome_1        |[0m 13:15:38.988 DEBUG [LoggingHandler.channelRead] - [id: 0x46e726c6, L:/0.0.0.0:5555] READ: [id: 0xb3819b23, L:/172.18.0.5:5555 - R:/172.18.0.2:48696]
[36mchrome_1        |[0m 13:15:38.988 DEBUG [LoggingHandler.channelReadComplete] - [id: 0x46e726c6, L:/0.0.0.0:5555] READ COMPLETE
[36mchrome_1        |[0m 13:15:38.992 DEBUG [RequestConverter.channelRead0] - Incoming message: DefaultHttpRequest(decodeResult: success, version: HTTP/1.1)
[36mchrome_1        |[0m DELETE /session/c65b907a21ef351d4bad4ea2b8f36f15 HTTP/1.1
[36mchrome_1        |[0m User-Agent: selenium/4.1.3 (java unix)
[36mchrome_1        |[0m Connection: keep-alive
[36mchrome_1        |[0m accept: */*
[36mchrome_1        |[0m Cache-Control: no-cache
[36mchrome_1        |[0m traceparent: 00-378e999ad612f277dd3607c317d7cfa8-f2e10dfac1529240-01
[36mchrome_1        |[0m host: localhost:4444
[36mchrome_1        |[0m Content-Type: application/json; charset=utf-8
[36mchrome_1        |[0m 13:15:38.992 DEBUG [RequestConverter.channelRead0] - Start of http request: DefaultHttpRequest(decodeResult: success, version: HTTP/1.1)
[36mchrome_1        |[0m DELETE /session/c65b907a21ef351d4bad4ea2b8f36f15 HTTP/1.1
[36mchrome_1        |[0m User-Agent: selenium/4.1.3 (java unix)
[36mchrome_1        |[0m Connection: keep-alive
[36mchrome_1        |[0m accept: */*
[36mchrome_1        |[0m Cache-Control: no-cache
[36mchrome_1        |[0m traceparent: 00-378e999ad612f277dd3607c317d7cfa8-f2e10dfac1529240-01
[36mchrome_1        |[0m host: localhost:4444
[36mchrome_1        |[0m Content-Type: application/json; charset=utf-8
[36mchrome_1        |[0m 13:15:38.993 DEBUG [RequestConverter.channelRead0] - Incoming message: EmptyLastHttpContent
[36mchrome_1        |[0m 13:15:38.993 DEBUG [RequestConverter.channelRead0] - Closing input pipe.
[36mchrome_1        |[0m 13:15:38.993 DEBUG [SpanWrappedHttpHandler.execute] - Wrapping request. Before OpenTelemetryContext{tracer=io.opentelemetry.sdk.trace.SdkTracer@2199485, context={}, span id=0000000000000000, trace id=00000000000000000000000000000000} and after OpenTelemetryContext{tracer=io.opentelemetry.sdk.trace.SdkTracer@2199485, context={opentelemetry-trace-span-key=SdkSpan{traceId=378e999ad612f277dd3607c317d7cfa8, spanId=bbf755505f52a7af, parentSpanContext=ImmutableSpanContext{traceId=378e999ad612f277dd3607c317d7cfa8, spanId=f2e10dfac1529240, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=true, valid=true}, name=node.forward_command, kind=INTERNAL, attributes=AttributesMap{data={random.key=9d142ca0-6618-4a03-b936-86e856ac6bd6}, capacity=128, totalAddedValues=1}, status=ImmutableStatusData{statusCode=UNSET, description=}, totalRecordedEvents=0, totalRecordedLinks=0, startEpochNanos=1648732538993000000, endEpochNanos=0}}, span id=bbf755505f52a7af, trace id=378e999ad612f277dd3607c317d7cfa8}
[36mchrome_1        |[0m 13:15:38.993 DEBUG [HttpTracing.inject] - Injecting (DELETE) /session/c65b907a21ef351d4bad4ea2b8f36f15 into OpenTelemetrySpan{traceId=378e999ad612f277dd3607c317d7cfa8,spanId=bbf755505f52a7af} at org.openqa.selenium.remote.tracing.SpanWrappedHttpHandler:84
[36mchrome_1        |[0m 13:15:38.997 DEBUG [NettyConnectListener.writeRequest] - Using new Channel '[id: 0x0be84600, L:/127.0.0.1:50260 - R:localhost/127.0.0.1:45386]' for 'DELETE' to '/session/c65b907a21ef351d4bad4ea2b8f36f15'
[35mselenium-hub_1  |[0m 13:15:39.011 DEBUG [TimeoutTimerTask.expire] - Request timeout to 172.18.0.5/172.18.0.5:5555 after 180000 ms for NettyResponseFuture{currentRetry=0,
[35mselenium-hub_1  |[0m    isDone=0,
[35mselenium-hub_1  |[0m    isCancelled=0,
[35mselenium-hub_1  |[0m    asyncHandler=org.asynchttpclient.AsyncCompletionHandlerBase@2517561a,
[35mselenium-hub_1  |[0m    nettyRequest=org.asynchttpclient.netty.request.NettyRequest@7b27914d,
[35mselenium-hub_1  |[0m    future=java.util.concurrent.CompletableFuture@3207f4ff[Not completed],
[35mselenium-hub_1  |[0m    uri=http://172.18.0.5:5555/session/c65b907a21ef351d4bad4ea2b8f36f15/timeouts,
[35mselenium-hub_1  |[0m    keepAlive=true,
[35mselenium-hub_1  |[0m    redirectCount=0,
[35mselenium-hub_1  |[0m    timeoutsHolder=org.asynchttpclient.netty.timeout.TimeoutsHolder@3ee1a9fc,
[35mselenium-hub_1  |[0m    inAuth=0,
[35mselenium-hub_1  |[0m    touch=1648732358974} after 180045 ms
[35mselenium-hub_1  |[0m 13:15:39.011 DEBUG [ChannelManager.closeChannel] - Closing Channel [id: 0x3f60c7f8, L:/172.18.0.2:47884 - R:/172.18.0.5:5555] 
[35mselenium-hub_1  |[0m 13:15:39.011 DEBUG [NettyRequestSender.abort] - Aborting Future NettyResponseFuture{currentRetry=0,
[35mselenium-hub_1  |[0m    isDone=0,
[35mselenium-hub_1  |[0m    isCancelled=0,
[35mselenium-hub_1  |[0m    asyncHandler=org.asynchttpclient.AsyncCompletionHandlerBase@2517561a,
[35mselenium-hub_1  |[0m    nettyRequest=org.asynchttpclient.netty.request.NettyRequest@7b27914d,
[35mselenium-hub_1  |[0m    future=java.util.concurrent.CompletableFuture@3207f4ff[Not completed],
[35mselenium-hub_1  |[0m    uri=http://172.18.0.5:5555/session/c65b907a21ef351d4bad4ea2b8f36f15/timeouts,
[35mselenium-hub_1  |[0m    keepAlive=true,
[35mselenium-hub_1  |[0m    redirectCount=0,
[35mselenium-hub_1  |[0m    timeoutsHolder=org.asynchttpclient.netty.timeout.TimeoutsHolder@3ee1a9fc,
[35mselenium-hub_1  |[0m    inAuth=0,
[35mselenium-hub_1  |[0m    touch=1648732358974}
[35mselenium-hub_1  |[0m 
[35mselenium-hub_1  |[0m 13:15:39.011 DEBUG [NettyRequestSender.abort] - Request timeout to 172.18.0.5/172.18.0.5:5555 after 180000 ms
[35mselenium-hub_1  |[0m java.util.concurrent.TimeoutException: Request timeout to 172.18.0.5/172.18.0.5:5555 after 180000 ms
[35mselenium-hub_1  |[0m    at org.asynchttpclient.netty.timeout.TimeoutTimerTask.expire(TimeoutTimerTask.java:43)
[35mselenium-hub_1  |[0m    at org.asynchttpclient.netty.timeout.RequestTimeoutTimerTask.run(RequestTimeoutTimerTask.java:50)
[35mselenium-hub_1  |[0m    at io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715)
[35mselenium-hub_1  |[0m    at io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34)
[35mselenium-hub_1  |[0m    at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703)
[35mselenium-hub_1  |[0m    at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790)
[35mselenium-hub_1  |[0m    at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503)
[35mselenium-hub_1  |[0m    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
[35mselenium-hub_1  |[0m    at java.base/java.lang.Thread.run(Thread.java:829)
[35mselenium-hub_1  |[0m 13:15:39.012 DEBUG [AsyncCompletionHandler.onThrowable] - Request timeout to 172.18.0.5/172.18.0.5:5555 after 180000 ms
[35mselenium-hub_1  |[0m java.util.concurrent.TimeoutException: Request timeout to 172.18.0.5/172.18.0.5:5555 after 180000 ms
[35mselenium-hub_1  |[0m    at org.asynchttpclient.netty.timeout.TimeoutTimerTask.expire(TimeoutTimerTask.java:43)
[35mselenium-hub_1  |[0m    at org.asynchttpclient.netty.timeout.RequestTimeoutTimerTask.run(RequestTimeoutTimerTask.java:50)
[35mselenium-hub_1  |[0m    at io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715)
[35mselenium-hub_1  |[0m    at io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34)
[35mselenium-hub_1  |[0m    at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703)
[35mselenium-hub_1  |[0m    at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790)
[35mselenium-hub_1  |[0m    at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503)
[35mselenium-hub_1  |[0m    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
[35mselenium-hub_1  |[0m    at java.base/java.lang.Thread.run(Thread.java:829)
[35mselenium-hub_1  |[0m 13:15:39.012 DEBUG [AsyncHttpClientHandler.channelInactive] - Channel Closed: [id: 0x3f60c7f8, L:/172.18.0.2:47884 ! R:/172.18.0.5:5555] with attribute DISCARD
AndreasPetersen commented 2 years ago

We're seeing the same issue. I've seen it happen on different clicks. Sometimes it just works, other times it doesn't. Here's an example stack trace (I've censored some things):

Build info: version: '4.1.3', revision: '7b1ebf28ef'
System info: host: 'my-computer', ip: 'my-ip', os.name: 'Windows 10', os.arch: 'amd64', os.version: '10.0', java.version: '11.0.12'
Driver info: org.openqa.selenium.remote.RemoteWebDriver
Command: [44cc0527f0a1c0ef57a0f5a1f070a131, clickElement {id=5539a100-dacc-45be-aa37-bef50bf4c90c}]
Capabilities {acceptInsecureCerts: false, browserName: chrome, browserVersion: 99.0.4844.84, chrome: {chromedriverVersion: 99.0.4844.51 (d537ec02474b5..., userDataDir: /tmp/.com.google.Chrome.JNCn2F}, goog:chromeOptions: {debuggerAddress: localhost:40533}, goog:loggingPrefs: {browser: ALL}, javascriptEnabled: true, networkConnectionEnabled: false, pageLoadStrategy: normal, platform: LINUX, platformName: LINUX, proxy: Proxy(), se:cdp: ws://100.78.17.6:4444/sessi..., se:cdpVersion: 99.0.4844.84, se:vnc: ws://100.78.17.6:4444/sessi..., se:vncEnabled: true, se:vncLocalAddress: ws://100.78.17.6:7900, setWindowRect: true, strictFileInteractability: false, timeouts: {implicit: 0, pageLoad: 300000, script: 30000}, unhandledPromptBehavior: dismiss and notify, webauthn:extension:credBlob: true, webauthn:extension:largeBlob: true, webauthn:virtualAuthenticators: true}
Element: [[RemoteWebDriver: chrome on LINUX (44cc0527f0a1c0ef57a0f5a1f070a131)] -> xpath: my-xpath
Session ID: 44cc0527f0a1c0ef57a0f5a1f070a131
    at app//org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:72)
    at app//org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
    at app//org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
    at app//org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:51)
    at app//org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
    at app//org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
    at app//org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:124)
    at app//org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)
    at app//org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:181)
    at app//org.openqa.selenium.remote.TracedCommandExecutor.execute(TracedCommandExecutor.java:51)
    at app//org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:567)
    at app//org.openqa.selenium.remote.RemoteWebElement.execute(RemoteWebElement.java:251)
    at app//org.openqa.selenium.remote.RemoteWebElement.click(RemoteWebElement.java:77)
    at app//a.clickOnElement(a.java:142)
    at app//a.clickOnElementLocated(a.java:179)
    at app//b.n(b.java:120)
    at app//c.m(c.java:60)
    ... 25 more
Caused by: java.util.concurrent.TimeoutException
    at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
    at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
    at org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)
    at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:66)
    ... 41 more
evertones commented 2 years ago

@diemol : Is there any progress on this issue? In your previous comment you said that now there won't be retries, so showing the error is not strange, but... is there any workaround or fix expected?

diemol commented 2 years ago

Yes, based on the feedback we added the retry in a specific part of the code (in the Router). This is already out in 4.1.4, please check and let us know if it helped.

JulienBreton commented 2 years ago

@diemol I can reproduce the java.util.concurrent.TimeoutException with the 4.1.4

I add a file with more logs. logs_selenium_4.1.4_concurrent.TimeoutException.txt

[35mselenium-hub_1  |[0m 12:48:03.980 DEBUG [TimeoutTimerTask.expire] - Request timeout to 172.18.0.4/172.18.0.4:5555 after 180000 ms for NettyResponseFuture{currentRetry=0,
[35mselenium-hub_1  |[0m    isDone=0,
[35mselenium-hub_1  |[0m    isCancelled=0,
[35mselenium-hub_1  |[0m    asyncHandler=org.asynchttpclient.AsyncCompletionHandlerBase@1816fe4a,
[35mselenium-hub_1  |[0m    nettyRequest=org.asynchttpclient.netty.request.NettyRequest@34ea0614,
[35mselenium-hub_1  |[0m    future=java.util.concurrent.CompletableFuture@248895c5[Not completed],
[35mselenium-hub_1  |[0m    uri=http://172.18.0.4:5555/session/f24ae2d1ba0f0d398cdaa093ef912389/timeouts,
[35mselenium-hub_1  |[0m    keepAlive=true,
[35mselenium-hub_1  |[0m    redirectCount=0,
[35mselenium-hub_1  |[0m    timeoutsHolder=org.asynchttpclient.netty.timeout.TimeoutsHolder@33934e63,
[35mselenium-hub_1  |[0m    inAuth=0,
[35mselenium-hub_1  |[0m    touch=1651149903910} after 180071 ms
[35mselenium-hub_1  |[0m 12:48:03.980 DEBUG [ChannelManager.closeChannel] - Closing Channel [id: 0x996a669e, L:/172.18.0.2:60948 - R:/172.18.0.4:5555] 
[35mselenium-hub_1  |[0m 12:48:03.980 DEBUG [NettyRequestSender.abort] - Aborting Future NettyResponseFuture{currentRetry=0,
[35mselenium-hub_1  |[0m    isDone=0,
[35mselenium-hub_1  |[0m    isCancelled=0,
[35mselenium-hub_1  |[0m    asyncHandler=org.asynchttpclient.AsyncCompletionHandlerBase@1816fe4a,
[35mselenium-hub_1  |[0m    nettyRequest=org.asynchttpclient.netty.request.NettyRequest@34ea0614,
[35mselenium-hub_1  |[0m    future=java.util.concurrent.CompletableFuture@248895c5[Not completed],
[35mselenium-hub_1  |[0m    uri=http://172.18.0.4:5555/session/f24ae2d1ba0f0d398cdaa093ef912389/timeouts,
[35mselenium-hub_1  |[0m    keepAlive=true,
[35mselenium-hub_1  |[0m    redirectCount=0,
[35mselenium-hub_1  |[0m    timeoutsHolder=org.asynchttpclient.netty.timeout.TimeoutsHolder@33934e63,
[35mselenium-hub_1  |[0m    inAuth=0,
[35mselenium-hub_1  |[0m    touch=1651149903910}
[35mselenium-hub_1  |[0m 
[35mselenium-hub_1  |[0m 12:48:03.980 DEBUG [NettyRequestSender.abort] - Request timeout to 172.18.0.4/172.18.0.4:5555 after 180000 ms
[35mselenium-hub_1  |[0m java.util.concurrent.TimeoutException: Request timeout to 172.18.0.4/172.18.0.4:5555 after 180000 ms
[35mselenium-hub_1  |[0m    at org.asynchttpclient.netty.timeout.TimeoutTimerTask.expire(TimeoutTimerTask.java:43)
[35mselenium-hub_1  |[0m    at org.asynchttpclient.netty.timeout.RequestTimeoutTimerTask.run(RequestTimeoutTimerTask.java:50)
[35mselenium-hub_1  |[0m    at io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715)
[35mselenium-hub_1  |[0m    at io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34)
[35mselenium-hub_1  |[0m    at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703)
[35mselenium-hub_1  |[0m    at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790)
[35mselenium-hub_1  |[0m    at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503)
[35mselenium-hub_1  |[0m    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
[35mselenium-hub_1  |[0m    at java.base/java.lang.Thread.run(Thread.java:829)
[35mselenium-hub_1  |[0m 12:48:03.980 DEBUG [AsyncHttpClientHandler.channelInactive] - Channel Closed: [id: 0x996a669e, L:/172.18.0.2:60948 ! R:/172.18.0.4:5555] with attribute DISCARD
[36mchrome_1        |[0m 12:48:03.980 DEBUG [RequestConverter.channelInactive] - Closing input pipe, channel became inactive.
[35mselenium-hub_1  |[0m 12:48:03.980 DEBUG [AsyncCompletionHandler.onThrowable] - Request timeout to 172.18.0.4/172.18.0.4:5555 after 180000 ms
[35mselenium-hub_1  |[0m java.util.concurrent.TimeoutException: Request timeout to 172.18.0.4/172.18.0.4:5555 after 180000 ms
[35mselenium-hub_1  |[0m    at org.asynchttpclient.netty.timeout.TimeoutTimerTask.expire(TimeoutTimerTask.java:43)
[35mselenium-hub_1  |[0m    at org.asynchttpclient.netty.timeout.RequestTimeoutTimerTask.run(RequestTimeoutTimerTask.java:50)
[35mselenium-hub_1  |[0m    at io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715)
[35mselenium-hub_1  |[0m    at io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34)
[35mselenium-hub_1  |[0m    at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703)
[35mselenium-hub_1  |[0m    at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790)
[35mselenium-hub_1  |[0m    at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503)
[35mselenium-hub_1  |[0m    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
[35mselenium-hub_1  |[0m    at java.base/java.lang.Thread.run(Thread.java:829)
bachhavdipak commented 2 years ago

Able to reproduce this issue with 4.1.4

2022-05-04T21:42:04.7949787Z     org.openqa.selenium.TimeoutException: java.util.concurrent.TimeoutException
2022-05-04T21:42:04.7953154Z     Build info: version: '4.1.3', revision: '7b1ebf28ef'
2022-05-04T21:42:04.7954198Z     System info: host: 'fv-az209-275', ip: '10.1.0.17', os.name: 'Linux', os.arch: 'amd64', os.version: '5.13.0-1022-azure', java.version: '11.0.14.1'
2022-05-04T21:42:04.7955041Z     Driver info: org.openqa.selenium.remote.RemoteWebDriver
2022-05-04T21:42:04.7955490Z     Command: [20e886e3b927b85ddf0a98490364e8cc, getCurrentUrl {}]
2022-05-04T21:42:04.7961680Z     Session ID: 20e886e3b927b85ddf0a98490364e8cc
2022-05-04T21:42:04.7962473Z         at app//org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:72)
2022-05-04T21:42:04.7963182Z         at app//org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
2022-05-04T21:42:04.7963867Z         at app//org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
2022-05-04T21:42:04.7964512Z         at app//org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:51)
2022-05-04T21:42:04.7965215Z         at app//org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
2022-05-04T21:42:04.7965895Z         at app//org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
2022-05-04T21:42:04.7966519Z         at app//org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:124)
2022-05-04T21:42:04.7967185Z         at app//org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)
2022-05-04T21:42:04.7967854Z         at app//org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:181)
2022-05-04T21:42:04.7968537Z         at app//org.openqa.selenium.remote.TracedCommandExecutor.execute(TracedCommandExecutor.java:51)
2022-05-04T21:42:04.7969184Z         at app//org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:567)
2022-05-04T21:42:04.7969823Z         at app//org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:622)
2022-05-04T21:42:04.7970457Z         at app//org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:626)
2022-05-04T21:42:04.7971114Z         at app//org.openqa.selenium.remote.RemoteWebDriver.getCurrentUrl(RemoteWebDriver.java:342)
2022-05-04T21:42:04.7971815Z         at app//net.thucydides.core.webdriver.WebdriverProxyFactory.clearBrowserSession(WebdriverProxyFactory.java:121)
2022-05-04T21:42:04.7972526Z         at java.base@11.0.14.1/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
2022-05-04T21:42:04.7973223Z         at java.base@11.0.14.1/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
2022-05-04T21:42:04.7973906Z         at java.base@11.0.14.1/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
2022-05-04T21:42:04.7974614Z         at java.base@11.0.14.1/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
2022-05-04T21:42:04.7975294Z         at java.base@11.0.14.1/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
2022-05-04T21:42:04.7976086Z         at java.base@11.0.14.1/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
2022-05-04T21:42:04.7976784Z         at java.base@11.0.14.1/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
2022-05-04T21:42:04.7977515Z         at java.base@11.0.14.1/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
2022-05-04T21:42:04.7978218Z         at java.base@11.0.14.1/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
2022-05-04T21:42:04.7978881Z         at java.base@11.0.14.1/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
2022-05-04T21:42:04.7979629Z         at app//net.serenitybdd.junit5.extension.page.SerenityPageExtension.prepareBrowserForTest(SerenityPageExtension.java:116)
2022-05-04T21:42:04.7980397Z         at app//net.serenitybdd.junit5.extension.page.SerenityPageExtension.beforeEach(SerenityPageExtension.java:48)
2022-05-04T21:42:04.7981219Z         at app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeBeforeEachCallbacks$1(TestMethodTestDescriptor.java:159)
2022-05-04T21:42:04.7982157Z         at app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeBeforeMethodsOrCallbacksUntilExceptionOccurs$5(TestMethodTestDescriptor.java:195)
2022-05-04T21:42:04.7983037Z         at app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
2022-05-04T21:42:04.7983960Z         at app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeBeforeMethodsOrCallbacksUntilExceptionOccurs(TestMethodTestDescriptor.java:195)
2022-05-04T21:42:04.7984867Z         at app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeBeforeEachCallbacks(TestMethodTestDescriptor.java:158)
2022-05-04T21:42:04.7985659Z         at app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:125)
2022-05-04T21:42:04.7986422Z         at app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:65)
2022-05-04T21:42:04.7987183Z         at app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:139)
2022-05-04T21:42:04.7987955Z         at app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
2022-05-04T21:42:04.7988717Z         at app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:129)
2022-05-04T21:42:04.7989429Z         at app//org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
2022-05-04T21:42:04.7990127Z         at app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:127)
2022-05-04T21:42:04.7990900Z         at app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
2022-05-04T21:42:04.7991640Z         at app//org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:126)
2022-05-04T21:42:04.7992748Z         at app//org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:84)
2022-05-04T21:42:04.7993623Z         at app//org.junit.platform.engine.support.hierarchical.ForkJoinPoolHierarchicalTestExecutorService$ExclusiveTask.compute(ForkJoinPoolHierarchicalTestExecutorService.java:185)
2022-05-04T21:42:04.7994831Z         at app//org.junit.platform.engine.support.hierarchical.ForkJoinPoolHierarchicalTestExecutorService.executeNonConcurrentTasks(ForkJoinPoolHierarchicalTestExecutorService.java:155)
2022-05-04T21:42:04.7995862Z         at app//org.junit.platform.engine.support.hierarchical.ForkJoinPoolHierarchicalTestExecutorService.invokeAll(ForkJoinPoolHierarchicalTestExecutorService.java:135)
2022-05-04T21:42:04.7996765Z         at app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:143)
2022-05-04T21:42:04.7997624Z         at app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
2022-05-04T21:42:04.7998399Z         at app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:129)
2022-05-04T21:42:04.7999094Z         at app//org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
2022-05-04T21:42:04.7999811Z         at app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:127)
2022-05-04T21:42:04.8000567Z         at app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
2022-05-04T21:42:04.8001302Z         at app//org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:126)
2022-05-04T21:42:04.8002025Z         at app//org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:84)
2022-05-04T21:42:04.8002887Z         at app//org.junit.platform.engine.support.hierarchical.ForkJoinPoolHierarchicalTestExecutorService$ExclusiveTask.compute(ForkJoinPoolHierarchicalTestExecutorService.java:185)
2022-05-04T21:42:04.8003955Z         at app//org.junit.platform.engine.support.hierarchical.ForkJoinPoolHierarchicalTestExecutorService.executeNonConcurrentTasks(ForkJoinPoolHierarchicalTestExecutorService.java:155)
2022-05-04T21:42:04.8005054Z         at app//org.junit.platform.engine.support.hierarchical.ForkJoinPoolHierarchicalTestExecutorService.invokeAll(ForkJoinPoolHierarchicalTestExecutorService.java:135)
2022-05-04T21:42:04.8005952Z         at app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:143)
2022-05-04T21:42:04.8006710Z         at app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
2022-05-04T21:42:04.8007486Z         at app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:129)
2022-05-04T21:42:04.8008319Z         at app//org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
2022-05-04T21:42:04.8009041Z         at app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:127)
2022-05-04T21:42:04.8009798Z         at app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
2022-05-04T21:42:04.8010554Z         at app//org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:126)
2022-05-04T21:42:04.8011260Z         at app//org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:84)
2022-05-04T21:42:04.8012134Z         at app//org.junit.platform.engine.support.hierarchical.ForkJoinPoolHierarchicalTestExecutorService$ExclusiveTask.compute(ForkJoinPoolHierarchicalTestExecutorService.java:185)
2022-05-04T21:42:04.8012988Z         at java.base@11.0.14.1/java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
2022-05-04T21:42:04.8013642Z         at java.base@11.0.14.1/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
2022-05-04T21:42:04.8014313Z         at java.base@11.0.14.1/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
2022-05-04T21:42:04.8014983Z         at java.base@11.0.14.1/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
2022-05-04T21:42:04.8015637Z         at java.base@11.0.14.1/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
2022-05-04T21:42:04.8016309Z         at java.base@11.0.14.1/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
2022-05-04T21:42:04.8016665Z 
2022-05-04T21:42:04.8016892Z         Caused by:
2022-05-04T21:42:04.8017199Z         java.util.concurrent.TimeoutException
2022-05-04T21:42:04.8017728Z             at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
2022-05-04T21:42:04.8018450Z             at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
2022-05-04T21:42:04.8019106Z             at org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)
2022-05-04T21:42:04.8019781Z             at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:66)
2022-05-04T21:42:04.8020276Z             ... 70 more
amyreit commented 2 years ago

Just another piece of information - I moved from 4.0 to 4.1.2 standalone firefox in early April and the occurrence of the issue has dropped significantly. From several runs a day to once every week.

JulienBreton commented 2 years ago

@amyreit in the 4.1.2 there is a temporary fix but this fix is removed since 4.1.3 for a better way to solve the issue. The issue occurs always in 4.1.4, then it seems the team needs to add a fix (maybe it is already the case in the trunk).

diemol commented 2 years ago

@bachhavdipak your stacktrace shows a timeout on the client side, which you can tune by passing a ClientConfig to the RemoteWebDriverBuilder.

diemol commented 2 years ago

@JulienBreton I was checking the attached log and I am confused about how the test is sending those commands. I followed the command for session f24ae2d1ba0f0d398cdaa093ef912389...

But then...

It seems our HTTP client is doing some extra retries we were not aware of (NettyRequestSender.retry) which I will deactivate in a moment. In this case, those automatic retries are happening after the session has been deleted. And yes, we need to switch to another HTTP client soon.

Deactivating those automatic retries should reduce or remove the amount of timeouts happening, and the let us rely on the configured session-timeout plus the retries that only happen between Hub and Node. I think we will do a new release in less than 1 week.

ApexK commented 2 years ago

@diemol Any update on the next release which contains the above fix?

titusfortner commented 2 years ago

Next release is likely going to be 4.2, which means it needs to be coordinated across all the languages. Hopefully soon, but no promises.

C0ding-4Fun commented 2 years ago

Workaround? In my test I'm getting data from one website and interacting with another using the information from the first site. The .get(URL) kept locking up with the reported error. I created two remote drivers, one for each website and it's been rock solid ever since. Hope it helps someone.