SeleniumHQ / docker-selenium

Provides a simple way to run Selenium Grid with Chrome, Firefox, and Edge using Docker, making it easier to perform browser automation
http://www.selenium.dev/docker-selenium/
Other
7.96k stars 2.51k forks source link

[🐛 Bug]: Error while creating session, because of java.util.concurrent.TimeoutException #2373

Closed zsbutosi closed 3 weeks ago

zsbutosi commented 2 months ago

What happened?

Hi!

We have been using selenium grid docker image in kubernetes setup for almost 2 years now. But since the 4.23.0 version, we get org.openqa.selenium.SessionNotCreatedException: Could not start a new session. Response code 500. Message: Could not start a new session. java.util.concurrent.TimeoutException errors. I can reproduce it locally with docker container too. I have tried the docker image 4.23.0-20240727, 4.23.1-20240820 and the nightly build too, with the same result.

For testing framework, we are using selenide, with kotlin and java21. Currently we are using selenide version 7.4.2, which is using selenium 4.23.1. The browser config for chrome is the following:

            addArguments("--no-sandbox")
            addArguments("--disable-dev-shm-usage")
            addArguments("--disable-search-engine-choice-screen")
            addArguments("--unsafely-disable-devtools-self-xss-warnings")
            addArguments("window-size=1920,1080")

and we also set Configuration.headless as true for the selenide config.

Command used to start Selenium Grid with Docker (or Kubernetes)

The compose file is this:

  chrome:
    image: selenium/node-chromium:4.23.1-20240820
    shm_size: 3gb
    depends_on:
      - selenium-hub
    environment:
      - SE_EVENT_BUS_HOST=selenium-hub
      - SE_EVENT_BUS_PUBLISH_PORT=4442
      - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
      - SE_NODE_MAX_SESSIONS=4
      - SE_NODE_OVERRIDE_MAX_SESSIONS=true
      - SE_START_XVFB=false
      - SE_START_VNC=false
      - SE_START_NO_VNC=false
      - SE_NODE_SESSION_TIMEOUT=30
      - SE_SCREEN_WIDTH=1920
      - SE_SCREEN_HEIGHT=1080
      - SE_ENABLE_TRACING=false
      - SE_STRUCTURED_LOGS=false
      - SE_ENABLE_TLS=false
      - SE_SUPERVISORD_LOG_LEVEL=info
      - SE_HTTP_LOGS=false

  selenium-hub:
    image: selenium/hub:4.23.1-20240820
    container_name: selenium-hub
    environment:
      - SE_SESSION_REQUEST_TIMEOUT=90
      - SE_SESSION_RETRY_INTERVAL=10
      - SE_ENABLE_TRACING=false
      - SE_ENABLE_TLS=false
      - SE_SUPERVISORD_LOG_LEVEL=debug
      - SE_STRUCTURED_LOGS=false
    ports:
      - "4442:4442"
      - "4443:4443"
      - "4444:4444"

### Relevant log output

```shell
From node-chormium:

2024-08-28 17:32:05 15:32:05.505 WARN [DriverServiceSessionFactory.apply] - Error while creating session with the driver service. Stopping driver service: java.util.concurrent.TimeoutException
2024-08-28 17:32:05 Build info: version: '4.23.1', revision: '656257d8e9'
2024-08-28 17:32:05 System info: os.name: 'Linux', os.arch: 'aarch64', os.version: '6.10.0-linuxkit', java.version: '17.0.12'
2024-08-28 17:32:05 Driver info: driver.version: unknown
2024-08-28 17:32:05 org.openqa.selenium.TimeoutException: java.util.concurrent.TimeoutException
2024-08-28 17:32:05 Build info: version: '4.23.1', revision: '656257d8e9'
2024-08-28 17:32:05 System info: os.name: 'Linux', os.arch: 'aarch64', os.version: '6.10.0-linuxkit', java.version: '17.0.12'
2024-08-28 17:32:05 Driver info: driver.version: unknown
2024-08-28 17:32:05     at org.openqa.selenium.remote.http.jdk.JdkHttpClient.execute0(JdkHttpClient.java:418)
2024-08-28 17:32:05     at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
2024-08-28 17:32:05     at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:55)
2024-08-28 17:32:05     at org.openqa.selenium.remote.http.jdk.JdkHttpClient.execute(JdkHttpClient.java:374)
2024-08-28 17:32:05     at org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:54)
2024-08-28 17:32:05     at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:89)
2024-08-28 17:32:05     at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:75)
2024-08-28 17:32:05     at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:61)
2024-08-28 17:32:05     at org.openqa.selenium.grid.node.config.DriverServiceSessionFactory.apply(DriverServiceSessionFactory.java:162)
2024-08-28 17:32:05     at org.openqa.selenium.grid.node.config.DriverServiceSessionFactory.apply(DriverServiceSessionFactory.java:71)
2024-08-28 17:32:05     at org.openqa.selenium.grid.node.local.SessionSlot.apply(SessionSlot.java:147)
2024-08-28 17:32:05     at org.openqa.selenium.grid.node.local.LocalNode.newSession(LocalNode.java:469)
2024-08-28 17:32:05     at org.openqa.selenium.grid.node.NewNodeSession.execute(NewNodeSession.java:50)
2024-08-28 17:32:05     at org.openqa.selenium.remote.http.Route$TemplatizedRoute.handle(Route.java:192)
2024-08-28 17:32:05     at org.openqa.selenium.remote.http.Route.execute(Route.java:69)
2024-08-28 17:32:05     at org.openqa.selenium.grid.security.RequiresSecretFilter.lambda$apply$0(RequiresSecretFilter.java:62)
2024-08-28 17:32:05     at org.openqa.selenium.remote.tracing.SpanWrappedHttpHandler.execute(SpanWrappedHttpHandler.java:87)
2024-08-28 17:32:05     at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:63)
2024-08-28 17:32:05     at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:360)
2024-08-28 17:32:05     at org.openqa.selenium.remote.http.Route.execute(Route.java:69)
2024-08-28 17:32:05     at org.openqa.selenium.grid.node.Node.execute(Node.java:270)
2024-08-28 17:32:05     at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:360)
2024-08-28 17:32:05     at org.openqa.selenium.remote.http.Route.execute(Route.java:69)
2024-08-28 17:32:05     at org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)
2024-08-28 17:32:05     at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
2024-08-28 17:32:05     at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:63)
2024-08-28 17:32:05     at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
2024-08-28 17:32:05     at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:63)
2024-08-28 17:32:05     at org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)
2024-08-28 17:32:05     at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
2024-08-28 17:32:05     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
2024-08-28 17:32:05     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
2024-08-28 17:32:05     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
2024-08-28 17:32:05     at java.base/java.lang.Thread.run(Thread.java:840)
2024-08-28 17:32:05 Caused by: java.util.concurrent.TimeoutException
2024-08-28 17:32:05     at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1960)
2024-08-28 17:32:05     at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2095)
2024-08-28 17:32:05     at org.openqa.selenium.remote.http.jdk.JdkHttpClient.execute0(JdkHttpClient.java:401)
2024-08-28 17:32:05     ... 33 more

Previous attempts with log-level FINE:

2024-08-27 09:35:54 07:35:54.244 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.pauseReadEvent] - [HttpClient-12-SelectorManager] [67s 233ms] SocketTube(12) pausing read event
2024-08-27 09:35:54 07:35:54.244 DEBUG [HttpClientImpl$SelectorAttachment.register] - [HttpClient-12-SelectorManager] [67s 233ms] SelectorAttachment Registering jdk.internal.net.http.SocketTube$InternalReadPublisher$ReadEvent@ed18cc4 for 0 (false)
2024-08-27 09:35:54 07:35:54.244 DEBUG [SocketTube$InternalReadPublisher$ReadSubscription.signalCompletion] - [HttpClient-12-SelectorManager] [67s 233ms] SocketTube(12) completing subscriber
2024-08-27 09:35:54 07:35:54.244 DEBUG [ConnectionPool.cleanup] - [HttpClient-12-SelectorManager] [67s 233ms] ConnectionPool(12) SocketTube(12) : ConnectionPool.cleanup(null)
2024-08-27 09:35:54 07:35:54.244 DEBUG [PlainHttpConnection.close] - [HttpClient-12-SelectorManager] [67s 233ms] PlainHttpConnection(SocketTube(12)) Closing channel: channel registered with selector, key.interestOps=1, sa.interestOps=1
2024-08-27 09:35:54 07:35:54.245 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.signalError] - [HttpClient-12-SelectorManager] [67s 234ms] SocketTube(12) signal read error: java.io.IOException: connection closed locally
2024-08-27 09:35:54 07:35:54.245 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.signalError] - [HttpClient-12-SelectorManager] [67s 234ms] SocketTube(12) got read error: java.io.IOException: connection closed locally
2024-08-27 09:35:54 07:35:54.245 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.read] - [HttpClient-12-SelectorManager] [67s 234ms] SocketTube(12) Stopping read scheduler
2024-08-27 09:35:54 07:35:54.245 DEBUG [SocketTube.debugState] - [HttpClient-12-SelectorManager] [67s 234ms] SocketTube(12) leaving read() loop after EOF:  Reading: [ops=0, demand=0, stopped=true], Writing: [ops=0, demand=1]
2024-08-27 09:35:54 07:35:54.245 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.read] - [HttpClient-12-SelectorManager] [67s 234ms] SocketTube(12) Read scheduler stopped
2024-08-27 09:35:54 07:35:54.245 DEBUG [HttpClientImpl$SelectorAttachment.resetInterestOps] - [HttpClient-12-SelectorManager] [67s 234ms] SelectorAttachment key cancelled for java.nio.channels.SocketChannel[closed]
2024-08-27 09:35:54 07:35:54.245 DEBUG [ExternalProcess$Builder.lambda$start$0] - completed to copy the output of process 1675
2024-08-27 09:35:54 07:35:54.245 DEBUG [SocketTube$SocketFlowEvent.abort] - [HttpClient-12-SelectorManager] [67s 234ms] SocketTube(12) ReadEvent abort: java.io.IOException: java.nio.channels.CancelledKeyException
2024-08-27 09:35:54 07:35:54.246 DEBUG [SocketTube$InternalReadPublisher$ReadEvent.signalError] - [HttpClient-12-SelectorManager] [67s 234ms] SocketTube(12) signalError to jdk.internal.net.http.SocketTube$InternalReadPublisher$InternalReadSubscription@691c2895 (java.io.IOException: java.nio.channels.CancelledKeyException)
2024-08-27 09:35:54 07:35:54.246 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.signalError] - [HttpClient-12-SelectorManager] [67s 234ms] SocketTube(12) signal read error: java.io.IOException: java.nio.channels.CancelledKeyException

Operating System

Ubuntu

Docker Selenium version (image tag)

4.23.1-20240820

Selenium Grid chart version (chart version)

N/A

github-actions[bot] commented 2 months ago

@zsbutosi, thank you for creating this issue. We will troubleshoot it as soon as we can.


Info for maintainers

Triage this issue by using labels.

If information is missing, add a helpful comment and then I-issue-template label.

If the issue is a question, add the I-question label.

If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.

If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable G-* label, and it will provide the correct link and auto-close the issue.

After troubleshooting the issue, please add the R-awaiting answer label.

Thank you!

VietND96 commented 1 month ago

we also set Configuration.headless as true for the selenide config

I saw headless is mentioned, but I am not sure selenide pass which argument? --headless or --headless=new You can try to disable that configure and use addArguments("--headless=new") to see any help

zsbutosi commented 1 month ago

@VietND96 Thank you for the response. I checked the selenide code and its sending --headless=new. And I also made changes, and now I am adding the --headless=new argument to the chrome configuration from my code. But unfortunately I am getting the same timeout exception. I tested locally with the 4.24.0-20240907 release.

As I see, the problem is when I try to run multiple browser sessions in one chrome-node. As long as I am running only 1 test, i don't see any problem. But if multiple tests are executed in parallel, the first one got passed, and the other tests got this timeout exception. Like if after closing the first session, it shuts down something in the background, which prevents any other session to start or work properly.

zsbutosi commented 3 weeks ago

@VietND96 I think I found a solution. Now I am using the latest docker images for hub and chrome-node (tag: 4.25.0-20240922). When a session starts, it spawn multiple chrome processes. And one of them took an entire core in processing. That is why I got the timeout exceptions. After quite a few trial I reenabled the xvfb, because that was turned off by default, as we always use headless mode in GRID. And the timeouts resolved. I think for one of the chrome process, xvfb is required.

So currently it seems to be working with the following env variables: SE_START_XVFB - true SE_START_VNC - false SE_START_NO_VNC - false SE_NODE_MAX_SESSIONS - 4 SE_NODE_OVERRIDE_MAX_SESSIONS - true

So it seems that xvfb is a must have to use GRID.

Also an other weird thing is that the chrome-node info in GRID states the following:

{"browserName":"chrome","browserVersion":"129.0","goog:chromeOptions":{"binary":"/usr/bin/google-chrome"},"platformName":"linux","se:containerName":"","se:noVncPort":7900,"se:vncEnabled":true}

But as I said before, vnc and novnc is disabled.

Thx

VietND96 commented 3 weeks ago

Thanks for your feedback, I think a note on README also mentioned a similar issue https://github.com/SeleniumHQ/docker-selenium?tab=readme-ov-file#headless Regarding caps "se:vncEnabled":true is present when SE_START_VNC and SE_START_NO_VNC are false, let me check again.