Closed bhecquet closed 1 year ago
@bhecquet, thank you for creating this issue. We will troubleshoot it as soon as we can.
Triage this issue by using labels.
If information is missing, add a helpful comment and then I-issue-template
label.
If the issue is a question, add the I-question
label.
If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted
label.
If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C),
add the applicable G-*
label, and it will provide the correct link and auto-close the
issue.
After troubleshooting the issue, please add the R-awaiting answer
label.
Thank you!
I've got a similar issue. Environment: Grid deployed with Helm (4.8.1 - fully distributed - no extra config) - 2 chrome nodes.
5 concurrent tests, each test has a 200 second delay. result: 1 failure
response = {'status': 500, 'value': '{\n "value": {\n "error": "timeout",\n "message": "java.net.http.HttpTimeoutException...n\\tat org.openqa.selenium.remote.http.jdk.JdkHttpClient.execute(JdkHttpClient.java:292)\\n\\t... 36 more\\n"\n }\n}'}
[...]
=========================== short test summary info ============================
FAILED selenium-test.py::test_qahss03_4 - selenium.common.exceptions.TimeoutE...
=================== 1 failed, 4 passed in 404.93s (0:06:44) ====================
To avoid the timeout, I add the following configuration
extraEnvironmentVariables:
- name: SE_SESSION_REQUEST_TIMEOUT
value: "1200"
- name: SE_SESSION_RETRY_INTERVAL
value: "5"`
result: 4 tests PASSED, no response is received for the 5th test and the test suite never completes
All our test suites that use concurrent tests are affected by timeout issues:
[TestNG] org.openqa.selenium.SessionNotCreatedException: Could not start a new session. Response code 500. Message: java.util.concurrent.TimeoutException
An other information that may help to resolve this bug: if I remove the "--session-request-timeout 30" parameter from hub, I don't have the bug anymore. But this is not a solution for me as I have 2 hubs in parallel (for redundancy purpose) and I don't want my new session request being stuck on one of them
I do not see the issue. There are three timeouts into play here:
@bhecquet In your case, you are setting 30 seconds for the session request queue timeout, therefore any new session request will stave because you are leaving the running sessions idle and waiting for the session timeout to happen.
I recommend to check the different flags for configuration here https://www.selenium.dev/documentation/grid/configuration/cli_options/, also the first steps here https://www.selenium.dev/documentation/grid/getting_started/
If you have more questions, I will leave some links below you can use to join our user chat community.
💬 Please ask questions at:
Hello,
thanks for your reply. I think you missed something: I do not let the session idle, in the problematic case, the client request is "aborted" whereas, at the same time, the session is created on the node, so, there is no way to use the created session
The reason is the same, you need to figure out what timeouts work for you. If the client times out before the queue times out, then you will have orphan sessions. That is why we left it configurable and not hard coded.
So I do not understand why I get a staled session on the node, corresponding to a client request which has already timeout (after 30 secs only).
Tell me if I understand correctly the process:
Is there a moment where session is still in session queue and also created on node ?
That seems correct, but what you are seeing:
Cannot create driver: Could not start a new session. Response code 500. Message: Could not start a new session. New session request timed out
is because the session request was in the queue for over 30 seconds and no Node slot was released in the meantime.
Yes, I agree with that, but, in my test code (provided in the issue) I set a custom capability ("cs:test
Cannot create driver: Could not start a new session. Response code 500. Message: Could not start a new session. New session request timed out
That's what surprise me
I do not think you can correlate that because you are executing a loop, 10 times, for each test. So you do not have a single request with cs:test3
, you are actually creating 10.
caps.setCapability("cs:test", String.format(name + "-" + i));
I simplified :)
the capability holds the test name and the loop index
I looked at the "LocalNewSessionQueue" code and see that while distributor takes effectively the session request to create it, it's removed from the queue, so theorically, what I observe is not possible :(
I will do more tests to understand better what happens and let you know
Thanks for your time
Hello @diemol
I think I've found the root cause of my problem and it's in LocalNewSessionQueue class
In fact, when session request arrives, a wait is performed https://github.com/SeleniumHQ/selenium/blob/440a388d0bdf0e3b6f977a5048c591d3865c02e9/java/src/org/openqa/selenium/grid/sessionqueue/local/LocalNewSessionQueue.java#L199
During this wait, LocalDistributor calls https://github.com/SeleniumHQ/selenium/blob/440a388d0bdf0e3b6f977a5048c591d3865c02e9/java/src/org/openqa/selenium/grid/sessionqueue/local/LocalNewSessionQueue.java#L304 to get sessions in queue
But nothing prevents the session request to timeout while the session is being created
I've added more logs to grid code (whole file attached) logs_reduced.log
08:48:44.243 INFO [LocalNewSessionQueue.addToQueue] - 7f03040b-a214-48bd-94b3-47df03cc21c5: Adding session request to queue
08:48:44.243 INFO [LocalNewSessionQueue.injectIntoQueue] - 7f03040b-a214-48bd-94b3-47df03cc21c5: Adding request to queue
08:48:44.243 INFO [LocalNewSessionQueue.addToQueue] - 7f03040b-a214-48bd-94b3-47df03cc21c5: Waiting for session to be created
...
08:49:13.947 INFO [LocalNewSessionQueue.lambda$getNextAvailable$7] - available request: 7f03040b-a214-48bd-94b3-47df03cc21c5
...
08:49:14.244 INFO [LocalNewSessionQueue.addToQueue] - 7f03040b-a214-48bd-94b3-47df03cc21c5: Session request timed out
08:49:14.244 INFO [LocalNewSessionQueue.addToQueue] - 7f03040b-a214-48bd-94b3-47df03cc21c5: session creation error sent to client
...
08:49:15.786 INFO [LocalDistributor.newSession] - Session created by the Distributor. Id: 934ed7006d17484db7ca3b67695672bf [request 7f03040b-a214-48bd-94b3-47df03cc21c5]
Caps: Capabilities {acceptInsecureCerts: false, browserName: chrome, browserVersion: 104.0.5112.102, chrome: {chromedriverVersion: 104.0.5112.79 (3cf3e8c8a07d..., userDataDir: C:\Users\ZSSELE~1\AppData\L...}, cs:test: test3-1, goog:chromeOptions: {debuggerAddress: localhost:57386}, networkConnectionEnabled: false, pageLoadStrategy: normal, platformName: Windows 10, proxy: {}, se:bidiEnabled: false, se:cdp: ws://selenium-server-dev.co..., se:cdpVersion: 104.0.5112.102, setWindowRect: true, strictFileInteractability: false, timeouts: {implicit: 0, pageLoad: 300000, script: 30000}, unhandledPromptBehavior: dismiss and notify, webauthn:extension:credBlob: true, webauthn:extension:largeBlob: true, webauthn:virtualAuthenticators: true}
You can look at the modified code (logs added) here: https://github.com/bhecquet/selenium/blob/grid_concurrency/java/src/org/openqa/selenium/grid/sessionqueue/local/LocalNewSessionQueue.java
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
What happened?
I encounter an issue when trying to execute more tests than a Selenium Grid can handle
My setup:
java -jar selenium-server-4.8.3.jar hub --session-request-timeout 30 --port 4448
java -jar selenium-server-4.8.3.jar node --max-sessions 1 --port 6666 --grid-url http://selenium-server.company.com:4448 --publish-events tcp://selenium-server.company.com:4442 --subscribe-events tcp://selenium-server.company.com:4443
The script adds a wait to force new sessions to be queued So when starting the 4 instances (methods test1, test2, test3 and test4), I can see them run, or queued, but at some point, a session is created but not used: on the node, browser is started, but URL not loaded. In that case, we need to wait 5 mins (default session timeout) for next test to be started
From what I observed, I would say that there is a race condition between the session request timeout and the session creation thread, as the "stalled" session is logged as timed out in test script logs:
Cannot create driver: Could not start a new session. Response code 500. Message: Could not start a new session. New session request timed out
Thanks for your help
How can we reproduce the issue?
Relevant log output
Moreover an other test showed the following events:
new RemoteWebDriver
Operating System
Hub: RHE 7.9, Node: Windows 10
Selenium version
4.7.2 (script), 4.8.3 (Hub & node)
What are the browser(s) and version(s) where you see this issue?
chrome 104
What are the browser driver(s) and version(s) where you see this issue?
chromedriver 104
Are you using Selenium Grid?
4.8.3