Closed KrzysztofC closed 3 months ago
@KrzysztofC, thank you for creating this issue. We will troubleshoot it as soon as we can.
Triage this issue by using labels.
If information is missing, add a helpful comment and then the I-issue-template label.
If the issue is a question, add the I-question label.
If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.
If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable G-* label, and it will provide the correct link and auto-close the issue.
After troubleshooting the issue, please add the R-awaiting answer label.
Thank you!
I think this is a WebDriver-related issue. The Selenium node starts the driver, and the driver starts the browser. When the browser does not start in time, the driver raises this error.
Host info: host: 'selenium-grid-4-test-node-227-dqqgk', ip: '10.254.22.39'
14:03:34.720 DEBUG [JdkHttpClient.execute0] - Ending request (POST) /session in 60,500ms
It looks like you are running Grid in a K8s environment, so there is one thing to check: does the setup have an ingress proxy in front? If yes, what read timeout is set there?
My route in OpenShift already has haproxy.router.openshift.io/timeout set to a higher value of 365s, according to https://docs.openshift.com/container-platform/4.16/networking/routes/route-configuration.html My tests capture timestamps before and after the WebDriver command and calculate how long session creation takes. In the past I have seen sessions take longer than 2 minutes to be created without any errors in the same environment, but I can't explain why or how that worked, or when it last worked. Things that have changed since then: updating the Selenium jar to the latest version, updating Chrome/chromedriver to the latest version, and possibly the move from JDK11 to JDK17.
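For anyone who wants to take the same kind of measurement, here is a minimal sketch of timing session creation from the client side. It is not the code used in my tests; the helper name timed_call and the Grid URL in the usage comment are hypothetical. The helper also records the elapsed time when creation fails, which is the interesting case here.

```python
import time
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


def timed_call(factory: Callable[[], T]) -> Tuple[Optional[T], float, Optional[Exception]]:
    """Run factory() and return (result, elapsed_seconds, error).

    On success error is None; on failure result is None and the
    exception is returned instead of raised, so the duration of a
    failed attempt is not lost.
    """
    start = time.monotonic()  # monotonic clock: unaffected by wall-clock changes
    try:
        result = factory()
        return result, time.monotonic() - start, None
    except Exception as exc:
        return None, time.monotonic() - start, exc


# Hypothetical usage with the Selenium Python bindings (URL is a placeholder):
# from selenium import webdriver
# driver, seconds, error = timed_call(
#     lambda: webdriver.Remote(
#         command_executor="http://grid.example:4444",
#         options=webdriver.ChromeOptions(),
#     )
# )
# print(f"session creation took {seconds:.1f}s, error={error!r}")
```

If the failures consistently land at about 60 seconds no matter which Grid timeouts are set, that points at a fixed limit somewhere below the Grid rather than at the route or the Hub.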
Does the issue happen when there are low resources?
The container is dedicated to running the Selenium node and browser, nothing else. It has 8 GB of RAM and 1 CPU allocated. I think it is purely a timing issue caused by slowness. It can be replicated by allocating less CPU, below the point where the browser takes more than 60 seconds to start, and "fixed" by allocating more CPU, so that the browser takes less than 60 seconds to start. Resources (especially CPUs) are quite expensive in our environment. I have tried JDK21 but it's the same; I'm currently running with JDK17.
That's a lot of RAM and too little CPU. Browsers have peaks and might go over 1 CPU. You usually need to have a few free CPUs to handle those peaks.
I understand, but this is OpenShift: each container runs on an App Node with multiple CPUs visible to it, so the CPU setting really controls overall speed through scheduling, not the number of CPUs or threads available. We run 100 Grid Nodes with 1 browser each, so it's not possible to go from 1 CPU to multiple CPUs across the entire setup. It would be ideal if --session-request-timeout worked with values higher than 60 seconds, because once the browser opens, thousands of tests run without issues and CPU load stays well below 50% most of the time in those containers. We are trying to be as efficient as possible with the limited resources we have. I also checked chromedriver, but it does not appear to have any parameters related to timeouts. I am thinking of creating an image with an older Selenium, JDK11, and an older browser/driver, and I will see if I can do that, starting with old Selenium + JDK11.
I get the same when headless is set to true. Works fine with headless as false.
When Grid starts the command to open the browser, it uses the configured sessionRequestTimeout. However, the 60 seconds timeout comes directly from ChromeDriver.
Unfortunately, there is no option to configure that timeout. They assume a browser should open in less than 1 minute.
You can file a bug report with the Chrome folks and/or budget your infrastructure for this situation. For example, instead of 100 browsers with 100 CPUs, you should probably run just 90 browsers and keep tuning the value until you find a stable configuration.
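To make the budgeting suggestion concrete, here is a rough sketch of the arithmetic, assuming a fixed pool of 100 CPUs split evenly across the browsers. The function name is hypothetical and the 80-browser row is only an illustration of continued tuning.

```python
def cpu_per_browser(total_cpus: float, browsers: int) -> float:
    """Average CPU share per browser when a fixed pool is split evenly."""
    return total_cpus / browsers


# Fewer browsers on the same CPU budget leaves more headroom for startup peaks.
for browsers in (100, 90, 80):
    share = cpu_per_browser(100, browsers)
    print(f"{browsers} browsers -> {share:.2f} CPU each")
```

Going from 100 to 90 browsers raises the share from 1.00 to about 1.11 CPU each. Whether that is enough to start a browser within 60 seconds has to be measured in the actual environment.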
Hi, @KrzysztofC. This issue has been determined to require fixes in ChromeDriver.
You can see if the feature is passing in the Web Platform Tests.
If it is something new, please create an issue with the ChromeDriver team. Feel free to comment the issues that you raise back in this issue. Thank you.
@KrzysztofC Can you check whether xvfb is running in the container executing your tests? I was able to solve my issue after starting xvfb. With previous Selenium container images I did not have to start xvfb explicitly, but it looks like it is required with the new version.
nohup /opt/bin/start-xvfb.sh &
nohup ./start-selenium-standalone.sh &
What happened?
My Hub and Nodes are running in separate containers in OpenShift. Sometimes there are leftover processes or general environment slowness in the Node container, and then the SessionNotCreatedException happens. Setting --session-request-timeout does not help in this particular case. What happens is: a single test is started. From a terminal on the Node, I can see chromedriver (or msedgedriver, depending on capabilities) starting and, soon after, the browser beginning to start up. If the browser is not ready within 60 seconds due to any slowness, the error occurs.
Regardless of settings, it happens after exactly 60 seconds:
14:02:34.219 DEBUG [JdkHttpClient.execute0] - Executing request: (POST) /session
14:03:34.720 DEBUG [JdkHttpClient.execute0] - Ending request (POST) /session in 60,500ms
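The gap between the two log timestamps above can be checked with a few lines of Python. This is only a sketch: the helper name elapsed_ms is hypothetical, and it assumes both timestamps fall on the same day, since the log lines carry no date.

```python
from datetime import datetime, timedelta

LOG_TIME_FORMAT = "%H:%M:%S.%f"  # e.g. 14:02:34.219


def elapsed_ms(start: str, end: str) -> int:
    """Milliseconds between two same-day HH:MM:SS.mmm log timestamps."""
    delta = datetime.strptime(end, LOG_TIME_FORMAT) - datetime.strptime(start, LOG_TIME_FORMAT)
    # Integer division by a 1 ms timedelta avoids floating-point rounding.
    return delta // timedelta(milliseconds=1)


print(elapsed_ms("14:02:34.219", "14:03:34.720"))  # 60501
```

The result is within a millisecond of the 60,500ms that the Grid reports for the request, so the POST /session call is ending right at the one-minute mark.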
How can we reproduce the issue?
I can reproduce this issue every time when I configure Grid Node container with fewer CPU resources to simulate slowness or conserve resources. If Chrome or Edge takes more than 60 seconds to open, then the issue will happen. Note: my route in OpenShift already has haproxy.router.openshift.io/timeout set to a higher value, but I don't think it matters in this particular case.
Relevant log output
Grid Node with FINEST logs:
Operating System
RedHat 8
Selenium version
4.23.0
What are the browser(s) and version(s) where you see this issue?
N/A
What are the browser driver(s) and version(s) where you see this issue?
N/A
Are you using Selenium Grid?
4.23.0