Closed kjjnygres closed 1 month ago
@kjjnygres, thank you for creating this issue. We will troubleshoot it as soon as we can.
Triage this issue by using labels.
If information is missing, add a helpful comment and then the I-issue-template label.
If the issue is a question, add the I-question label.
If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.
If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable G-* label, and it will provide the correct link and auto-close the issue.
After troubleshooting the issue, please add the R-awaiting answer label.
Thank you!
I think you can retest once this PR is released in 4.23: https://github.com/SeleniumHQ/selenium/pull/14272
Has the issue been identified? What is the issue here?
Are you using Grid autoscaling in K8s? If yes, it could be related to another issue, described in https://github.com/SeleniumHQ/selenium/pull/14282
No. I am spinning up and destroying containers on demand.
Yes, so it could be related. When destroying the containers, I think the drain node API endpoint is used to detach the Node from the Hub. The node will have a chance to switch status from UP to DRAINING (for some time until the container stops completely).
Oh. Waiting for the new release then. Thanks for explaining
Waiting for this release as well, as I encountered issues when running parallel tests on Kubernetes with KEDA autoscaling. Hope this will fix my issue. Thanks
I also have an issue when running parallel tests: an ongoing test's session (session ID) is hijacked by incoming tests, and that session is closed even though its owner still has a test in progress.
Unable to find session with ID: b77e7338533a3ccbd9c70eee277c156d
I also encountered this issue. I currently work around it by scheduling my tests not to run in parallel. Waiting for the release as well. Thank you.
Image tag 4.23.0-20240727 and Helm chart selenium-grid-0.33.0 contain the 2 needed fixes mentioned above. Can you please verify and confirm?
Issue still persists for me. Node is created, but is waiting in the queue. Session never starts.
Message: Could not start a new session. New session request timed out
Host info: host: '801aca98ceec', ip: '172.19.0.2'
Build info: version: '4.23.0', revision: '77010cd'
System info: os.name: 'Linux', os.arch: 'amd64', os.version: '5.14.0-427.22.1.el9_4.x86_64', java.version: '17.0.11'
Driver info: driver.version: unknown
Stacktrace:
at org.openqa.selenium.grid.sessionqueue.local.LocalNewSessionQueue.addToQueue (LocalNewSessionQueue.java:221)
at org.openqa.selenium.grid.sessionqueue.NewSessionQueue.lambda$new$0 (NewSessionQueue.java:68)
at org.openqa.selenium.remote.http.Route$TemplatizedRoute.handle (Route.java:192)
at org.openqa.selenium.remote.http.Route.execute (Route.java:69)
at org.openqa.selenium.remote.http.Route$CombinedRoute.handle (Route.java:360)
at org.openqa.selenium.remote.http.Route.execute (Route.java:69)
at org.openqa.selenium.grid.sessionqueue.NewSessionQueue.execute (NewSessionQueue.java:128)
at org.openqa.selenium.remote.tracing.SpanWrappedHttpHandler.execute (SpanWrappedHttpHandler.java:87)
at org.openqa.selenium.remote.http.Filter$1.execute (Filter.java:63)
at org.openqa.selenium.remote.http.Route$CombinedRoute.handle (Route.java:360)
at org.openqa.selenium.remote.http.Route.execute (Route.java:69)
at org.openqa.selenium.grid.router.Router.execute (Router.java:87)
at org.openqa.selenium.grid.web.EnsureSpecCompliantResponseHeaders.lambda$apply$0 (EnsureSpecCompliantResponseHeaders.java:34)
at org.openqa.selenium.remote.http.Filter$1.execute (Filter.java:63)
at org.openqa.selenium.remote.http.Route$CombinedRoute.handle (Route.java:360)
at org.openqa.selenium.remote.http.Route.execute (Route.java:69)
at org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0 (AddWebDriverSpecHeaders.java:35)
at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0 (ErrorFilter.java:44)
at org.openqa.selenium.remote.http.Filter$1.execute (Filter.java:63)
at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0 (ErrorFilter.java:44)
at org.openqa.selenium.remote.http.Filter$1.execute (Filter.java:63)
at org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0 (SeleniumHandler.java:44)
at java.util.concurrent.Executors$RunnableAdapter.call (Executors.java:539)
at java.util.concurrent.FutureTask.run (FutureTask.java:264)
at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1136)
at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:635)
at java.lang.Thread.run (Thread.java:840)
@kjjnygres, can you share all the capabilities that you set from the client binding? I suspect that the session request does not match any Node stereotype, so no session can start.
OK, I see that your scenario sets custom capabilities to match specific Nodes.
While working on PR #2323 (adding the capability se:containerName by default to Node container stereotypes), I also noticed something reflected incorrectly for Chromium-based Nodes (Chrome and Edge): se:containerName is present in the Node stereotypes, but it is not present in the session capabilities (see the screenshots below). So, when a session request sets custom capabilities, it probably cannot be assigned to any Node.
Can you confirm whether, in your setup, requesting a Firefox node still works fine?
Node stereotypes
Session capabilities (Chrome)
Session capabilities (Firefox)
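For anyone comparing the capabilities they request against a Node stereotype by hand, a small diff helper may save some squinting. This is only a debugging sketch: it uses strict key-by-key equality, which is an assumption for illustration and not how Grid's slot matcher actually decides. The function name `capability_mismatches` and the sample values are hypothetical.

```python
from typing import Any, Dict, List


def capability_mismatches(requested: Dict[str, Any], stereotype: Dict[str, Any]) -> List[str]:
    """List requested capabilities that are absent from, or differ in, the stereotype.

    NOTE: strict equality is an assumption made for debugging; Grid's own
    matching rules are more lenient for some capabilities.
    """
    problems = []
    for key, wanted in requested.items():
        if key not in stereotype:
            problems.append(f"{key}: missing from stereotype")
        elif stereotype[key] != wanted:
            problems.append(
                f"{key}: requested {wanted!r}, stereotype has {stereotype[key]!r}"
            )
    return problems


# Hypothetical values modelled on the stereotype posted in this thread.
stereotype = {
    "browserName": "chrome",
    "platformName": "linux",
    "nodename:applicationName": "AccountCreation1",
}
requested = {
    "browserName": "chrome",
    "nodename:applicationName": "AccountCreation2",
}

for problem in capability_mismatches(requested, stereotype):
    print(problem)
```

An empty result only means the two dictionaries agree on the requested keys; it does not prove that Grid will assign the session.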
I think the issue is still what you mentioned before: after destroying the container and spinning it up again with the same name, there might be some problem with its status, which is causing this issue.
The stacktrace in my last comment was my mistake; I was sending the wrong capabilities. Please find the updated stacktrace below:
Message: Could not start a new session. New session request timed out
Host info: host: '801aca98ceec', ip: '172.19.0.2'
Build info: version: '4.23.0', revision: '77010cd'
System info: os.name: 'Linux', os.arch: 'amd64', os.version: '5.14.0-427.22.1.el9_4.x86_64', java.version: '17.0.11'
Driver info: driver.version: unknown
Stacktrace:
at org.openqa.selenium.grid.sessionqueue.local.LocalNewSessionQueue.addToQueue (LocalNewSessionQueue.java:221)
at org.openqa.selenium.grid.sessionqueue.NewSessionQueue.lambda$new$0 (NewSessionQueue.java:68)
at org.openqa.selenium.remote.http.Route$TemplatizedRoute.handle (Route.java:192)
at org.openqa.selenium.remote.http.Route.execute (Route.java:69)
at org.openqa.selenium.remote.http.Route$CombinedRoute.handle (Route.java:360)
at org.openqa.selenium.remote.http.Route.execute (Route.java:69)
at org.openqa.selenium.grid.sessionqueue.NewSessionQueue.execute (NewSessionQueue.java:128)
at org.openqa.selenium.remote.tracing.SpanWrappedHttpHandler.execute (SpanWrappedHttpHandler.java:87)
at org.openqa.selenium.remote.http.Filter$1.execute (Filter.java:63)
at org.openqa.selenium.remote.http.Route$CombinedRoute.handle (Route.java:360)
at org.openqa.selenium.remote.http.Route.execute (Route.java:69)
at org.openqa.selenium.grid.router.Router.execute (Router.java:87)
at org.openqa.selenium.grid.web.EnsureSpecCompliantResponseHeaders.lambda$apply$0 (EnsureSpecCompliantResponseHeaders.java:34)
at org.openqa.selenium.remote.http.Filter$1.execute (Filter.java:63)
at org.openqa.selenium.remote.http.Route$CombinedRoute.handle (Route.java:360)
at org.openqa.selenium.remote.http.Route.execute (Route.java:69)
at org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0 (AddWebDriverSpecHeaders.java:35)
at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0 (ErrorFilter.java:44)
at org.openqa.selenium.remote.http.Filter$1.execute (Filter.java:63)
at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0 (ErrorFilter.java:44)
at org.openqa.selenium.remote.http.Filter$1.execute (Filter.java:63)
at org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0 (SeleniumHandler.java:44)
at java.util.concurrent.Executors$RunnableAdapter.call (Executors.java:539)
at java.util.concurrent.FutureTask.run (FutureTask.java:264)
at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1136)
at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:635)
at java.lang.Thread.run (Thread.java:840)
I am using capabilities like this and it is working fine for me.
I am using this capability from my code:
SE_NODE_STEREOTYPE='{"browserName":"chrome","browserVersion":"122.0","goog:chromeOptions":{"binary":"/usr/bin/google-chrome"},"platformName":"linux","se:noVncPort":7900,"se:vncEnabled":true, "nodename:applicationName":"AccountCreation1"}'
OK, the Node status switches from UP to DRAINING only when the drain node API endpoint is called: https://www.selenium.dev/documentation/grid/advanced_features/endpoints/#drain.
This is something to do in the container entry point. With docker stop, the Node status can still be UP when the container suddenly goes down, and the DOWN status does not reach the Hub immediately (since health checks are triggered at an interval).
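As a rough illustration of calling the drain endpoint before stopping a container, here is a minimal sketch using only the Python standard library. It assumes the Node-level drain path from the linked endpoints page (`/se/grid/node/drain`), the default Node port 5555, and an empty registration secret; adjust all three to your setup. The helper names are hypothetical.

```python
import urllib.request


def node_drain_url(node_base_url: str) -> str:
    """Build the Node-level drain URL (path assumed from the Grid endpoints docs)."""
    return node_base_url.rstrip("/") + "/se/grid/node/drain"


def drain_node(node_base_url: str, registration_secret: str = "", timeout: float = 10.0) -> int:
    """POST to the Node's drain endpoint and return the HTTP status code.

    The X-REGISTRATION-SECRET header is sent with an empty value by default,
    which assumes the Grid was started without a registration secret.
    """
    request = urllib.request.Request(
        node_drain_url(node_base_url),
        method="POST",
        headers={"X-REGISTRATION-SECRET": registration_secret},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Example (assumes the Node is reachable on its default port):
# drain_node("http://localhost:5555")
```

Draining first gives the Node a chance to report DRAINING to the Hub instead of disappearing while its last known status is still UP.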
So maybe I have to wait between docker stop and docker run for the same container?
Let me try to add a mechanism to gracefully shut down the Node when it is deployed via docker or docker-compose.
@kjjnygres By the way, do you have any publicly available script that I can use to test your scenario with the implementation?
I'm afraid not. But what I do is this: before running a container, I check whether a container with the same name is already up. If so, I destroy that container and then run a new one with the same name. There is not much time between these two activities. I think you can reproduce it with a small script that runs a container, destroys it, and runs it again.
Here is how I do it in my code:
sh script: "docker stop ${job} || true && docker rm ${job} || true"
sh script: "docker run -d --net grid --name ${job} -e SE_EVENT_BUS_HOST=selenium-hub -e SE_NODE_STEREOTYPE='{\"browserName\":\"chrome\",\"browserVersion\":\"122.0\",\"goog:chromeOptions\":{\"binary\":\"/usr/bin/google-chrome\"},\"platformName\":\"linux\",\"se:noVncPort\":7900,\"se:vncEnabled\":true, \"nodename:applicationName\":\"${job}\"}' --cap-add=CAP_AUDIT_WRITE --shm-size=\"2g\" -e SE_EVENT_BUS_PUBLISH_PORT=4442 -e SE_EVENT_BUS_SUBSCRIBE_PORT=4443 localhost/node-image-xx"
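One way to put a pause between the stop and the run is to poll the Hub until the old Node is no longer listed. The sketch below is an assumption-heavy illustration, not a confirmed fix: it assumes the Hub's `/status` response has the shape `value.nodes[].slots[].stereotype`, and it identifies the Node through the custom `nodename:applicationName` capability used in this thread. All function names are hypothetical.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, List


def nodes_for_application(status: Dict[str, Any], application_name: str) -> List[Dict[str, Any]]:
    """Return the nodes whose slot stereotype carries the given applicationName."""
    matches = []
    for node in status.get("value", {}).get("nodes", []):
        for slot in node.get("slots", []):
            stereotype = slot.get("stereotype", {})
            if stereotype.get("nodename:applicationName") == application_name:
                matches.append(node)
                break  # one matching slot is enough for this node
    return matches


def fetch_status(hub_url: str, timeout: float = 5.0) -> Dict[str, Any]:
    """GET <hub>/status and decode the JSON body."""
    with urllib.request.urlopen(hub_url.rstrip("/") + "/status", timeout=timeout) as response:
        return json.load(response)


def wait_until_deregistered(
    hub_url: str,
    application_name: str,
    max_wait: float = 120.0,
    poll_interval: float = 2.0,
    fetch: Callable[[str], Dict[str, Any]] = fetch_status,
) -> bool:
    """Poll the Hub until no node with this applicationName is listed.

    Returns True once the node is gone, False if max_wait elapses first.
    """
    deadline = time.monotonic() + max_wait
    while True:
        if not nodes_for_application(fetch(hub_url), application_name):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

In the pipeline above, this would sit between the `docker rm` and `docker run` steps, for example `wait_until_deregistered("http://selenium-hub:4444", job)`, where the Hub URL is an assumption based on the `SE_EVENT_BUS_HOST` value.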
Just an update: it is now happening very rarely. I almost forgot about this issue :)
I also encountered the same issue using a standalone Docker configuration. The tests are triggered in parallel and create browser sessions. However, the test executions of the sessions override one another.
`version: "3" services: selenium-event-bus: image: selenium/event-bus:4.23.0-20240727 container_name: selenium-event-bus ports:
grid restart: unless-stopped
selenium-sessions: image: selenium/sessions:4.23.0-20240727 container_name: selenium-sessions ports:
grid restart: unless-stopped
selenium-session-queue: image: selenium/session-queue:4.23.0-20240727 container_name: selenium-session-queue ports:
grid restart: unless-stopped
selenium-distributor: image: selenium/distributor:4.23.0-20240727 container_name: selenium-distributor ports:
grid restart: unless-stopped
selenium-router: image: selenium/router:4.23.0-20240727 container_name: selenium-router ports:
grid restart: unless-stopped
chrome: image: selenium/node-chrome:4.23.0-20240727 shm_size: 2gb depends_on:
environment:
networks: grid: driver: bridge`
By the way, I'm using a simple Robot Framework script with the Browser library.
`*** Settings ***
Library    Browser

*** Test Cases ***
Debug 001
    New Browser    browser=chromium    headless=False    args=["--start-maximized"]
    New Context    ignoreHTTPSErrors=True    viewport=${None}
    New Page    https://robocon.io
    Sleep    10s`
Command to execute in the terminal:
SELENIUM_REMOTE_URL=http://localhost:4444 robot -v TS.robot
What happened?
Context:
I am trying to run parallel tests with the help of Selenium Grid. My grid and all nodes run on the same machine.
Currently, when I run a test suite, a docker container gets spun up for that suite, executes the tests, and then destroys itself when the suite execution is complete. Every docker container here is a node connected to the hub.
Which container/node should run which test suite is managed by desired capabilities:
SE_NODE_STEREOTYPE='{"browserName":"chrome","browserVersion":"122.0","goog:chromeOptions":{"binary":"/usr/bin/google-chrome"},"platformName":"linux","se:noVncPort":7900,"se:vncEnabled":true, "nodename:applicationName":"Container1"}'
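Hand-escaping this JSON inside shell or pipeline strings is easy to get wrong. As a sketch, the value can be generated with `json.dumps` and passed to `docker run` as a separate argument so that no shell quoting is involved. The capability values are copied from the stereotype above; the function names are hypothetical.

```python
import json
from typing import List


def build_node_stereotype(application_name: str) -> str:
    """Serialize the stereotype from this issue, varying only the applicationName."""
    stereotype = {
        "browserName": "chrome",
        "browserVersion": "122.0",
        "goog:chromeOptions": {"binary": "/usr/bin/google-chrome"},
        "platformName": "linux",
        "se:noVncPort": 7900,
        "se:vncEnabled": True,
        "nodename:applicationName": application_name,
    }
    # Compact separators keep the value on one line without stray spaces.
    return json.dumps(stereotype, separators=(",", ":"))


def docker_env_args(application_name: str) -> List[str]:
    """Return the `-e` argument pair for docker run, suitable for an argv list."""
    return ["-e", "SE_NODE_STEREOTYPE=" + build_node_stereotype(application_name)]


print(build_node_stereotype("Container1"))
```

Because the arguments form a list, they can be appended to a `subprocess.run(["docker", "run", ...])` call without any backslash escaping.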
Problem:
Sometimes the suites execute just fine; sometimes one of the suites fails; sometimes none of the tests start, giving the error below:
"Driver info: driver.version: unknown
Stacktrace:
System info:
os.name: 'Linux', os.arch: 'amd64', os.version: '5.14.0-427.22.1.el9_4.x86_64', java.version: '17.0.11'
Command used to start Selenium Grid with Docker (or Kubernetes)
Relevant log output
Operating System
Linux
Docker Selenium version (image tag)
4.22.0
Selenium Grid chart version (chart version)
No response