Open n1xan opened 2 years ago
Hello, is there any update on this?
Thanks, Nikolay
Can you share the Grid logs please?
Maybe you are hitting this issue: #1605
Hi @diemol did you get a chance to take a look at the logs?
Facing the same issue. The Grid crashes when we try to increase the number of parallel sessions.
Please find the logs for failure below.
Hub logs :
grid_chrome.1.qo153i2lk1m4@ip-172-21-96-203.ec2.internal | 16:16:57.145 INFO [ProxyNodeWebsockets.createWsEndPoint] - Establishing connection to ws://localhost:34271/devtools/browser/49bf5ba0-0608-4c03-92a4-e34d577d393f
grid_chrome.1.qo153i2lk1m4@ip-172-21-96-203.ec2.internal | 19:06:43.991 INFO [LocalNode.lambda$new$3] - Session id 1ce4ab73af13e94b6501c684ce8458f0 timed out, stopping...
grid_chrome.1.qo153i2lk1m4@ip-172-21-96-203.ec2.internal | 19:07:35.544 INFO [SessionSlot.stop] - Stopping session 1ce4ab73af13e94b6501c684ce8458f0
grid_chrome.1.qo153i2lk1m4@ip-172-21-96-203.ec2.internal | Trapped SIGTERM/SIGINT/x so shutting down supervisord...
grid_chrome.1.qo153i2lk1m4@ip-172-21-96-203.ec2.internal | 2023-01-04 19:07:50,560 WARN received SIGTERM indicating exit request
grid_chrome.1.qo153i2lk1m4@ip-172-21-96-203.ec2.internal | 2023-01-04 19:07:50,562 INFO waiting for xvfb, vnc, novnc, selenium-node to die
grid_chrome.1.qo153i2lk1m4@ip-172-21-96-203.ec2.internal | 2023-01-04 19:07:51,001 INFO stopped: selenium-node (terminated by SIGTERM)
grid_chrome.1.qo153i2lk1m4@ip-172-21-96-203.ec2.internal | 2023-01-04 19:07:52,003 INFO stopped: novnc (terminated by SIGTERM)
grid_chrome.1.qo153i2lk1m4@ip-172-21-96-203.ec2.internal | 2023-01-04 19:07:53,005 INFO stopped: vnc (terminated by SIGTERM)
grid_chrome.1.qo153i2lk1m4@ip-172-21-96-203.ec2.internal | 2023-01-04 19:07:54,007 INFO stopped: xvfb (terminated by SIGTERM)
grid_chrome.1.qo153i2lk1m4@ip-172-21-96-203.ec2.internal | Shutdown complete
Node logs:
grid_hub.1.nguio5qzokcj@ip-172-21-102-91.ec2.internal | 16:44:20.188 WARN [SeleniumSpanExporter$1.lambda$export$1] - Unable to execute request for an existing session: Unable to find session with ID: 66f1a3290d8df819ce71103aed24fcf2
grid_hub.1.nguio5qzokcj@ip-172-21-102-91.ec2.internal | Build info: version: '4.5.0', revision: 'fe167b119a'
grid_hub.1.nguio5qzokcj@ip-172-21-102-91.ec2.internal | System info: os.name: 'Linux', os.arch: 'amd64', os.version: '5.10.135-122.509.amzn2.x86_64', java.version: '11.0.16'
grid_hub.1.nguio5qzokcj@ip-172-21-102-91.ec2.internal | Driver info: driver.version: unknown
grid_hub.1.nguio5qzokcj@ip-172-21-102-91.ec2.internal | 16:44:20.188 WARN [SeleniumSpanExporter$1.lambda$export$1] - org.openqa.selenium.NoSuchSessionException: Unable to find session with ID: 66f1a3290d8df819ce71103aed24fcf2
grid_hub.1.nguio5qzokcj@ip-172-21-102-91.ec2.internal | Build info: version: '4.5.0', revision: 'fe167b119a'
grid_hub.1.nguio5qzokcj@ip-172-21-102-91.ec2.internal | System info: os.name: 'Linux', os.arch: 'amd64', os.version: '5.10.135-122.509.amzn2.x86_64', java.version: '11.0.16'
grid_hub.1.nguio5qzokcj@ip-172-21-102-91.ec2.internal | Driver info: driver.version: unknown
grid_hub.1.nguio5qzokcj@ip-172-21-102-91.ec2.internal |   at org.openqa.selenium.grid.sessionmap.local.LocalSessionMap.get(LocalSessionMap.java:129)
grid_hub.1.nguio5qzokcj@ip-172-21-102-91.ec2.internal |   at org.openqa.selenium.grid.router.HandleSession.lambda$loadSessionId$4(HandleSession.java:159)
grid_hub.1.nguio5qzokcj@ip-172-21-102-91.ec2.internal |   at io.opentelemetry.context.Context.lambda$wrap$2(Context.java:224)
grid_hub.1.nguio5qzokcj@ip-172-21-102-91.ec2.internal |   at org.openqa.selenium.grid.router.HandleSession.execute(HandleSession.java:122)
grid_hub.1.nguio5qzokcj@ip-172-21-102-91.ec2.internal |   at org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:373)
grid_hub.1.nguio5qzokcj@ip-172-21-102-91.ec2.internal |   at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
grid_hub.1.nguio5qzokcj@ip-172-21-102-91.ec2.internal |   at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
grid_hub.1.nguio5qzokcj@ip-172-21-102-91.ec2.internal |   at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
grid_hub.1.nguio5qzokcj@ip-172-21-102-91.ec2.internal |   at org.openqa.selenium.grid.router.Router.execute(Router.java:91)
grid_hub.1.nguio5qzokcj@ip-172-21-102-91.ec2.internal |   at org.openqa.selenium.grid.web.EnsureSpecCompliantResponseHeaders.lambda$apply$0(EnsureSpecCompliantResponseHeaders.java:34)
grid_hub.1.nguio5qzokcj@ip-172-21-102-91.ec2.internal |   at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
grid_hub.1.nguio5qzokcj@ip-172-21-102-91.ec2.internal |   at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
grid_hub.1.nguio5qzokcj@ip-172-21-102-91.ec2.internal |   at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
grid_hub.1.nguio5qzokcj@ip-172-21-102-91.ec2.internal |   at org.openqa.selenium.remote.http.Route$NestedRoute.handle(Route.java:270)
grid_hub.1.nguio5qzokcj@ip-172-21-102-91.ec2.internal |   at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
grid_hub.1.nguio5qzokcj@ip-172-21-102-91.ec2.internal |   at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
grid_hub.1.nguio5qzokcj@ip-172-21-102-91.ec2.internal |   at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
grid_hub.1.nguio5qzokcj@ip-172-21-102-91.ec2.internal |   at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
grid_hub.1.nguio5qzokcj@ip-172-21-102-91.ec2.internal |   at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
grid_hub.1.nguio5qzokcj@ip-172-21-102-91.ec2.internal |   at org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)
grid_hub.1.nguio5qzokcj@ip-172-21-102-91.ec2.internal |   at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
grid_hub.1.nguio5qzokcj@ip-172-21-102-91.ec2.internal |   at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
grid_hub.1.nguio5qzokcj@ip-172-21-102-91.ec2.internal |   at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
grid_hub.1.nguio5qzokcj@ip-172-21-102-91.ec2.internal |   at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
grid_hub.1.nguio5qzokcj@ip-172-21-102-91.ec2.internal |   at org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)
grid_hub.1.nguio5qzokcj@ip-172-21-102-91.ec2.internal |   at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
grid_hub.1.nguio5qzokcj@ip-172-21-102-91.ec2.internal |   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
grid_hub.1.nguio5qzokcj@ip-172-21-102-91.ec2.internal |   at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
grid_hub.1.nguio5qzokcj@ip-172-21-102-91.ec2.internal |   at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
grid_hub.1.nguio5qzokcj@ip-172-21-102-91.ec2.internal |   at java.base/java.lang.Thread.run(Thread.java:829)
What happened?
We are facing some reliability issues while using Selenium Grid 4.
The grid works as expected under the usual workload of about 5 to 8 concurrent sessions. The issues start once we trigger tests that require a higher concurrent session count, even though the tests are very short and focused.
Then the grid stops processing incoming requests, and we see an infinite loading screen on the /ui#/sessions endpoint, because all /graphql requests are hanging. At that point, the healthcheck URL still reports that the status is OK and that all nodes are up and running. The only difference is that no sessions are listed.
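The symptom described above (status reports OK while /graphql hangs) can be checked from a script instead of the UI. What follows is only a minimal sketch, not part of the original report: it assumes the hub is reachable at http://localhost:4444 and that the healthcheck in question is the Grid /status endpoint. The helper names (fetch_json, classify, check_grid) are made up for illustration.

```python
import http.client
import json
import urllib.error
import urllib.request

# Assumption: default hub address; replace with your own hub URL.
GRID_URL = "http://localhost:4444"


def fetch_json(url, payload=None, timeout=5.0):
    """Return the parsed JSON body, or None if the request fails or times out."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    request = urllib.request.Request(url, data=data, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Timeouts, refused connections and non-JSON bodies all count as "no answer".
        return None


def classify(status_body, graphql_body):
    """Turn the two probe results into a short human-readable verdict."""
    value = (status_body or {}).get("value") or {}
    ready = bool(value.get("ready"))
    graphql_ok = bool(graphql_body and "data" in graphql_body)
    if ready and graphql_ok:
        return "healthy"
    if ready:
        return "status OK but GraphQL unresponsive"
    return "grid not ready"


def check_grid(base_url=GRID_URL, timeout=5.0):
    """Probe /status and /graphql with a hard timeout so a hang is detected."""
    status = fetch_json(base_url + "/status", timeout=timeout)
    query = {"query": "{ grid { sessionCount maxSession } }"}
    graphql = fetch_json(base_url + "/graphql", payload=query, timeout=timeout)
    return classify(status, graphql)


if __name__ == "__main__":
    print(check_grid())
```

Running this in a loop while the parallel tests execute should show the moment the verdict flips to "status OK but GraphQL unresponsive", which may help correlate the hang with the hub logs.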
I cannot share the exact code that reproduces this behaviour, but I have prepared a demo solution that simulates the same issue on our infrastructure. I am attaching it so you can try running it in your controlled environment. Our current setup allows a maximum of 20 concurrent sessions on a VM with 16 CPU cores and 64 GB RAM. Attachment: SeleniumGridIssue_NikolayAvramov.zip
You can control the level of parallelism in the CheckoutPage10Tests.cs class. It is currently set to 10, but in our case I managed to reproduce the issue using a value closer to 20.
Command used to start Selenium Grid with Docker
Relevant log output
Operating System
Linux
Docker Selenium version (tag)
4.2.1-20220608