SeleniumHQ / selenium

A browser automation framework and ecosystem.
https://selenium.dev
Apache License 2.0

ERROR [HashedWheelTimer.reportTooManyInstances] - You are creating too many HashedWheelTimer instances. HashedWheelTimer is a shared resource that must be reused across the JVM,so that only a few instances are created. #9112

Closed mcopjan closed 3 years ago

mcopjan commented 3 years ago

🐛 Bug Report

Version: 4.0.0-beta-1-prerelease-20210114. I have a Grid 4 setup with 20 Chrome nodes (SE_NODE_MAX_CONCURRENT_SESSIONS=5). When running 50 (max) headless tests in parallel, I can see the hub dying with the error above. When I stopped the execution, the hub recovered after some time.

selenium_hub_logs.txt.zip

version: '3.7'

services:

  hub:
    image: selenium/hub:4.0.0-beta-1-prerelease-20210114
    ports:
      - "4442:4442"
      - "4443:4443"
      - "4444:4444"
    deploy:
      placement:
        constraints:
          - node.role == manager
      restart_policy:
        condition: on-failure

  chrome:
    image: selenium/node-chrome:4.0.0-beta-1-prerelease-20210114
    volumes:
      - /dev/shm:/dev/shm
    depends_on:
      - hub
    environment:
      - SE_EVENT_BUS_HOST=hub
      - SE_EVENT_BUS_PUBLISH_PORT=4442
      - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
      - LANG=en_GB.UTF-8
      - SE_NODE_MAX_CONCURRENT_SESSIONS=5
      - START_XVFB=false
      - SCREEN_WIDTH=1920
      - SCREEN_HEIGHT=1080
      - SCREEN_DEPTH=24
    entrypoint: bash -c 'SE_OPTS="--host $$HOSTNAME" /opt/bin/entry_point.sh'
    deploy:
      replicas: 20
      placement:
        constraints:
          - node.role == worker
      restart_policy:
        condition: on-failure
mcopjan commented 3 years ago

I would like to point out that when we use Selenium 3 containers to run 80-100 tests in parallel on 20 Chrome nodes with maxSessions=5 (hence 100 browser instances), there are no problems.

mcopjan commented 3 years ago

One more thing I noticed: with the compose file above I would expect 100 slots to be registered with the hub (20 Chrome replicas, 5 Chrome instances each), but GraphQL shows totalSlots: 80, even though all Chrome containers seem to be healthy.
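
The arithmetic behind that expectation, and the comparison against what the hub reports, can be sketched in a few lines of shell. This is only an illustration: the sample response body is an assumed shape for a GraphQL reply (only the `totalSlots` field name comes from this thread), and the helper names are made up for the example.

```shell
#!/bin/bash
# Sketch: compare the slot count implied by the compose file with the value
# the hub reports. Helper names and the sample response are illustrative.

# replicas * sessions per node, e.g. 20 * 5
expected_slots() {
  echo $(( $1 * $2 ))
}

# Read a GraphQL JSON response on stdin and print the number after "totalSlots".
extract_total_slots() {
  sed -n 's/.*"totalSlots"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Print expected, reported and missing slot counts on one line.
# Usage: report_slot_gap EXPECTED REPORTED
report_slot_gap() {
  echo "expected=$1 reported=$2 missing=$(( $1 - $2 ))"
}

# Assumed sample reply; a real one would come from the hub's GraphQL endpoint.
sample_response='{"data":{"grid":{"totalSlots":80}}}'

report_slot_gap "$(expected_slots 20 5)" \
  "$(printf '%s\n' "$sample_response" | extract_total_slots)"
```

With the figures from this report (20 replicas, 5 sessions each, 80 reported), the gap is 20 slots, which would correspond to four nodes' worth of slots not being counted.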

mcopjan commented 3 years ago

https://github.com/SeleniumHQ/docker-selenium/issues/1183#issuecomment-769925966

dylanlive commented 3 years ago

I can replicate this with selenium-server-4.0.0-prerelease-beta-1-02d5e641d5.jar running as a hub. I didn't even run any sessions; only 1 node was registered.

00:58:19.779 INFO [LogManager$RootLogger.log] - Using the system default encoding
00:58:19.786 INFO [LoggingOptions.createTracer] - Using OpenTelemetry for tracing
00:58:20.030 INFO [BoundZmqEventBus.<init>] - XPUB binding to [binding to tcp://*:4442, advertising as tcp://172.17.0.3:4442], XSUB binding to [binding to tcp://*:4443, advertising as tcp://172.17.0.3:4443]
00:58:20.212 INFO [UnboundZmqEventBus.<init>] - Connecting to tcp://172.17.0.3:4442 and tcp://172.17.0.3:4443
00:58:20.284 INFO [UnboundZmqEventBus.<init>] - Sockets created
00:58:20.290 INFO [UnboundZmqEventBus.lambda$new$7] - Bus started
00:58:20.490 INFO [BoundZmqEventBus.<init>] - Event bus ready
00:58:22.430 INFO [Hub.execute] - Started Selenium hub 4.0.0-beta-1 (revision 02d5e641d5): http://172.17.0.3:4444
01:00:03.699 INFO [Node.<init>] - Binding additional locator mechanisms: name, id
01:00:04.416 INFO [LocalDistributor.add] - Added node 9dbf56e3-2d42-4f41-9277-1653beac329e at http://1.2.3.4:5555.
01:00:04.735 INFO [GridModel.setAvailability] - Switching node 9dbf56e3-2d42-4f41-9277-1653beac329e (uri: http://1.2.3.4:5555) from DOWN to UP
01:02:41.748 ERROR [HashedWheelTimer.reportTooManyInstances] - You are creating too many HashedWheelTimer instances. HashedWheelTimer is a shared resource that must be reused across the JVM,so that only a few instances are created.
01:05:14.752 INFO [GridModel.setAvailability] - Switching node 9dbf56e3-2d42-4f41-9277-1653beac329e (uri: http://1.2.3.4:5555) from UP to DOWN
Exception in thread "AsyncHttpClient-58-1" java.lang.OutOfMemoryError: Java heap space
Exception in thread "AsyncHttpClient-592-1" java.lang.OutOfMemoryError: Java heap space
03:35:57.062 WARN [HashedWheelTimer$HashedWheelTimeout.expire] - An exception was thrown by TimerTask.
java.lang.OutOfMemoryError: Java heap space
mcopjan commented 3 years ago

Also reported here https://github.com/SeleniumHQ/docker-selenium/issues/1201

diemol commented 3 years ago

@dylanlive what hardware/OS combination were you using to get that?

dylanlive commented 3 years ago

@diemol Within Docker (not docker-selenium), running Ubuntu Bionic and openjdk-8. I didn't think it depended on Docker, but surprisingly I couldn't reproduce this with just the jar running on Amazon Linux 2 and Corretto.

I'm going to experiment more with this in the coming 1-2 weeks, starting with pulling the latest pre-release, and will share updates. If this looks dependent on Docker, perhaps we should move to SeleniumHQ/docker-selenium#1201.

dylanlive commented 3 years ago

Happy to share I've been able to come up with a solid replication on pre-release aec4c8c2c5, and this is replicable outside of Docker.

The issue is occurring for me when:

  1. I have at least 1 node connected to the hub
  2. I make frequent HTTP calls to /status

The reason it took me awhile to figure this out is because the instance had an ELB attached to it, which was calling /status as its health check. @mcopjan I'm curious if you also have health checks occurring on /status

To replicate:

  1. Run Hub: java -jar selenium-server-4.0.0-prerelease-beta-1-aec4c8c2c5.jar hub --log-level info
  2. Connect a node to the hub
  3. Run this bash script (replacing the IP with your hub's IP) to make repeated HTTP calls

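#!/bin/bash
# Call the hub's /status endpoint once per second, 200 times.

for i in {1..200}
do
   curl "REPLACE_IP:4444/status"   # replace REPLACE_IP with your hub's IP
   date +"%T"                      # print the time of each request
   sleep 1
done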
mcopjan commented 3 years ago

Hi @dylanlive, thanks for your update. In my case I don't have any health check in place (at least none that I am aware of), but I am doing a few non-standard things. I have created a Prometheus Docker exporter container for Selenium 4 which hits the grid's GraphQL endpoint every 5 seconds to scrape metrics such as the number of sessions and node slots. So, very similar to your situation, I am periodically querying the GraphQL endpoint, and that, together with running tests, may be enough to bring the grid down with the above error.
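
For anyone who wants to imitate that kind of periodic scrape, a rough sketch follows. It is not the exporter's actual code: the `/graphql` path, the `HUB_URL` default, the chosen fields and all helper names are assumptions made for illustration.

```shell
#!/bin/bash
# Sketch of a periodic GraphQL scrape; not the actual exporter code.
# HUB_URL, the /graphql path and all helper names are assumptions.
HUB_URL="${HUB_URL:-http://localhost:4444}"

# Wrap a GraphQL query string in a JSON request body,
# escaping backslashes and double quotes.
graphql_body() {
  gb_escaped=$(printf '%s\n' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"query":"%s"}' "$gb_escaped"
}

# Send one query to the hub and print the raw JSON reply (needs a running hub).
scrape_once() {
  curl -s -X POST -H 'Content-Type: application/json' \
    --data "$(graphql_body "$1")" "${HUB_URL}/graphql"
}

# Run a command a fixed number of times, sleeping between runs.
# Usage: poll COUNT INTERVAL_SECONDS COMMAND [ARGS...]
poll() {
  poll_count=$1
  poll_interval=$2
  shift 2
  poll_i=0
  while [ "$poll_i" -lt "$poll_count" ]; do
    "$@"
    poll_i=$(( poll_i + 1 ))
    if [ "$poll_i" -lt "$poll_count" ]; then
      sleep "$poll_interval"
    fi
  done
}

# Show the body that would be sent; no network access is needed for this line.
graphql_body '{ grid { totalSlots, sessionCount } }'
echo

# Against a live hub, a 5-second scrape for one minute could look like:
#   poll 12 5 scrape_once '{ grid { totalSlots, sessionCount } }'
```

Keeping the command to run as an argument of `poll` means the loop can be exercised with a harmless stand-in such as `echo` before it is pointed at a real hub.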

diemol commented 3 years ago

Thank you, @dylanlive and @mcopjan for the details. This commit fixed the HashedWheelTimer.reportTooManyInstances error, but we are still investigating the CPU and memory usage, which seems linked to the combination of session queue, GraphQL and/or /status endpoint.

I am closing other related issues and we will keep this open to follow further developments here.

mcopjan commented 3 years ago

An out-of-memory exception occurred with the latest beta prerelease, selenium/hub:4.0.0-beta-1-prerelease-20210207. It may be related to this issue.

Execution 'bcd02c54-ab4b-4773-8fb4-05c18f0b201e' threw exception when executing : query : '{ grid {totalSlots, maxSession, sessionCount, sessionQueueSize} }'. variables '{}' java.util.concurrent.CompletionException: java.lang.OutOfMemoryError: Java heap space

OOM.zip

dylanlive commented 3 years ago

Thanks! I'm running the pre-release ec807f83a7 with 1 node connected and just the health checker pinging /status.

The HashedWheelTimer issue looks fixed, but there appears to be another memory leak. It is replicable with my same steps above.

Correction: It takes a lot more requests to replicate, as seen in the timestamps below. Take my script above and change the loop to a higher number.

Update: This doesn't appear to be as easily replicable. A new attempt today isn't replicating.

19:53:32.463 INFO [LogManager$RootLogger.log] - Using the system default encoding
19:53:32.469 INFO [LoggingOptions.createTracer] - Using OpenTelemetry for tracing
19:53:32.611 INFO [BoundZmqEventBus.<init>] - XPUB binding to [binding to tcp://*:4442, advertising as tcp://172.17.0.2:4442], XSUB binding to [binding to tcp://*:4443, advertising as tcp://172.17.0.2:4443]
19:53:32.701 INFO [UnboundZmqEventBus.<init>] - Connecting to tcp://172.17.0.2:4442 and tcp://172.17.0.2:4443
19:53:32.747 INFO [UnboundZmqEventBus.<init>] - Sockets created
19:53:32.762 INFO [UnboundZmqEventBus.lambda$new$7] - Bus started
19:53:32.962 INFO [BoundZmqEventBus.<init>] - Event bus ready
19:53:33.915 INFO [Hub.execute] - Started Selenium Hub 4.0.0-beta-1 (revision ec807f83a7): http://172.17.0.2:4444
19:54:27.896 INFO [Node.<init>] - Binding additional locator mechanisms: id, name
19:54:28.359 INFO [LocalDistributor.add] - Added node 128bc717-986a-4175-aba2-ac1429919f9b at http://1.2.3.4:5555.
19:54:28.453 INFO [GridModel.setAvailability] - Switching node 128bc717-986a-4175-aba2-ac1429919f9b (uri: http://1.2.3.4:5555) from DOWN to UP
20:26:51.327 WARN [DefaultChannelPipeline.onUnhandledInboundException] - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
java.lang.OutOfMemoryError: Java heap space
Exception in thread "iothread-2" java.lang.OutOfMemoryError: Java heap space
20:27:41.151 WARN [NioServerSocketChannel.doReadMessages] - Failed to create a new channel from an accepted socket.
java.lang.OutOfMemoryError: Java heap space
20:27:52.692 WARN [DefaultPromise.notifyListener0] - An exception was thrown by io.netty.bootstrap.ServerBootstrap$ServerBootstrapAcceptor$2.operationComplete()
java.lang.OutOfMemoryError: Java heap space
20:28:11.205 WARN [JdkLogger.log] - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
java.lang.OutOfMemoryError: Java heap space
20:28:16.322 WARN [JdkLogger.log] - Failed to create a new channel from an accepted socket.
java.lang.OutOfMemoryError: Java heap space
20:28:30.279 WARN [DefaultChannelPipeline.onUnhandledInboundException] - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
java.lang.OutOfMemoryError: Java heap space
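
Raising the request count in the earlier /status loop is easier if the count is a parameter rather than a literal in a brace range, since bash does not expand `{1..$N}` when the bound is a variable. A minimal sketch under that assumption follows; `HUB`, `REQUESTS` and `DELAY` are made-up variable names (not Selenium settings), and the default of 2000 requests is arbitrary.

```shell
#!/bin/bash
# Sketch: the /status loop from earlier in the thread with a configurable count.
# HUB, REQUESTS and DELAY are illustrative names, not Selenium options.
HUB="${HUB:-REPLACE_IP:4444}"
REQUESTS="${REQUESTS:-2000}"
DELAY="${DELAY:-1}"

# Build the status URL for a host:port pair.
status_url() {
  echo "http://$1/status"
}

# Run COMMAND with the URL appended, COUNT times, and print how many runs failed.
# Usage: hammer URL COUNT DELAY COMMAND [ARGS...]
hammer() {
  h_url=$1
  h_count=$2
  h_delay=$3
  shift 3
  h_i=0
  h_failed=0
  while [ "$h_i" -lt "$h_count" ]; do
    "$@" "$h_url" >/dev/null 2>&1 || h_failed=$(( h_failed + 1 ))
    h_i=$(( h_i + 1 ))
    sleep "$h_delay"
  done
  echo "requests=${h_count} failed=${h_failed}"
}

# Dry run with `true` standing in for curl, so nothing is sent anywhere.
hammer "$(status_url "$HUB")" 3 0 true

# Against a real hub:
#   hammer "$(status_url "$HUB")" "$REQUESTS" "$DELAY" curl -sf
```

Counting failed calls gives a rough signal of when the hub stops answering, which can then be lined up with the timestamps in the hub log.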
barancev commented 3 years ago

The last comment seems to be unrelated to this issue. @dylanlive, can you please raise a new one to make sure it's not lost here in the comments?

mcopjan commented 3 years ago

Update on testing the latest 4.0.0-beta-1-prerelease-20210207: I no longer see the "HashedWheelTimer instances" message, but the grid still dies with an OOM exception. The scenario is the following:

[screenshot not preserved]

 12:26:48.381 WARN [GridModel.setSession] - Grid model and reality have diverged. Slot is not reserved. SlotId{nodeId=3b8470f4-06a1-4dd3-bcba-b0a6aaa526dd, id=1451d424-5c0b-45d3-9fdc-72cda1382ae4}
 Exception in thread "AsyncHttpClient-1177-2" java.lang.OutOfMemoryError: Java heap space
 Exception in thread "AsyncHttpClient-1177-3" java.lang.OutOfMemoryError: Java heap space
 Exception in thread "iothread-2" 12:27:28.042 WARN [SpanWrappedHttpHandler.execute] - Unable to execute request: Java heap space
 java.lang.OutOfMemoryError: Java heap space
 Exception in thread "AsyncHttpClient-1177-4" java.lang.OutOfMemoryError: Java heap space
 java.lang.OutOfMemoryError: Java heap space
 12:28:29.055 WARN [SpanWrappedHttpHandler.execute] - Unable to execute request: Java heap space
 java.lang.OutOfMemoryError: Java heap space
 12:28:37.984 WARN [LoggingOptions$1.lambda$export$0] - {"traceId": "554f4cf6aa6472ebcad54bfa6a5b7ace","spanId": "71d11b36fc83fa2a","spanKind": "INTERNAL","eventTime": 1612873647901449761,"eventName": "exception","attributes": {"exception.message": "Unable to execute request: Java heap space","exception.stacktrace": "java.lang.OutOfMemoryError: Java heap space\n","exception.type": "java.lang.OutOfMemoryError","http.flavor": 1,"http.handler_class": "org.openqa.selenium.grid.sessionqueue.local.LocalNewSessionQueuer","http.host": "uiplspidoc002.ipswich.mgsops.net:4444","http.method": "POST","http.request_content_length": "253","http.scheme": "HTTP","http.target": "\u002fsession","http.user_agent": "selenium\u002f4.0.0 (.net linux)"}}

 12:30:11.850 WARN [DefaultChannelPipeline.onUnhandledInboundException] - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
 java.lang.OutOfMemoryError: Java heap space
 12:30:38.601 WARN [SpanWrappedHttpHandler.execute] - Unable to execute request: Self-suppression not permitted
 java.lang.IllegalArgumentException: Self-suppression not permitted
  at java.lang.Throwable.addSuppressed(Throwable.java:1072)
  at org.openqa.selenium.grid.sessionqueue.NewSessionQueuer.validateSessionRequest(NewSessionQueuer.java:90)
  at org.openqa.selenium.grid.sessionqueue.local.LocalNewSessionQueuer.addToQueue(LocalNewSessionQueuer.java:74)
  at org.openqa.selenium.remote.http.Route$TemplatizedRoute.handle(Route.java:192)
  at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
  at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
  at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
  at org.openqa.selenium.grid.sessionqueue.NewSessionQueuer.execute(NewSessionQueuer.java:138)
  at org.openqa.selenium.remote.tracing.SpanWrappedHttpHandler.execute(SpanWrappedHttpHandler.java:86)
  at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
  at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
  at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
  at org.openqa.selenium.grid.router.Router.execute(Router.java:90)
  at org.openqa.selenium.grid.web.EnsureSpecCompliantResponseHeaders.lambda$apply$0(EnsureSpecCompliantResponseHeaders.java:34)
  at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
  at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
  at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
  at org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)
  at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
  at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
  at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
  at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
  at org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
 Caused by: java.lang.OutOfMemoryError: Java heap space
 12:31:01.508 WARN [NioServerSocketChannel.doReadMessages] - Failed to create a new channel from an accepted socket.
 java.lang.OutOfMemoryError: Java heap space
 12:31:03.012 WARN [SpanWrappedHttpHandler.execute] - Unable to execute request: Java heap space
 java.lang.OutOfMemoryError: Java heap space
 12:31:22.054 WARN [SpanWrappedHttpHandler.execute] - Unable to execute request: Java heap space
 java.lang.OutOfMemoryError: Java heap space
 12:31:33.316 WARN [SpanWrappedHttpHandler.execute] - Unable to execute request: Java heap space
 java.lang.OutOfMemoryError: Java heap space
 12:31:52.205 WARN [AbstractChannelHandlerContext.invokeExceptionCaught] - An exception 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception:
 java.lang.OutOfMemoryError: Java heap space
 12:31:51.816 WARN [LoggingOptions$1.lambda$export$0] - {"traceId": "cb5dbec951490cd280cd2ed7e1ed737e","spanId": "390037ccaa7f9a2f","spanKind": "INTERNAL","eventTime": 1612873838601085445,"eventName": "exception","attributes": {"exception.message": "Unable to execute request: Self-suppression not permitted","exception.stacktrace": "java.lang.IllegalArgumentException: Self-suppression not permitted\n\tat java.lang.Throwable.addSuppressed(Throwable.java:1072)\n\tat org.openqa.selenium.grid.sessionqueue.NewSessionQueuer.validateSessionRequest(NewSessionQueuer.java:90)\n\tat org.openqa.selenium.grid.sessionqueue.local.LocalNewSessionQueuer.addToQueue(LocalNewSessionQueuer.java:74)\n\tat org.openqa.selenium.remote.http.Route$TemplatizedRoute.handle(Route.java:192)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.grid.sessionqueue.NewSessionQueuer.execute(NewSessionQueuer.java:138)\n\tat org.openqa.selenium.remote.tracing.SpanWrappedHttpHandler.execute(SpanWrappedHttpHandler.java:86)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.grid.router.Router.execute(Router.java:90)\n\tat org.openqa.selenium.grid.web.EnsureSpecCompliantResponseHeaders.lambda$apply$0(EnsureSpecCompliantResponseHeaders.java:34)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)\n\tat 
org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:748)\nCaused by: java.lang.OutOfMemoryError: Java heap space\n","exception.type": "java.lang.IllegalArgumentException","http.flavor": 1,"http.handler_class": "org.openqa.selenium.grid.sessionqueue.local.LocalNewSessionQueuer","http.host": "uiplspidoc002.ipswich.mgsops.net:4444","http.method": "POST","http.request_content_length": "253","http.scheme": "HTTP","http.target": "\u002fsession","http.user_agent": "selenium\u002f4.0.0 (.net linux)"}}

 12:31:33.063 WARN [SpanWrappedHttpHandler.execute] - Unable to execute request: Java heap space
 java.lang.OutOfMemoryError: Java heap space
 12:32:26.940 WARN [LoggingOptions$1.lambda$export$0] - {"traceId": "c3adf52df47f21c0cee0f2eabb920da3","spanId": "0ec70dc099d9c025","spanKind": "INTERNAL","eventTime": 1612873893315896671,"eventName": "exception","attributes": {"exception.message": "Unable to execute request: Java heap space","exception.stacktrace": "java.lang.OutOfMemoryError: Java heap space\n","exception.type": "java.lang.OutOfMemoryError","http.flavor": 1,"http.handler_class": "org.openqa.selenium.grid.sessionqueue.local.LocalNewSessionQueuer","http.host": "uiplspidoc002.ipswich.mgsops.net:4444","http.method": "POST","http.request_content_length": "253","http.scheme": "HTTP","http.target": "\u002fsession","http.user_agent": "selenium\u002f4.0.0 (.net linux)"}}

 12:32:26.940 INFO [Distributor.newSession] - Session created by the distributor. Id: 27dc69095658cb9a780a3b59a1cc6b3c, Caps: Capabilities {acceptInsecureCerts: false, browserName: chrome, browserVersion: 88.0.4324.150, chrome: {chromedriverVersion: 88.0.4324.96 (68dba2d8a0b14..., userDataDir: /tmp/.com.google.Chrome.bmTWPv}, goog:chromeOptions: {debuggerAddress: localhost:40463}, networkConnectionEnabled: false, pageLoadStrategy: normal, platformName: linux, proxy: {}, se:options: {cdp: http://0165cdf3d57b:5555/se...}, setWindowRect: true, strictFileInteractability: false, timeouts: {implicit: 0, pageLoad: 300000, script: 30000}, unhandledPromptBehavior: dismiss and notify, webauthn:extension:largeBlob: true, webauthn:virtualAuthenticators: true}
 12:32:32.415 WARN [SpanWrappedHttpHandler.execute] - Unable to execute request: Java heap space
 java.lang.OutOfMemoryError: Java heap space
 12:32:29.387 WARN [LoggingOptions$1.lambda$export$0] - {"traceId": "9788be495a22202ad8ee54d5780f564f","spanId": "ff8456c3cff1bf8d","spanKind": "INTERNAL","eventTime": 1612873863011657485,"eventName": "exception","attributes": {"exception.message": "Unable to execute request: Java heap space","exception.stacktrace": "java.lang.OutOfMemoryError: Java heap space\n","exception.type": "java.lang.OutOfMemoryError","http.flavor": 1,"http.handler_class": "org.openqa.selenium.grid.sessionqueue.local.LocalNewSessionQueuer","http.host": "uiplspidoc002.ipswich.mgsops.net:4444","http.method": "POST","http.request_content_length": "253","http.scheme": "HTTP","http.target": "\u002fsession","http.user_agent": "selenium\u002f4.0.0 (.net linux)"}}

 12:32:28.824 WARN [LoggingOptions$1.lambda$export$0] - {"traceId": "96bcedba20507da9bb3528bbdde6ab2c","spanId": "6e3727fc86ec1c95","spanKind": "INTERNAL","eventTime": 1612873882053255385,"eventName": "exception","attributes": {"exception.message": "Unable to execute request: Java heap space","exception.stacktrace": "java.lang.OutOfMemoryError: Java heap space\n","exception.type": "java.lang.OutOfMemoryError","http.flavor": 1,"http.handler_class": "org.openqa.selenium.grid.sessionqueue.local.LocalNewSessionQueuer","http.host": "uiplspidoc002.ipswich.mgsops.net:4444","http.method": "POST","http.request_content_length": "253","http.scheme": "HTTP","http.target": "\u002fsession","http.user_agent": "selenium\u002f4.0.0 (.net linux)"}}

 12:32:27.103 WARN [LoggingOptions$1.lambda$export$0] - {"traceId": "a292dc5a18343e57d6b4eb3005182492","spanId": "4bd304e39f096792","spanKind": "INTERNAL","eventTime": 1612873891244016397,"eventName": "exception","attributes": {"exception.message": "Unable to execute request: Java heap space","exception.stacktrace": "java.lang.OutOfMemoryError: Java heap space\n","exception.type": "java.lang.OutOfMemoryError","http.flavor": 1,"http.handler_class": "org.openqa.selenium.grid.sessionqueue.local.LocalNewSessionQueuer","http.host": "uiplspidoc002.ipswich.mgsops.net:4444","http.method": "POST","http.request_content_length": "253","http.scheme": "HTTP","http.target": "\u002fsession","http.user_agent": "selenium\u002f4.0.0 (.net linux)"}}

 12:33:29.244 WARN [LoggingOptions$1.lambda$export$0] - {"traceId": "fba4893e2eed283ae88a5b5a4edbd20a","spanId": "27258678f462e52b","spanKind": "INTERNAL","eventTime": 1612873952235174602,"eventName": "exception","attributes": {"exception.message": "Unable to execute request: Java heap space","exception.stacktrace": "java.lang.OutOfMemoryError: Java heap space\n","exception.type": "java.lang.OutOfMemoryError","http.flavor": 1,"http.handler_class": "org.openqa.selenium.grid.sessionqueue.local.LocalNewSessionQueuer","http.host": "uiplspidoc002.ipswich.mgsops.net:4444","http.method": "POST","http.request_content_length": "253","http.scheme": "HTTP","http.target": "\u002fsession","http.user_agent": "selenium\u002f4.0.0 (.net linux)"}}

 12:33:02.035 WARN [SpanWrappedHttpHandler.execute] - Unable to execute request: Java heap space
 java.lang.OutOfMemoryError: Java heap space
 12:32:47.266 WARN [SpanWrappedHttpHandler.execute] - Unable to execute request: Java heap space
 java.lang.OutOfMemoryError: Java heap space
 12:32:45.216 WARN [SpanWrappedHttpHandler.execute] - Unable to execute request: Java heap space
 java.lang.OutOfMemoryError: Java heap space
 12:33:39.942 INFO [GridModel.setAvailability] - Switching node 2209fad1-e432-4e25-be15-9ebf959e15f5 (uri: http://8942b96d14dd:5555) from UP to DOWN
dylanlive commented 3 years ago

> The last comment seems to be not related to this issue, @dylanlive can you please raise a new one to make sure it's not lost here in comments.

I can open a new one @barancev, though @diemol had said:

> but we are still investigating the CPU and memory usage, which seems linked to the combination of session queue, GraphQL and/or /status endpoint.
>
> I am closing other related issues and we will keep this open to follow further developments here.

@diemol would you also suggest I raise a new, more generic issue for Memory Leaks and the java.lang.OutOfMemoryError: Java heap space errors? Or would you like to keep tracking it here?

dylanlive commented 3 years ago

No longer replicating with light usage. I haven't looked at memory consumption yet, but I'm able to keep the Grid online for at least a day. Thanks for the fixes! Looking forward to continued optimizations.
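
One low-effort way to look at memory consumption over time is to sample the resident memory of the hub process. The sketch below is an assumption-laden illustration, not something used in this thread: it reports process RSS rather than Java heap usage, it relies on `/proc` (falling back to `ps`), and the function names are invented.

```shell
#!/bin/bash
# Sketch: sample the resident memory of a process (e.g. the hub JVM) over time.
# This measures process RSS, not the Java heap itself; names are illustrative.

# Print the resident set size of PID in kilobytes.
rss_kb() {
  if [ -r "/proc/$1/status" ]; then
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
  else
    ps -o rss= -p "$1" | tr -d ' '
  fi
}

# Print COUNT timestamped samples for PID, INTERVAL seconds apart.
# Usage: sample_rss PID COUNT INTERVAL_SECONDS
sample_rss() {
  s_pid=$1
  s_count=$2
  s_interval=$3
  s_i=0
  while [ "$s_i" -lt "$s_count" ]; do
    echo "$(date +%T) rss_kb=$(rss_kb "$s_pid")"
    s_i=$(( s_i + 1 ))
    sleep "$s_interval"
  done
}

# Demo on the current shell; for the hub, pass the PID of the java process.
sample_rss "$$" 2 0
```

A steadily growing value while only /status or GraphQL requests are being made would be a hint worth attaching to a memory report.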

diemol commented 3 years ago

Since the original problem from this issue is not happening anymore, I will close this for now. For the memory issue, we have #9152. @dylanlive, if you are able to reproduce the memory error, please share how you did it on the other issue.