SeleniumHQ / docker-selenium

Provides a simple way to run Selenium Grid with Chrome, Firefox, and Edge using Docker, making it easier to perform browser automation
http://www.selenium.dev/docker-selenium/

Potential Memory Leak from the Selenium Router! #1201

Closed: dev-wei closed this 3 years ago

dev-wei commented 3 years ago

🐛 Bug Report

An OOM error consistently occurs in the router pod when a series of workloads is kicked off on Grid v4. The same workload volume was handled well on Grid v3.

Even after all active sessions are terminated, memory usage on the router pod stays high.


To Reproduce

  1. Spawn 3 Chrome nodes, each with a maximum of about 5 sessions.
  2. Keep consuming all available session slots until the router stops responding.
  3. Call the /status endpoint, which returns:

    {
      "value": {
        "error": "unknown error",
        "message": "unable to create new native thread",
        "stacktrace": "java.lang.OutOfMemoryError: unable to create new native thread\n\tat java.lang.Thread.start0(Native Method)\n\tat java.lang.Thread.start(Thread.java:717)\n\tat java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)\n\tat java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378)\n\tat java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)\n\tat org.openqa.selenium.grid.router.GridStatusHandler.lambda$execute$4(GridStatusHandler.java:161)\n\tat java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)\n\tat java.util.HashMap$KeySpliterator.forEachRemaining(HashMap.java:1556)\n\tat java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)\n\tat java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)\n\tat java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)\n\tat java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)\n\tat java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)\n\tat org.openqa.selenium.grid.router.GridStatusHandler.execute(GridStatusHandler.java:189)\n\tat org.openqa.selenium.remote.http.Route$TemplatizedRoute.handle(Route.java:183)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:67)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:327)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:67)\n\tat org.openqa.selenium.grid.router.Router.execute(Router.java:90)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:327)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:67)\n\tat org.openqa.selenium.grid.web.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:29)\n\tat org.openqa.selenium.grid.web.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.grid.web.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:748)\n"
      }
    }

    From the console:

    09:16:27.611 ERROR [HashedWheelTimer.reportTooManyInstances] - You are creating too many HashedWheelTimer instances. HashedWheelTimer is a shared resource that must be reused across the JVM,so that only a few instances are created.
    java.lang.OutOfMemoryError: Java heap space

    you might create ActorSystem instances instead of reusing a single one.
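The stack trace shows the router's /status handler (`GridStatusHandler`) submitting one task per node to an executor, and the JVM failing inside `Thread.start` because it cannot create another native thread; the console warning likewise complains about a shared resource being instantiated over and over instead of reused. The sketch below is a hypothetical illustration of the general remedy those messages point to, not Selenium's actual code: per-node checks fan out on one shared, bounded pool that is created once, so repeated /status calls cannot grow the thread count. `NodeStatusFanOut`, `checkNode`, and the pool size of 4 are assumptions made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch (not Selenium code): fan out per-node status checks
// on ONE shared, bounded pool instead of creating threads per request.
public class NodeStatusFanOut {
    // Assumed bound; the pool is created once and reused by every request.
    static final int POOL_SIZE = 4;
    private static final ExecutorService SHARED_POOL =
        Executors.newFixedThreadPool(POOL_SIZE, runnable -> {
            Thread t = new Thread(runnable);
            t.setDaemon(true); // daemon threads let the JVM exit when main returns
            return t;
        });

    // Names of the threads that actually ran a check, to show the bound holds.
    static final Set<String> THREADS_USED = ConcurrentHashMap.newKeySet();

    // Hypothetical stand-in for an HTTP call to one node's status endpoint.
    static String checkNode(String nodeUri) {
        THREADS_USED.add(Thread.currentThread().getName());
        return nodeUri + ": ok";
    }

    // Handles one /status request: one task per node, then wait for all results.
    static List<String> statusOf(List<String> nodeUris)
            throws InterruptedException, ExecutionException {
        List<Future<String>> futures = new ArrayList<>();
        for (String uri : nodeUris) {
            futures.add(SHARED_POOL.submit(() -> checkNode(uri)));
        }
        List<String> results = new ArrayList<>();
        for (Future<String> future : futures) {
            results.add(future.get());
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        List<String> nodes =
            List.of("http://node-1:5555", "http://node-2:5555", "http://node-3:5555");
        // Simulate many /status calls; none of them creates a new pool or thread.
        for (int i = 0; i < 1000; i++) {
            statusOf(nodes);
        }
        System.out.println("distinct threads used: " + THREADS_USED.size());
    }
}
```

Because the pool is created once, the 1,000 simulated requests in `main` run on at most four threads. An executor created per request and never shut down, or an unbounded pool under sustained concurrent load, keeps adding threads instead, which is one way a JVM ends up reporting `unable to create new native thread`.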

Expected behavior

The Grid should be able to handle 15 concurrent sessions (3 nodes × 5 sessions each) at any time.
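To check this expectation, a small client-side probe can open that many sessions at once and then query the router's /status endpoint. This is a hypothetical sketch, not the reporter's script: it assumes the router listens on `http://localhost:4444`, uses the JDK's `java.net.http.HttpClient` with a W3C WebDriver New Session payload, and `GridSaturationProbe`, `totalSlots`, and `hasError` are names invented for the example. The error check is a simple pattern match on the `"error"` key seen in the /status response in the report, not a full JSON parse.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.regex.Pattern;

// Hypothetical reproduction probe (not from the original report): open as many
// sessions as the Grid should hold concurrently, then query the router's /status.
public class GridSaturationProbe {
    // Assumptions: router address and node layout from the reproduction steps.
    static final String GRID_URL = "http://localhost:4444";
    static final int NODES = 3;
    static final int MAX_SESSIONS_PER_NODE = 5;

    // W3C WebDriver "New Session" request body asking for Chrome.
    static final String NEW_SESSION_BODY =
        "{\"capabilities\":{\"alwaysMatch\":{\"browserName\":\"chrome\"}}}";

    // Crude check: W3C error responses carry an "error" key inside "value".
    private static final Pattern ERROR_KEY = Pattern.compile("\"error\"\\s*:");

    static int totalSlots() {
        return NODES * MAX_SESSIONS_PER_NODE;
    }

    static boolean hasError(String responseBody) {
        return ERROR_KEY.matcher(responseBody).find();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest newSession = HttpRequest.newBuilder(URI.create(GRID_URL + "/session"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(NEW_SESSION_BODY))
            .build();

        // Fire all session requests at once. The sessions are deliberately left
        // open to hold the slots; delete them afterwards via DELETE /session/{id}.
        List<CompletableFuture<HttpResponse<String>>> pending = new ArrayList<>();
        for (int i = 0; i < totalSlots(); i++) {
            pending.add(client.sendAsync(newSession, HttpResponse.BodyHandlers.ofString()));
        }
        for (int i = 0; i < pending.size(); i++) {
            System.out.println("session " + i + ": HTTP " + pending.get(i).join().statusCode());
        }

        HttpRequest status = HttpRequest.newBuilder(URI.create(GRID_URL + "/status")).GET().build();
        String body = client.send(status, HttpResponse.BodyHandlers.ofString()).body();
        System.out.println(hasError(body) ? "router returned an error: " + body : "router /status OK");
    }
}
```

Running it repeatedly, as in step 2 of the reproduction, is one way to see whether the router keeps answering /status or starts returning an error body like the one in the report.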


Environment

  OS: Linux
  Docker-Selenium image version: 4.0.0-beta-1-prerelease-20210114
  Docker version: 4.0.0-beta-1-prerelease-20210114
  Docker-Compose version (if applicable): Kubernetes

dev-wei commented 3 years ago

Can someone look at this? @diemol

dev-wei commented 3 years ago

@diemol

Thanks for posting!

I hope this issue can be addressed soon. At this point, the Grid stops responding after 10-20 session requests within 15 minutes, which makes it pretty much unusable, to be honest. We have had to switch back to v3 to meet our production needs.

By the way, this can be reproduced in all three deployment modes (standalone, hub and node, and distributed).

Let me know if you need anything else from me.

dylanlive commented 3 years ago

I've been able to replicate this outside of Docker. Posted my observations here: https://github.com/SeleniumHQ/selenium/issues/9112#issuecomment-774291278

dylanlive commented 3 years ago

Oh, apologies, this could be a separate issue. I'll leave my comment there, though, in case it's related.

diemol commented 3 years ago

Closing this one in favour of https://github.com/SeleniumHQ/selenium/issues/9112