Closed Earlopain closed 11 months ago
@Earlopain, thank you for creating this issue. We will troubleshoot it as soon as we can.
Triage this issue by using labels.
If information is missing, add a helpful comment and then the I-issue-template label.
If the issue is a question, add the I-question label.
If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.
If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable G-* label, and it will provide the correct link and auto-close the issue.
After troubleshooting the issue, please add the R-awaiting answer label.
Thank you!
Hi @Earlopain, I used the same docker compose file that you shared but could not reproduce the issue. The image tag used is 4.15.0-20231129.
Can you check in DevTools whether there are any request errors?
Hi there,
no errors in the console. The websocket is being opened but is not receiving any data. The first two packets seem to be some kind of ping/pong exchange, but again, that's just not happening.
I just tested on another machine, Windows this time, and I have no trouble getting it to work there. I tested with Firefox running on the host, and with Firefox running through WSL as well, because why not. Both worked without problems.
I'm going to set up a fresh linux vm and check how it behaves there.
I have encountered the same problem as you, but on K8s. My VNC interface is blank, but my requests run normally. The behavior is the same from the previous version 20231110 through this version 20231129.
I don't understand why the container shows the VNC port as 7900, but the service opens port 6900:5900. Could you please explain it? @vietnd96
Hi @vietnd96,
I installed Docker in a fresh Linux VM running EndeavourOS (https://endeavouros.com/). After setting up Docker with the following commands and starting the Selenium image, I observe the same symptoms as in my initial report:
yay -S docker
yay -S docker-compose
sudo systemctl start docker
It may have something to do with Arch/EndeavourOS being a rolling release and therefore always having the latest versions, or it may be Linux-specific. I'm not sure which host OS you were testing with.
For the record, here are the docker/compose versions in use:
$ docker compose version
Docker Compose version 2.23.3
$ docker -v
Docker version 24.0.7, build afdd53b4e3
Hi @zhaoyaohui0, as I understand it, 5900 is the container port for VNC (you can connect with any tool that supports the VNC protocol, e.g. VNC Viewer, Remmina, TigerVNC Viewer, etc.), and 7900 is the container port for NoVNC (which streams via websocket to the live preview on the Grid UI).
-p 6900:5900 means mapping port 5900 from the container to port 6900 on the host.
Can we skip this mapping? Of course, YES, it does not affect how the Grid works.
When is -p 6900:5900 useful? When you want to debug something or watch how the test executes via one of the VNC tools mentioned above. But not everyone has such a tool ready to use; that is the reason for NoVNC: you can quickly watch a live preview of each session on the Grid UI.
Instead of -p 6900:5900, can we map -p 5900:5900 or any other host port? Of course, YES. I guess 6900 is used in the documentation to avoid port clashes. By default, VNC uses TCP port 5900+N, where N is the display number (usually :0 for a physical display). If the host itself runs a VNC server (e.g. vncserver for remote access) and you then also map 590x host ports to the Selenium VNC container port, a port clash is certain.
@Earlopain @zhaoyaohui0 where is this failing? Which environments? The report is very ambiguous.
@diemol I have provided additional information in my follow-up comment; is that not enough? Unfortunately, I don't have more than "install this OS, set up Docker there and try again". How would I go about gathering more useful information for you, or what are you looking for?
You also mention Kubernetes at the beginning of the issue. Hence my question.
Also, how popular is that OS? I mean, we try to provide something that works in most OS, but if it fails in a few and the user base is small, we won't troubleshoot that because we are a small team, and we try to focus on the common use cases.
Having said that, do you see the same with Ubuntu? macOS? Windows?
Kubernetes was the other person; I'm just using it through Docker. EndeavourOS is Arch with a GUI installer: it ships exactly the same software plus some small GUI applications on top. I used it because it is convenient and easy to set up, unlike setting up Arch on your own.
I did test on Windows and had no trouble there. I don't own an Apple device so nothing for me to do there.
I can try out Ubuntu in a bit when I'm at my home PC. I will install latest docker versions, see how that turns out and let you know then.
I gave it a try with Ubuntu 23.10 and it just worked as well.
Ended up installing plain Arch instead of EndeavourOS just to make sure and it doesn't work with that.
Here are some other findings: I enabled stdout logging for the other services and, as expected, NoVNC is trying to establish a connection. I accidentally left it open while testing, and after a whopping 2.5 minutes it actually managed to connect.
selenium-1 | 172.23.0.1 - - [05/Dec/2023 16:32:44] 172.23.0.1: Plain non-SSL (ws://) WebSocket connection
selenium-1 | 172.23.0.1 - - [05/Dec/2023 16:32:44] 172.23.0.1: Path: '/websockify'
selenium-1 | 172.23.0.1 - - [05/Dec/2023 16:32:44] connecting to: localhost:5900
selenium-1 | 05/12/2023 16:35:18 Got connection from client 127.0.0.1
After establishing a connection once, future connections still take the 2.5 minutes to establish.
It doesn't seem to have anything to do with NoVNC. I exposed port 5900, wanting to connect with a local client, and that takes this long as well. I did a few runs, and the duration seems consistent. For 5 runs, it always took 154 seconds.
I don't know what one would do with this information, though. This all seems very nonsensical to me, especially considering it works with other OSes and it's just Docker in the end.
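The 154-second delay can be measured without a VNC client. Below is a minimal bash sketch, assuming (as the log timestamps suggest) that the TCP connection itself is accepted quickly and what is delayed is the server's RFB greeting. The function name rfb_greeting_delay and its 300-second default timeout are hypothetical, and the /dev/tcp pseudo-device it relies on is a bash feature.

```shell
#!/usr/bin/env bash
# rfb_greeting_delay HOST PORT [TIMEOUT]: hypothetical helper, bash only (/dev/tcp).
# Opens a single TCP connection to a VNC server and prints how many seconds pass
# until the server sends its first line, the RFB ProtocolVersion greeting.
# Exits non-zero if the connection fails or nothing arrives within TIMEOUT seconds.
rfb_greeting_delay() (
  host=$1 port=$2 timeout=${3:-300}
  start=$(date +%s)
  # A refused connection or an unknown host makes this redirection fail.
  exec 3<>"/dev/tcp/$host/$port" || exit 1
  # Block until the server writes its greeting line (e.g. "RFB 003.008").
  IFS= read -r -t "$timeout" greeting <&3 || exit 1
  end=$(date +%s)
  echo "$((end - start))s ${greeting}"
)
```

With port 5900 exposed, running rfb_greeting_delay localhost 5900 prints the elapsed seconds followed by the greeting, which makes it easy to repeat the measurement several times.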
For K8s, are you accessing the Grid UI with an http:// URL? If yes, can you try https:// instead (ignore the insecure warning, if any)? The live preview should then be accessible.
I've started reducing the docker image and with a majority of the selenium things removed I still run into this issue.
At this point I'm almost certain it has nothing to do with anything in this repo, so feel free to close this issue, from my side at least. I'll continue to investigate myself and report this in the proper place if I manage to find the cause.
Thank you for your troubleshooting. I will close this based on your comments but feel free to add your findings in additional comments.
I did some digging and have found the root cause. Inside the Docker container, ulimit -n is incredibly high for some reason:
ulimit -n
=> 1073741816
This code in libvncserver enumerates them all, taking up huge amounts of CPU time. I didn't notice CPU spinning beforehand. https://github.com/LibVNC/libvncserver/blob/784cccbb724517ee4e36d9938f93b9ee168a29e7/src/libvncserver/sockets.c#L508-L527
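A quick way to check whether a given container is affected is to compare its soft RLIMIT_NOFILE against a sane ceiling before the VNC server starts. Below is a minimal shell sketch. The helper name warn_if_nofile_high and its 65536 default (borrowed from the compose workaround in this comment) are assumptions, not what docker-selenium ships.

```shell
#!/usr/bin/env bash
# warn_if_nofile_high [THRESHOLD]: hypothetical helper. Prints a warning to stderr
# and returns 1 when the soft RLIMIT_NOFILE ("ulimit -n") exceeds THRESHOLD,
# e.g. the 1073741816 seen above; returns 0 otherwise.
warn_if_nofile_high() {
  local threshold=${1:-65536} soft
  soft=$(ulimit -Sn)
  if [ "$soft" = unlimited ] || [ "$soft" -gt "$threshold" ]; then
    echo "warning: ulimit -n is $soft (> $threshold); VNC connections may take minutes" >&2
    return 1
  fi
  return 0
}

# Example: run inside the container before starting the VNC server.
warn_if_nofile_high || echo "consider setting ulimits.nofile for this container" >&2
```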
The temporary solution is quite simple: set the ulimit for docker manually:
version: "3"
services:
  selenium:
    image: selenium/standalone-chrome:4.15.0-20231110
    environment:
      - SE_VNC_NO_PASSWORD=1
    shm_size: 2gb
    ports:
      - ${EXPOSED_VNC_PORT:-7900}:7900
    ulimits:
      nofile:
        soft: 65536
        hard: 65536
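For anyone starting the container with plain docker run instead of Compose, the same limit can be passed with --ulimit nofile=SOFT:HARD. The sketch below mirrors the compose file above. The run_selenium wrapper and the DOCKER override variable (handy for a dry run with DOCKER=echo) are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# run_selenium: hypothetical wrapper mirroring the compose file above.
# Set DOCKER=echo to print the command instead of running it.
run_selenium() {
  "${DOCKER:-docker}" run --rm \
    -e SE_VNC_NO_PASSWORD=1 \
    --shm-size=2g \
    -p "${EXPOSED_VNC_PORT:-7900}:7900" \
    --ulimit nofile=65536:65536 \
    selenium/standalone-chrome:4.15.0-20231110
}
```

Running DOCKER=echo run_selenium shows the exact docker invocation, including the --ulimit flag, without starting anything.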
I don't know why these limits would differ from the host, documentation states they are inherited. My host value is just a measly 524288, but it is what it is.
As for why it worked with Focal but not with Jammy, perhaps this code path wasn't hit before. The limit is still high inside Docker either way, so what do I know.
Here's some prior art: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=920913
Here's a thread on the arch forum where I'm going to probably talk a bit more about this: https://bbs.archlinux.org/viewtopic.php?id=290863
Here's an issue on the docker engine repo which I think is most relevant: https://github.com/moby/moby/issues/44547
And a PR that supposedly fixes this but hasn't been part of a release yet: https://github.com/containerd/containerd/pull/8924
Here's an issue I made in libvncserver talking about the consequences of having an incredibly high RLIMIT_NOFILE: https://github.com/LibVNC/libvncserver/issues/600
Wow, great troubleshooting! Thanks for sharing.
I verified the fix above.
In my case, both VNC and noVNC took a very long time to connect, close to forever. Occasionally it reached the password prompt, but it still hung afterwards and timed out.
Can we add this to the troubleshooting section of the README?
I'm not so sure on the value of that. This only happens when distros use the prepackaged systemd unit files with very recent docker and systemd versions, which in reality not very many actually do.
Once upstream releases versions that contain a fix, this section would pretty much become obsolete. You seem to have found this through issues just fine; I think that is good enough.
I saw that a few Dockerfiles follow the practice of displaying a warning if ulimit -n is too high when running in Docker. I also tried adding one to notify the user: https://github.com/SeleniumHQ/docker-selenium/commit/acda753acb9745935531407628eee27a503d98b4 @Earlopain, do you think a workaround like the one below would work while we wait for upstream fixes?
[program:vnc]
priority=5
command=ulimit -n 65536 && /opt/bin/start-vnc.sh
The idea is there, yes. However if ulimit is already set to a lower value in the container then trying to set it to something higher will return a non-zero exit code, at least for an unprivileged user. That needs to be accounted for.
In addition, TIL that ulimit is a shell builtin, and supervisord seems to only start actual binaries (so I think && would not work either). It needs to be part of the start script.
After doing both of that, it works fine for me. Nice that a workaround is being considered here (:
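Putting both points together, a start script can lower the soft limit only when it is above the target, so an unprivileged user never attempts to raise it. Below is a minimal sketch, assuming it is placed at the top of a start script such as /opt/bin/start-vnc.sh before the VNC server is launched, and assuming the descriptor scan is bounded by the soft limit. cap_nofile is a hypothetical name, and this is not necessarily the change the project shipped.

```shell
#!/usr/bin/env bash
# cap_nofile LIMIT: hypothetical helper. Lowers the soft RLIMIT_NOFILE to LIMIT
# only when the current soft limit is higher (or unlimited). It never tries to
# raise the limit, which fails for an unprivileged user.
# ulimit is a shell builtin, so this must run inside the start script itself.
cap_nofile() {
  local target=$1 current
  current=$(ulimit -Sn)
  if [ "$current" = unlimited ] || [ "$current" -gt "$target" ]; then
    # -S changes only the soft limit; the hard limit stays as inherited.
    ulimit -Sn "$target" || echo "warning: could not lower ulimit -n to $target" >&2
  fi
}

cap_nofile 65536
# ...then launch the VNC server from this same shell so it inherits the limit.
```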
Setting ulimits.nofile (soft: 65536, hard: 65536) fixed my issue as well. Thank you.
selenium/standalone-chrome:118.0 worked, but selenium/standalone-chrome:119.0 and 120 needed this fix.
New releases will contain a workaround, a section in the readme for this shouldn't be needed anymore. See #2058
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
What happened?
When upgrading standalone-chrome from 4.15.0-20231108 to 4.15.0-20231110, the NoVNC web interface is no longer able to connect. The issue also exists on the latest version, 4.15.0-20231129; I just tested in which version it started. It's perpetually stuck on the "Connecting..." screen, and the websocket being opened is not receiving any data.
The only difference between these two versions is the upgrade from Focal to Jammy in PR #1923
Command used to start Selenium Grid with Docker (or Kubernetes)
Relevant log output
Operating System
Arch Linux
Docker Selenium version (tag or chart version)
4.15.0-20231110