LibVNC / libvncserver

LibVNCServer/LibVNCClient are cross-platform C libraries that allow you to easily implement VNC server or client functionality in your program.

Incredibly high RLIMIT_NOFILE results in minutes of initial connection delay #600

Open · Earlopain opened this issue 10 months ago

Earlopain commented 10 months ago

Describe the bug
On a system with an incredibly high RLIMIT_NOFILE, establishing a connection can take minutes.

To Reproduce

  1. Start x11vnc
  2. Increase RLIMIT_NOFILE with something like prlimit -n1073741816 -p $(pidof x11vnc)
  3. Connect to x11vnc with a vnc client
  4. Observe that the connection takes incredibly long to establish.

Expected Behavior
The connection speed should be independent of this limit, but the following code iterates over every possible fd instead: https://github.com/LibVNC/libvncserver/blob/784cccbb724517ee4e36d9938f93b9ee168a29e7/src/libvncserver/sockets.c#L508-L527
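For illustration, the pattern at issue looks roughly like this (a simplified sketch of the idea, not the library's exact code):

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/resource.h>

/* Count free fd slots by probing every possible descriptor number up to
   the soft limit. With RLIMIT_NOFILE around 2^30, this means on the
   order of a billion fcntl() calls. */
static long count_free_fds_slow(void)
{
    struct rlimit rlim;
    long free_slots = 0;

    if (getrlimit(RLIMIT_NOFILE, &rlim) < 0)
        return -1;

    for (rlim_t fd = 0; fd < rlim.rlim_cur; ++fd) {
        /* EBADF means this slot is currently unused. */
        if (fcntl((int)fd, F_GETFD) == -1 && errno == EBADF)
            ++free_slots;
    }
    return free_slots;
}
```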

Due to a current bug in the docker/containerd systemd integration, processes inside Docker containers inherit this incredibly high limit. This will eventually be fixed, but I still wanted to open an issue here, since the limit may also be set this high intentionally. Here's some prior art on the same issue in fakeroot: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=920913

bk138 commented 10 months ago

Thanks for reporting! Do you have a rough idea regarding a fix?

Earlopain commented 10 months ago

This really isn't my forte, so unfortunately no. The Debian issue I linked (and similar ones) didn't change anything itself, I believe, but relied on this value not being set so high that it causes issues. That is what will eventually happen with the Docker issue I mentioned as well.

It's entirely fair for you to just close this and leave it as is if the code is needed like that. Just having this issue here searchable would have saved me quite a few frustrating hours, so perhaps some future soul can benefit from it being documented here.

RokerHRO commented 10 months ago

I'd suggest: if RLIMIT_NOFILE is set to a pathologically high value – which is a sign that something has gone wrong on the system – it should be lowered to a sensible value, e.g. 10k or 100k or the like, and the program should call setrlimit() with that sane value immediately at startup.
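A minimal sketch of that idea, to be run early in main() (the 64k ceiling used below is an arbitrary example, not a recommended value):

```c
#include <stdio.h>
#include <sys/resource.h>

/* Clamp a pathologically high soft fd limit to a sane ceiling.
   Lowering the soft limit below the hard limit needs no privileges. */
static void clamp_nofile_limit(rlim_t sane_max)
{
    struct rlimit rlim;

    if (getrlimit(RLIMIT_NOFILE, &rlim) < 0)
        return;

    if (rlim.rlim_cur > sane_max) {
        rlim.rlim_cur = sane_max;
        if (setrlimit(RLIMIT_NOFILE, &rlim) < 0)
            perror("setrlimit");
    }
}

/* e.g. clamp_nofile_limit(65536); as one of the first things in main() */
```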

bk138 commented 10 months ago

Hah, I was about to write I didn't know about the original author's intentions, but the commit is in fact by a previous me 37a581089b053035357509bb5afafb8623a905d5 :thinking:

Earlopain commented 10 months ago

Hm, as far as I understand, this is supposed to guard against fd exhaustion.

So if, say, 10k (or some other sufficiently large number of) unused fds have already been found, the chance of actual exhaustion seems pretty low. Would it be an alternative to bail out early once enough free slots have been found, and skip checking the remaining fds?
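Something along those lines might look like this (a rough sketch; the 10k threshold is an arbitrary example):

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/resource.h>

#define ENOUGH_FREE_FDS 10000  /* arbitrary "we're clearly fine" threshold */

/* Same probing loop as before, but stop as soon as enough free slots
   have been seen, so the cost no longer scales with RLIMIT_NOFILE
   when plenty of fds are available. */
static int have_enough_free_fds(rlim_t limit)
{
    long free_slots = 0;

    for (rlim_t fd = 0; fd < limit; ++fd) {
        if (fcntl((int)fd, F_GETFD) == -1 && errno == EBADF
            && ++free_slots >= ENOUGH_FREE_FDS)
            return 1;  /* bail out early: exhaustion is not imminent */
    }
    return 0;
}
```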

ethan-vanderheijden commented 2 months ago

I ran into this bug too! It turns out, if you run a process inside a Docker container, the file descriptor limit is in the millions or billions. That means millions or billions of calls to fcntl... which takes a long, long time.

This is very similar to a bug in Red Hat's Package Manager (rpm), and their solution was to avoid calling fcntl and instead count the open files inside /proc/self/fd/ (see https://github.com/rpm-software-management/rpm/pull/444). This solution only works on Linux, but it is far better than checking every possible file descriptor.
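The idea, as a Linux-only sketch (this is neither rpm's nor #626's actual code):

```c
#include <dirent.h>

/* Count open fds by listing /proc/self/fd. The cost is proportional
   to the number of fds actually open, not to RLIMIT_NOFILE. */
static long count_open_fds_procfs(void)
{
    DIR *dir = opendir("/proc/self/fd");
    struct dirent *ent;
    long open_fds = 0;

    if (!dir)
        return -1;

    while ((ent = readdir(dir)) != NULL) {
        if (ent->d_name[0] != '.')  /* skip "." and ".." */
            ++open_fds;
    }
    closedir(dir);

    /* Subtract the fd that opendir() itself holds. */
    return open_fds - 1;
}
```

The number of free slots is then simply the soft limit minus this count.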

I proposed my own fix in #626.