Earlopain opened 10 months ago
Thanks for reporting! Do you have a rough idea regarding a fix?
This really isn't my forte, so unfortunately no. The Debian issue I linked (and similar ones) didn't fix anything itself, I believe, but relied on this value not being set so high that it causes issues. That is what will eventually happen with the Docker issue I mentioned as well.
It's entirely fair for you to just close this and leave it as is if the code needs to work that way. Just having this issue here searchable would have saved me quite a few frustrating hours, so perhaps some future soul can benefit from it being documented here.
I'd suggest: if `RLIMIT_NOFILE` is set to a pathologically high value – which is a sign that something went wrong in the system – it should be lowered to a sensible value, e.g. 10k or 100k or the like, and the program should call `setrlimit()` with that sane value immediately.
Hah, I was about to write I didn't know about the original author's intentions, but the commit is in fact by a previous me 37a581089b053035357509bb5afafb8623a905d5 :thinking:
Hm, as far as I understand, this is supposed to guard against fd exhaustion.
So if, say, 10k (or some other arbitrarily high number) of unused fds were found, the chance of exhaustion seems pretty low. Would it be an alternative to bail out early once sufficient free slots were found, and skip checking the fds further up?
I ran into this bug too! It turns out that if you run a process inside a Docker container, the file descriptor limit can be in the millions or billions. That means millions or billions of calls to `fcntl` ... which takes a long, long time.
This is very similar to a bug in Red Hat's package manager (rpm), and their solution was to avoid calling `fcntl` and instead count the files inside `/proc/self/fd/` (see https://github.com/rpm-software-management/rpm/pull/444). This solution only works on Linux, but it is far better than checking every possible file descriptor.
I proposed my own fix in #626
Describe the bug
On a system with an incredibly high `RLIMIT_NOFILE`, establishing a connection can take minutes.

To Reproduce
prlimit -n1073741816 -p $(pidof x11vnc)
Expected Behavior
The connection speed should be independent of this limit, but the following code iterates through every fd instead: https://github.com/LibVNC/libvncserver/blob/784cccbb724517ee4e36d9938f93b9ee168a29e7/src/libvncserver/sockets.c#L508-L527
Through a current bug in the docker/containerd systemd integration, the Docker context inherits this incredibly high limit. That will eventually be fixed, but I still wanted to open an issue about it here, since the limit may also be set high intentionally. Here's some prior art on the same issue in `fakeroot`: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=920913