Open HenrikBengtsson opened 8 years ago
Related: The next release of R.utils, 2.4.0 (on CRAN), will provide:
- strayDLLs(): identifies stray DLLs
- gcDLLs(): identifies and unloads stray DLLs
B Ripley just submitted the following "note to self" comments to src/main/Rdynload.c:
+/* Note that it is likely that dlopen will use up at least one file
+ descriptor for each DLL loaded (it may load further dynamically
+ linked libraries), so we do not want to get close to the fd limit
+ (which may be as low as 256). */
#define MAX_NUM_DLLS 100
So, apparently, there might be downstream issues if MAX_NUM_DLLS is just increased (as several are requesting), although it doesn't look like too serious a problem.
Here dlopen refers to a native function: "function dlopen() loads the dynamic shared object (shared library) file named by the null-terminated string filename and returns an opaque "handle" for the loaded object". The fd limit refers to "the maximum number of open files / file descriptors (FD)". The limit is specific to each system. On Ubuntu 16.04 one can find the limit as:
$ cat /proc/sys/fs/file-max
1613668
and the "hard" and "soft" values for a user (I don't know what the difference between them is):
$ ulimit -Hn
65536
$ ulimit -Sn
1024
(couldn't R query these limits?)
The above are the defaults on my OS setup. Apparently, one can increase this limit, e.g. https://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/.
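On the question of whether R could query these limits: a minimal sketch, assuming a Unix-like system where the shell started by system() understands `ulimit -Sn` and `ulimit -Hn`. The helper name fd_limit() is made up for illustration and is not part of base R.

```r
# Hypothetical helper (not part of base R): ask the shell for the
# per-process limit on open files. Assumes a Unix-like OS whose
# /bin/sh supports 'ulimit -Sn' and 'ulimit -Hn'.
fd_limit <- function(type = c("soft", "hard")) {
  type <- match.arg(type)
  flag <- if (type == "soft") "-Sn" else "-Hn"
  out <- system(paste("ulimit", flag), intern = TRUE)
  # The shell prints "unlimited" when no limit is set
  if (identical(out, "unlimited")) Inf else as.numeric(out)
}

fd_limit("soft")
fd_limit("hard")
```

This only reports the limits seen by a child shell of the R process; it does not change them.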
As of 2017-01-26, in R devel (>= 3.4.0), the DLL limit is now effectively increased to min(0.6 * fd_limit, 1000), cf. https://github.com/wch/r-source/commit/3b49af72fe3b83dda7014e094322f8d8f077ffe9 :)
The NEWS entry for R-devel (to become 3.5.0) is:
- The maximum number of DLLs that can be loaded into R e.g. via dyn.load() has been increased up to 614 when the OS limit on the number of open files allows.
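The figure 614 can be reproduced with a little arithmetic, reading the limit as a cap: the smaller of 60% of the open-file limit and 1000. This is a sketch of that reading, not a copy of the code in Rdynload.c; the function name effective_dll_cap() is made up for illustration.

```r
# Assumed reading of the rule: 60% of the fd limit, rounded down,
# and never more than 1000.
effective_dll_cap <- function(fd_limit) {
  min(floor(0.6 * fd_limit), 1000)
}

effective_dll_cap(1024)   # 614, the number quoted in the NEWS entry
effective_dll_cap(65536)  # 1000, the upper bound applies
```

With the soft limit of 1024 shown above, 0.6 * 1024 = 614.4, which rounds down to 614.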
Background
Packages with native code load DLLs when they are loaded. More precisely, on Windows, Dynamic Link Library (DLL) files are loaded, and on Unix-like systems, shared library (SO) files are loaded.
For example, when a fresh R session is started we have the following DLLs:
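A minimal way to inspect this, using base R's getLoadedDLLs(). The exact entries differ between platforms and R versions, so no output is shown here.

```r
# List the DLLs currently registered in this R session
dlls <- getLoadedDLLs()
print(dlls)

# Number of entries in use in R's DLL registry
length(dlls)
```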
When loading a package with native code, it will add another entry, e.g.
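As an illustration, the sketch below uses splines, chosen here only because it ships with R and contains compiled code; any package with native code would do.

```r
# DLL names before and after loading a package with native code
before <- names(getLoadedDLLs())
loadNamespace("splines")
after <- names(getLoadedDLLs())

# Newly registered DLLs (empty if 'splines' was already loaded)
setdiff(after, before)
```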
Unloading a package that registers a DLL should (ideally) not only unload the package but also unregister its DLL, e.g.
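A sketch of how to observe this, again using splines as a stand-in. Whether the last line reports FALSE depends on the package cleaning up after itself, so no result is asserted here.

```r
loadNamespace("splines")
"splines" %in% names(getLoadedDLLs())   # is the DLL registered after loading?

unloadNamespace("splines")
"splines" %in% names(getLoadedDLLs())   # FALSE if the DLL was unregistered
```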
A package can unload its registered DLLs using:
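A common pattern is an .onUnload() hook that calls library.dynam.unload(). The sketch below assumes a package called "mypkg" (a placeholder name) and would live in one of the package's R files, e.g. R/zzz.R. The gc() call is optional and follows the recommendation to run finalizers before the DLL goes away.

```r
# "mypkg" is a placeholder for the package's own name
.onUnload <- function(libpath) {
  # Run pending finalizers first; some may still call into the DLL
  gc()
  # Unload and unregister the package's DLL
  library.dynam.unload("mypkg", libpath)
}
```

R calls .onUnload() with the package's installation path when the namespace is unloaded, e.g. via unloadNamespace().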
Forcing the garbage collector to run (gc()) will trigger finalizer functions to be called, some of which may need the DLL to run.
Issue
It turns out that several packages forget to unregister their DLLs when unloaded. For example,
(UPDATE: The digest package has since fixed this, but the example still applies to many other packages).
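One way to test a given package for this behavior is to compare the DLL registry before loading and after unloading. The helper below is a hypothetical sketch (leftover_dlls() is not an existing function) and assumes the package is not loaded when it is called.

```r
# Hypothetical helper: load and unload 'pkg', then report any DLLs
# that are still registered afterwards. DLLs of dependencies that
# remain loaded are reported too, so interpret the result with care.
leftover_dlls <- function(pkg) {
  if (isNamespaceLoaded(pkg)) {
    stop("'", pkg, "' is already loaded; start from a session without it")
  }
  before <- names(getLoadedDLLs())
  loadNamespace(pkg)
  unloadNamespace(pkg)
  left <- setdiff(names(getLoadedDLLs()), before)
  if (length(left) > 0L) {
    warning("DLLs still registered after unloading '", pkg, "': ",
            paste(left, collapse = ", "))
  }
  left
}

leftover_dlls("splines")
```

An empty character vector means that every DLL added by loading the package was unregistered again.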
The problem with packages not unregistering their DLLs when unloaded is that it risks eventually filling up R's internal DLL registry, which can only hold MAX_NUM_DLLS (== 100) entries. When this happens, R will fail to load any package that needs to register a DLL with the following error message:
This is guaranteed to happen if one tries to load and unload all CRAN packages one by one, e.g.
There have been several reports on hitting this limit, e.g.
Suggestion / Wish
R CMD check assertion
Have R CMD check also assert that the package unloads any registered DLLs, e.g.
unloadNamespace()
Assert / warn
Maybe unloadNamespace() should check for left-over DLLs and give a warning whenever coupled DLLs are not unloaded.
Concerns
Karl Miller wrote on 2016-12-20 (https://stat.ethz.ch/pipermail/r-devel/2016-December/073528.html): "It's not always clear when it's safe to remove the DLL."
UPDATE 2016-12-20: Add recommendation to run gc() before removing DLL when unloading a package. See thread https://stat.ethz.ch/pipermail/r-devel/2016-December/073522.html.