HenrikBengtsson / Wishlist-for-R

Features and tweaks to R that I and others would love to see - feel free to add yours!
https://github.com/HenrikBengtsson/Wishlist-for-R/issues
GNU Lesser General Public License v3.0

WISH: R CMD check to assert that DLLs are unregistered when package is unloaded #29

Open HenrikBengtsson opened 8 years ago

HenrikBengtsson commented 8 years ago

Background

Packages with native code load DLLs when they are loaded. More precisely, Dynamic Link Library (DLL) files are loaded on Windows, and shared library (.so) files are loaded on Unix-like systems.
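R itself records which file extension it uses for these libraries. As a small sketch (assuming only that the stats package, whose DLL appears in the listings in this issue, is installed), the following reads .Platform$dynlib.ext and the path of the loaded stats DLL from getLoadedDLLs():

```r
# Make sure stats (and therefore its DLL) is loaded; a no-op if it already is
invisible(loadNamespace("stats"))

# Extension R uses for shared libraries on this platform:
# ".so" on Unix-like systems (including macOS), ".dll" on Windows
ext <- .Platform$dynlib.ext

# Each entry of getLoadedDLLs() is a DLLInfo object with a "path" element
stats_dll <- getLoadedDLLs()[["stats"]][["path"]]

cat("DLL extension:", ext, "\nstats DLL:", stats_dll, "\n")
```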

For example, when a fresh R session is started we have the following DLLs:

$ R --vanilla
> dll0 <- getLoadedDLLs()
> dll0
                                                Filename Dynamic.Lookup
base                                                base          FALSE
methods       /usr/lib/R/library/methods/libs/methods.so          FALSE
utils             /usr/lib/R/library/utils/libs/utils.so          FALSE
grDevices /usr/lib/R/library/grDevices/libs/grDevices.so          FALSE
graphics    /usr/lib/R/library/graphics/libs/graphics.so          FALSE
stats             /usr/lib/R/library/stats/libs/stats.so          FALSE

Loading a package with native code adds another entry, e.g.

> library("matrixStats")
> getLoadedDLLs()
                                                                              Filename Dynamic.Lookup
base                                                                              base          FALSE
methods                                     /usr/lib/R/library/methods/libs/methods.so          FALSE
utils                                           /usr/lib/R/library/utils/libs/utils.so          FALSE
grDevices                               /usr/lib/R/library/grDevices/libs/grDevices.so          FALSE
graphics                                  /usr/lib/R/library/graphics/libs/graphics.so          FALSE
stats                                           /usr/lib/R/library/stats/libs/stats.so          FALSE
matrixStats /home/hb/R/x86_64-pc-linux-gnu-library/3.3/matrixStats/libs/matrixStats.so           TRUE

Ideally, unloading a package that registers a DLL not only unloads the package but also unregisters its DLL, e.g.

> unloadNamespace("matrixStats")
> getLoadedDLLs()
                                                Filename Dynamic.Lookup
base                                                base          FALSE
methods       /usr/lib/R/library/methods/libs/methods.so          FALSE
utils             /usr/lib/R/library/utils/libs/utils.so          FALSE
tools             /usr/lib/R/library/tools/libs/tools.so          FALSE
internet                 /usr/lib/R/modules//internet.so           TRUE
grDevices /usr/lib/R/library/grDevices/libs/grDevices.so          FALSE
graphics    /usr/lib/R/library/graphics/libs/graphics.so          FALSE
stats             /usr/lib/R/library/stats/libs/stats.so          FALSE

A package can unregister its DLL when it is unloaded by defining an .onUnload() hook such as:

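.onUnload <- function(libpath) {
    # Run pending finalizers first, since some may still need the DLL
    gc()
    # Then unregister and unload this package's DLL
    library.dynam.unload(utils::packageName(), libpath)
}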

Forcing the garbage collector to run (gc()) triggers any pending finalizers, some of which may need the DLL. Running them before the DLL is unloaded avoids having them call into native code that is no longer loaded.
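To illustrate why the order matters: a finalizer registered with reg.finalizer() runs only once the garbage collector reclaims its object, and gc() forces a full collection after which pending finalizers are run. In this minimal sketch the finalizer only sets a flag (finalizer_ran, a hypothetical name); in a package it would typically free a native resource through the package's own DLL, which is why that DLL must still be loaded at that point.

```r
# Flag in the global environment recording whether the finalizer has run
finalizer_ran <- FALSE

# Finalizer defined at top level so that it does not capture the object
mark_finalized <- function(obj) {
  assign("finalizer_ran", TRUE, envir = globalenv())
}

make_object <- function() {
  obj <- new.env()
  # In a package with native code, the finalizer would typically call a
  # C-level cleanup routine that lives in the package's DLL
  reg.finalizer(obj, mark_finalized)
  invisible(NULL)  # 'obj' becomes unreachable once this function returns
}

make_object()
invisible(gc())  # full collection; pending finalizers are run afterwards
after_gc <- finalizer_ran
cat("finalizer ran after gc():", after_gc, "\n")
```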

Issue

It turns out that several packages forget to unregister their DLLs when unloaded. For example,

> library("digest")
> unloadNamespace("digest")
> getLoadedDLLs()
                                                                  Filename Dynamic.Lookup
base                                                                  base          FALSE
methods                         /usr/lib/R/library/methods/libs/methods.so          FALSE
utils                               /usr/lib/R/library/utils/libs/utils.so          FALSE
tools                               /usr/lib/R/library/tools/libs/tools.so          FALSE
internet                                   /usr/lib/R/modules//internet.so           TRUE
grDevices                   /usr/lib/R/library/grDevices/libs/grDevices.so          FALSE
graphics                      /usr/lib/R/library/graphics/libs/graphics.so          FALSE
stats                               /usr/lib/R/library/stats/libs/stats.so          FALSE
digest    /home/hb/R/x86_64-pc-linux-gnu-library/3.3/digest/libs/digest.so           TRUE

(UPDATE: The digest package has since fixed this, but the same problem remains in many other packages.)

The problem with packages that do not unregister their DLLs when unloaded is that they may eventually fill up R's internal DLL registry, which can hold at most MAX_NUM_DLLS (= 100) entries. When this happens, R fails to load any package that needs to register a DLL, with the following error message:

maximal number of DLLs reached...
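How close a session is to that limit can be estimated by counting the entries returned by getLoadedDLLs(). The sketch below assumes the limit of 100 quoted above (MAX_NUM_DLLS); newer R versions may use a different, OS-dependent limit, so the hypothetical helper dll_headroom() takes the limit as a parameter rather than querying it from R.

```r
# Number of additional DLLs that can be registered before reaching 'limit'.
# 'limit' defaults to MAX_NUM_DLLS as described above (100); this is an
# assumption, not a value read from R's internals.
dll_headroom <- function(limit = 100L) {
  limit - length(getLoadedDLLs())
}

headroom <- dll_headroom()
if (headroom < 10L) {
  warning("Only ", headroom, " DLL slots left before reaching the limit")
}
headroom
```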

This is guaranteed to happen if one tries to load and unload all CRAN packages one by one, e.g.

# CRANpkgs: character vector with the names of all (installed) CRAN packages
for (pkg in CRANpkgs) {
  loadNamespace(pkg)
  unloadNamespace(pkg)
}
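The effect of such a loop can be observed by recording how many DLLs are registered after each load/unload cycle; a count that keeps growing points to packages that leave their DLLs behind. This sketch (track_dll_growth() is a hypothetical name) uses a small vector of base-R packages with native code in place of CRANpkgs, skips namespaces that are already loaded, and treats failures to load or unload as non-fatal.

```r
# Load and unload each package in turn and record how many DLLs are
# still registered afterwards.
track_dll_growth <- function(pkgs) {
  counts <- integer(length(pkgs))
  names(counts) <- pkgs
  for (pkg in pkgs) {
    # Skip namespaces already in use; unloading them could break the session
    if (!(pkg %in% loadedNamespaces())) {
      tryCatch({
        loadNamespace(pkg)
        unloadNamespace(pkg)
      }, error = function(e) {
        message("Skipping ", pkg, ": ", conditionMessage(e))
      })
    }
    counts[[pkg]] <- length(getLoadedDLLs())
  }
  counts
}

# Example with two base-R packages that ship native code
counts <- track_dll_growth(c("splines", "grid"))
print(counts)
```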

There have been several reports of hitting this limit, e.g.

Suggestion / Wish

R CMD check assertion

Have R CMD check also assert that the package unloads any DLLs it registered, e.g.

* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... WARNING
  Unloading the namespace does not unload DLL
* checking loading without being on the library search path ... OK
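Until such a check exists, it can be approximated in a fresh R session by comparing getLoadedDLLs() before loading and after unloading a namespace. In this sketch, check_dll_unload() is a hypothetical name, and DLLs of dependencies that were loaded along with the package (and therefore stay loaded) are reported as well, so a real check would need to filter those out.

```r
# Hypothetical helper: load and unload 'pkg', then return the names of any
# DLLs that were registered in the process and are still registered.
check_dll_unload <- function(pkg) {
  if (pkg %in% loadedNamespaces()) {
    stop("Namespace '", pkg, "' is already loaded; run this in a fresh R session")
  }
  before <- names(getLoadedDLLs())
  loadNamespace(pkg)
  unloadNamespace(pkg)
  # Note: this also includes DLLs of dependencies loaded along with 'pkg'
  leftover <- setdiff(names(getLoadedDLLs()), before)
  if (length(leftover) > 0L) {
    message("DLLs still registered after unloading '", pkg, "': ",
            paste(leftover, collapse = ", "))
  }
  leftover
}

res <- check_dll_unload("splines")
print(res)
```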

unloadNamespace()

Assert / warn

Maybe unloadNamespace() should check for leftover DLLs and give a warning whenever the DLLs coupled to the package are not unloaded.
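A minimal sketch of such a warning, written as a wrapper rather than as a change to unloadNamespace() itself. The name unload_namespace_checked() is hypothetical, and the check assumes the DLL is named after the package (the usual case with useDynLib(pkg)); packages whose DLL has a different name would slip through.

```r
# Hypothetical wrapper around unloadNamespace() that warns if a DLL with the
# same name as the package is still registered afterwards.
unload_namespace_checked <- function(pkg) {
  unloadNamespace(pkg)
  # Assumption: the package's DLL carries the package name
  still_loaded <- pkg %in% names(getLoadedDLLs())
  if (still_loaded) {
    warning("Unloading namespace '", pkg, "' did not unload its DLL")
  }
  invisible(!still_loaded)
}

invisible(loadNamespace("splines"))
ok <- unload_namespace_checked("splines")
```

The function returns TRUE (invisibly) when no DLL named after the package remains, so it can also be used programmatically, e.g. in a loop over many packages.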

Concerns

Karl Millar wrote on 2016-12-20 (https://stat.ethz.ch/pipermail/r-devel/2016-December/073528.html): "It's not always clear when it's safe to remove the DLL."

UPDATE 2016-12-20: Added a recommendation to run gc() before removing the DLL when unloading a package. See thread https://stat.ethz.ch/pipermail/r-devel/2016-December/073522.html.

HenrikBengtsson commented 8 years ago

Related: The next release of R.utils (2.4.0, on CRAN) will provide:

HenrikBengtsson commented 7 years ago

Related

B. Ripley just committed the following "note to self" comment to src/main/Rdynload.c:

 +/* Note that it is likely that dlopen will use up at least one file
 +   descriptor for each DLL loaded (it may load further dynamically
 +   linked libraries), so we do not want to get close to the fd limit
 +   (which may be as low as 256). */
 #define MAX_NUM_DLLS   100

So, apparently, there might be downstream issues if MAX_NUM_DLLS is simply increased (as several people are requesting), although it does not look like too serious a problem.

Existing limits set by the OS

Here dlopen refers to the native C library function: "The function dlopen() loads the dynamic shared object (shared library) file named by the null-terminated string filename and returns an opaque "handle" for the loaded object". The fd limit refers to "the maximum number of open files / file descriptors (FD)". The limit is specific to each system. On Ubuntu 16.04, the system-wide limit can be found as:

$ cat /proc/sys/fs/file-max
1613668

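and the per-process "hard" and "soft" limits for a user (the soft limit is the one currently enforced; the hard limit is the ceiling up to which a non-root user may raise the soft limit):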

$ ulimit -Hn
65536

$ ulimit -Sn
1024

(couldn't R query these limits?)

The above are the defaults on my setup. Apparently, these limits can be increased, see e.g. https://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/.
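As for whether R could query these limits: on Unix-like systems it can at least ask the shell, since system() runs its command through /bin/sh. A sketch, assuming a POSIX shell whose ulimit builtin supports -S, -H and -n, plus the Linux-specific file /proc/sys/fs/file-max; fd_limits() is a hypothetical name, not an existing R function.

```r
# Query open-file limits from R on Unix-like systems.
# Returns a named numeric vector; NA means "unlimited" or not available.
fd_limits <- function() {
  if (.Platform$OS.type != "unix") {
    return(c(soft = NA_real_, hard = NA_real_, system = NA_real_))
  }
  # ulimit is a shell builtin, so it has to be run through the shell
  read_ulimit <- function(flag) {
    out <- system(paste("ulimit", flag, "-n"), intern = TRUE)
    # "unlimited" cannot be converted to a number and becomes NA
    suppressWarnings(as.numeric(out[1]))
  }
  # System-wide limit; this file only exists on Linux
  file_max <- "/proc/sys/fs/file-max"
  system_max <- if (file.exists(file_max)) {
    suppressWarnings(as.numeric(readLines(file_max, n = 1L)))
  } else {
    NA_real_
  }
  c(soft = read_ulimit("-S"), hard = read_ulimit("-H"), system = system_max)
}

fd_limits()
```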

HenrikBengtsson commented 7 years ago

Related

As of 2017-01-26, in R-devel (>= 3.4.0), the DLL limit has effectively been increased to max(0.6*fd_limit, 1000), cf. https://github.com/wch/r-source/commit/3b49af72fe3b83dda7014e094322f8d8f077ffe9 :)

The corresponding NEWS entry in R-devel (to become R 3.4.0) reads:

  • The maximum number of DLLs that can be loaded into R e.g. via dyn.load() has been increased up to 614 when the OS limit on the number of open files allows.