HenrikBengtsson / Wishlist-for-R

Features and tweaks to R that I and others would love to see - feel free to add yours!
https://github.com/HenrikBengtsson/Wishlist-for-R/issues
GNU Lesser General Public License v3.0

WISH: .rm(x) - a fast light-weight version of rm(x) #18

Status: Open. HenrikBengtsson opened this issue 8 years ago.

HenrikBengtsson commented 8 years ago

Background

rm(x) and rm(list="x") are slow. The latter is somewhat faster (about 1.5 times in the benchmark below), but both are still very slow, roughly 100-200 times slower than a simple assignment such as x <- NULL. For a small number of calls to rm() this makes little difference, but when it is called thousands of times the overhead becomes noticeable.

Some benchmark results:

options(digits=3)
microbenchmark::microbenchmark(
  "rm(x)"            = { x <- 1; rm(x) },
  "rm(list='x')"     = { x <- 1; rm(list="x") },
  ".Internal(rm(x))" = { x <- 1; .Internal(remove("x", parent.frame(), FALSE)) },
  "x <- NULL"        = { x <- 1; x <- NULL },
  times=10e3, unit="ms"
)

Unit: milliseconds
             expr      min       lq     mean   median       uq    max neval
            rm(x) 0.030027 0.033492 0.036719 0.034647 0.036186 3.3753 10000
     rm(list='x') 0.018479 0.021558 0.023979 0.022329 0.023483 1.5960 10000
 .Internal(rm(x)) 0.000385 0.001155 0.001249 0.001156 0.001541 0.0192 10000
        x <- NULL 0.000000 0.000001 0.000174 0.000001 0.000386 0.0273 10000

Troubleshooting

One reason rm() is slow is that, already at the R level, it carries a lot of overhead so that it can handle many different call forms, e.g. rm(x), rm(list="x"), rm(x,y), rm(list=c("x", "y"), envir=env, inherits=TRUE), etc. As the benchmark stats show, calling .Internal(remove("x", ...)) directly is much faster, but still roughly an order of magnitude slower than a plain assignment.

> base::rm
function (..., list = character(), pos = -1, envir = as.environment(pos),
    inherits = FALSE)
{
    dots <- match.call(expand.dots = FALSE)$...
    if (length(dots) && !all(vapply(dots, function(x) is.symbol(x) ||
        is.character(x), NA, USE.NAMES = FALSE)))
        stop("... must contain names or character strings")
    names <- vapply(dots, as.character, "")
    if (length(names) == 0L)
        names <- character()
    list <- .Primitive("c")(list, names)
    .Internal(remove(list, envir, inherits))
}
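To see where some of that overhead comes from, here is a minimal sketch of the first step base::rm() performs: capturing the unevaluated `...` arguments with match.call() and turning them into strings with vapply(). The helper name capture_dots() is hypothetical, used only for illustration.

```r
# Hypothetical helper mimicking the first step of base::rm():
# capture the unevaluated `...` arguments and convert them to names.
capture_dots <- function(...) {
  dots <- match.call(expand.dots = FALSE)$...  # unevaluated symbols/strings
  vapply(dots, as.character, "")               # one string per argument
}

capture_dots(x, y, "z")  # the names "x", "y", "z"
```

Every call to rm(), even rm(list = "x") with no `...` at all, pays for this match.call() step in the version listed above.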

Suggestion 1

As a straightforward first improvement, the base package could provide:

.rm <- function(x) .Internal(remove(x, parent.frame(), FALSE))

Benchmarking it against the alternatives:

options(digits=3)
microbenchmark::microbenchmark(
  "rm(x)"            = { x <- 1; rm(x) },
  "rm(list='x')"     = { x <- 1; rm(list="x") },
  ".Internal(rm(x))" = { x <- 1; .Internal(remove("x", parent.frame(), FALSE)) },
  ".rm('x')" = { x <- 1; .rm("x") },
  "x <- NULL"        = { x <- 1; x <- NULL },
  times=10e3, unit="ms"
)

Unit: milliseconds
             expr      min       lq     mean   median       uq    max neval
            rm(x) 0.030412 0.033492 0.036597 0.034647 0.036186 1.6772 10000
     rm(list='x') 0.018863 0.021558 0.023578 0.022328 0.023483 1.5206 10000
 .Internal(rm(x)) 0.000385 0.000771 0.001293 0.001156 0.001540 1.4509 10000
         .rm('x') 0.000770 0.001540 0.001976 0.001925 0.002310 1.5279 10000
        x <- NULL 0.000000 0.000001 0.000154 0.000001 0.000386 0.0189 10000
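Because .rm() passes parent.frame() to remove, it deletes the variable from the environment it is called from, not from the global environment. A minimal check of that behavior, assuming the .rm() definition above (repeated so the snippet runs on its own); f() is just an illustrative caller:

```r
.rm <- function(x) .Internal(remove(x, parent.frame(), FALSE))

y <- "global"
f <- function() {
  y <- "local"
  .rm("y")                       # removes f()'s local 'y' only
  exists("y", inherits = FALSE)  # is there still a 'y' in f()'s frame?
}
f()          # FALSE
exists("y")  # TRUE: the global 'y' is untouched
```

Like rm(list = ...), this version takes the variable name as a character string, and, as with rm(), asking it to remove a name that does not exist produces a warning rather than an error.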

Suggestion 2

The above could probably be improved further by a native implementation. In [1], @s-u suggests:

If you really want to go overboard, you can define your own function:

SEXP rm(SEXP x, SEXP rho) { setVar(x, R_UnboundValue, rho); return R_NilValue; }
poof <- function(x) .Call(rm_C, substitute(x), parent.frame())

That will be faster than anything else (mainly because it avoids the trip through strings as it can use the symbol directly).
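The R half of that idea, accepting an unquoted symbol via substitute(), can be tried without any C code. The sketch below uses the hypothetical name .rm_sym(); note that it still converts the symbol to a string before calling .Internal(remove(...)), so it does not get the main benefit s-u describes and only illustrates the calling convention.

```r
# Hypothetical: remove a variable given as an unquoted symbol, e.g. .rm_sym(z)
.rm_sym <- function(x) {
  name <- as.character(substitute(x))             # symbol `z` -> "z"
  .Internal(remove(name, parent.frame(), FALSE))  # remove from the caller's frame
}

g <- function() {
  z <- 1
  .rm_sym(z)
  exists("z", inherits = FALSE)
}
g()  # FALSE
```

Since `x` is never evaluated, only substituted, .rm_sym(z) does not need to look up the value of `z` before removing it.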

Miscellaneous

Alternative names for this function:

HenrikBengtsson commented 1 year ago

From the NEWS of R 4.3.0:

The gist of the speedup was to replace:

    dots <- match.call(expand.dots=FALSE)$...
    if(length(dots) && ... {

with

    if(...length()) {
      dots <- match.call(expand.dots=FALSE)$...
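Putting the two fragments together, the revised function body might look roughly like the sketch below. It is reconstructed from the base::rm listing shown earlier plus the fragments above, not copied from the R 4.3.0 sources, and the name rm_fast() is hypothetical.

```r
# Hypothetical sketch: base::rm() with the `...length()` guard applied.
rm_fast <- function(..., list = character(), pos = -1,
                    envir = as.environment(pos), inherits = FALSE) {
  if (...length()) {
    # Only pay for match.call() + vapply() when names were passed via `...`
    dots <- match.call(expand.dots = FALSE)$...
    if (!all(vapply(dots, function(x) is.symbol(x) || is.character(x),
                    NA, USE.NAMES = FALSE)))
      stop("... must contain names or character strings")
    list <- .Primitive("c")(list, vapply(dots, as.character, ""))
  }
  .Internal(remove(list, envir, inherits))
}

h <- function() {
  a <- 1; b <- 2
  rm_fast(a, list = "b")
  c(exists("a", inherits = FALSE), exists("b", inherits = FALSE))
}
h()  # FALSE FALSE
```

With this guard, a call such as rm_fast(list = "x") skips match.call() and vapply() entirely, which is where the rm(list = "x") speedup comes from.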

Results

This update made rm(list = "x") approximately 8 times faster. In R (>= 4.3.0), rm(list = "x") is now almost as fast as the .rm() function proposed above.

Benchmarking with:

.rm <- function(x) .Internal(remove(x, parent.frame(), FALSE))

microbenchmark::microbenchmark(
  "rm(x)"            = { x <- 1; rm(x) },
  "rm(list='x')"     = { x <- 1; rm(list="x") },
  ".rm('x')" = { x <- 1; .rm("x") },
  ".Internal(rm(x))" = { x <- 1; .Internal(remove("x", parent.frame(), FALSE)) },
  "x <- NULL"        = { x <- 1; x <- NULL },
  times=10e3, unit="ms"
)

we get, for R 4.3.0:

Unit: milliseconds
             expr      min       lq         mean   median        uq      max
            rm(x) 0.008709 0.009790 0.0118062818 0.010497 0.0114025 1.953368
     rm(list='x') 0.000646 0.000740 0.0009125270 0.000802 0.0009260 0.016631   <==
         .rm('x') 0.000493 0.000575 0.0016054615 0.000622 0.0006960 8.968469
 .Internal(rm(x)) 0.000332 0.000385 0.0004852244 0.000415 0.0004680 0.016766
        x <- NULL 0.000088 0.000130 0.0001465821 0.000140 0.0001520 0.006369

and for R 4.2.3 we get:

Unit: milliseconds
             expr      min       lq         mean   median        uq      max
            rm(x) 0.008701 0.009471 0.0109918104 0.010079 0.0109485 1.912056
     rm(list='x') 0.005817 0.006338 0.0072416564 0.006768 0.0074360 0.708731   <==
         .rm('x') 0.000490 0.000560 0.0015168166 0.000604 0.0006490 8.591883
 .Internal(rm(x)) 0.000326 0.000377 0.0005168445 0.000417 0.0004540 0.605446
        x <- NULL 0.000086 0.000122 0.0001330333 0.000132 0.0001410 0.002661

The details are in https://github.com/wch/r-source/commit/4dc057f5d49d3c0590488100e418e39b68682c95.
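The "approximately 8 times faster" figure can be checked directly against the median and mean columns of the two tables above; the numbers below are copied from those rows for rm(list='x').

```r
# Times (ms) for rm(list='x'), copied from the R 4.2.3 and R 4.3.0 tables above
r423 <- c(median = 0.006768, mean = 0.0072416564)
r430 <- c(median = 0.000802, mean = 0.0009125270)
speedup <- r423 / r430
round(speedup, 1)  # median ~8.4, mean ~7.9
```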