brodieG / website


Rprof limitations / issues #6

Open brodieG opened 5 years ago

brodieG commented 5 years ago

What `Rprof` reports is not entirely clear.

The claim is that only specials (primitives of type "special") are left out of the profile. Here is a counterexample:

```
> grp <- sample(1e6, 1e7, rep=T)
> f <- function(x) as.factor(x)
> treeprof::treeprof(f(grp))
Profiling
auto gc: running with gc() first
First run in 1.776 seconds
Looping to 5 seconds
Parsing Rprof
Done
Ticks: 1518; Iterations: 3; Time Per: 1.605 seconds; Time Total: 4.816 seconds; Time Ticks: 1.518

                              milliseconds
f --------------------------- : 1605 -   0
    as.factor --------------- : 1605 - 752
        sort ---------------- :  850 -   0
            unique.default -- :  802 - 802
            sort.default ---- :   49 -   0
                sort.int ---- :   49 -   3
                    order --- :   45 -  45
```
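Without treeprof, a similar check can be sketched with base R's `Rprof()` and `summaryRprof()`. This only approximates what treeprof does; `profiled_funs` is a hypothetical helper, and the 5 ms sampling interval is an arbitrary choice.

```r
# Hypothetical helper: run `run()` under base R's sampling profiler and
# return the names of all functions seen in at least one sample.
profiled_funs <- function(run, interval = 0.005) {
  out <- tempfile(fileext = ".out")
  Rprof(out, interval = interval)
  # Make sure profiling stops and the log is removed even if run() fails.
  on.exit({ Rprof(NULL); unlink(out) }, add = TRUE)
  run()
  Rprof(NULL)
  by_total <- summaryRprof(out)$by.total
  # Row names are quoted (e.g. "\"sort.int\""), so strip the quotes.
  gsub('"', "", rownames(by_total), fixed = TRUE)
}

grp <- sample(1e6, 1e7, replace = TRUE)
f <- function(x) as.factor(x)
funs <- profiled_funs(function() f(grp))
print(funs)
# The issue reports that "match" is absent here even though as.factor calls it.
print("match" %in% funs)
```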

What's happening is that `match` is not being reported, even though `as.factor` calls it and `match` is a closure, not a special:

```
> as.factor
function (x) 
{
    if (is.factor(x)) 
        x
    else if (!is.object(x) && is.integer(x)) {
        levels <- sort(unique.default(x))
        f <- match(x, levels)
        levels(f) <- as.character(levels)
        if (!is.null(nx <- names(x))) 
            names(f) <- nx
        class(f) <- "factor"
        f
    }
    else factor(x)
}
<bytecode: 0x7ffbe2a501f8>
<environment: namespace:base>
> match
function (x, table, nomatch = NA_integer_, incomparables = NULL) 
.Internal(match(x, table, nomatch, incomparables))
<bytecode: 0x7ffbebcad908>
<environment: namespace:base>
```

This is almost certainly because `as.factor` is byte-compiled and the call to `match` is reduced to the underlying `.Internal` call; at least, some testing with a non-byte-compiled version of `as.factor` suggested so.
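The non-byte-compiled test can be sketched roughly as follows. `as_factor_nbc` and `g` are hypothetical names, and the sketch assumes that rebuilding a closure with `body<-` drops its bytecode and that `compiler::enableJIT(0)` keeps R from compiling the copy again on its first calls.

```r
# Hypothetical name: a copy of base::as.factor rebuilt from its source.
# `body<-` builds a new closure with the same formals and environment
# (the base namespace) but without the compiled bytecode.
as_factor_nbc <- as.factor
body(as_factor_nbc) <- body(as.factor)

# Turn the JIT off so the copy is not compiled on its first calls;
# enableJIT() returns the previous level so it can be restored.
old_jit <- compiler::enableJIT(0)

grp <- sample(1e6, 1e7, replace = TRUE)
g <- function(x) as_factor_nbc(x)

prof_file <- tempfile()
Rprof(prof_file, interval = 0.005)
res <- g(grp)
Rprof(NULL)
by_total <- summaryRprof(prof_file)$by.total
print(by_total)  # check whether a "match" row now appears

compiler::enableJIT(old_jit)
unlink(prof_file)
```

The copy should return the same factor as `as.factor`; only the absence of bytecode differs.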

Relatedly, some clarification is probably needed about what counts as an `.Internal` function versus a special `.Internal`. In general, closures that use `.Internal` as their primary operation are considered `.Internal` functions, but maybe functions like `as.factor` don't meet that threshold. The ambiguity is whether "the function" means the R closure or the C routine that `.Internal` calls, and the two often share a name (for example, the closure `match` and the `match` in `.Internal(match(...))`).
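One way to pin down the narrow sense of ".Internal function" is purely syntactic: a closure whose entire body is a single `.Internal()` call. The sketch below encodes that reading; `is_internal_wrapper` and `internal_target` are hypothetical helpers, not part of base R, and the definition itself is an assumption rather than an established one. `internal_target` returns the name used inside the `.Internal` call, which makes the closure/C-routine name overlap visible.

```r
# Hypothetical classifier: TRUE when `fun` is a closure whose whole body
# is one .Internal() call, optionally wrapped in a single pair of braces.
is_internal_wrapper <- function(fun) {
  if (typeof(fun) != "closure") return(FALSE)
  b <- body(fun)
  # Unwrap `{ expr }` when the braces hold exactly one expression.
  if (is.call(b) && identical(b[[1]], as.name("{")) && length(b) == 2L)
    b <- b[[2]]
  is.call(b) && identical(b[[1]], as.name(".Internal"))
}

# Hypothetical helper: the name inside .Internal(name(...)), or NA when
# `fun` is not a wrapper in the sense above.
internal_target <- function(fun) {
  if (!is_internal_wrapper(fun)) return(NA_character_)
  b <- body(fun)
  if (identical(b[[1]], as.name("{"))) b <- b[[2]]
  as.character(b[[2]][[1]])
}

is_internal_wrapper(match)      # TRUE
internal_target(match)          # "match": same name as the closure
is_internal_wrapper(as.factor)  # FALSE: its body is a full if/else block
```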

`.Internal` itself is a special primitive, which I think is why `match` disappears from the profile of the byte-compiled `as.factor`: the call is effectively treated as a direct call to `.Internal`, and specials are not recorded.
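The object types involved can be checked directly. The commented values are what `typeof()` and `is.primitive()` return for these base objects. The disassembly at the end is only a way to inspect what the byte-code compiler produces for a call to `match`; `h` is a hypothetical name, and since the output format (and whether the call is inlined) may vary across R versions and compiler optimization levels, no output is reproduced.

```r
# .Internal is a primitive of type "special": it receives its arguments
# unevaluated and, per ?Rprof, specials do not put a context on the call
# stack, so they are not recorded in profiles.
typeof(.Internal)        # "special"
is.primitive(.Internal)  # TRUE

# match, by contrast, is an ordinary closure whose body is one .Internal call.
typeof(match)            # "closure"
body(match)

# Compile a small caller and inspect the resulting bytecode.
h <- compiler::cmpfun(function(x, table) match(x, table))
compiler::disassemble(h)
```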