In contrast to matrices, replicating rows in data.frames is very slow. The bottleneck is the check/creation of unique row names. In many situations, one does not care about the latter, and it would be convenient to pass an ignore.row.names = TRUE argument to the subsetting operation [.data.frame.
Example:
library(bench)
df = iris[1:4]
M = data.matrix(df)
row_id = rep(1:150, each = 1000)
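fast_row_subset_df <- function(x, i) {
  # Subset each column directly; matrix-like columns keep their 2-D shape
  out <- lapply(x, function(z) if (length(dim(z)) != 2L) z[i] else z[i, , drop = FALSE])
  # Use compact automatic row names 1..length(i) instead of making the originals unique
  attr(out, "row.names") <- .set_row_names(length(i))
  class(out) <- "data.frame"
  out
}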
bench::mark(
  df[row_id, ],
  M[row_id, ],
  fast_row_subset_df(df, row_id),
  check = "ignore"
)
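To make the row-name cost concrete, here is a minimal base R sketch of what `[.data.frame` does when rows are repeated: duplicated row names are turned into unique character strings. The object names `df_small` and `rep_rows` are only illustrative.

```r
# Small illustration: repeated row indices force unique character row names
df_small <- iris[1:3, 1:2]
rep_rows <- df_small[c(1, 1, 2, 2, 2), ]

# Row names are deduplicated with numeric suffixes
rownames(rep_rows)
# "1" "1.1" "2" "2.1" "2.2"
```

With 150,000 rows, as in the benchmark above, this deduplication has to run over the whole row-name vector, which is presumably where the time goes.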
The API of [ could be:
`[.data.frame` <- function (x, i, j, drop = if (missing(i)) TRUE else length(cols) == 1, ignore.row.names = FALSE) {
...
}
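Until something like this exists, the intended behavior could be emulated with a wrapper. The sketch below is only an assumption of how `ignore.row.names` might behave; `subset_rows` is a hypothetical helper, not part of base R, and it handles only integer and logical row indices (not character row names).

```r
# Hypothetical helper emulating the proposed ignore.row.names argument
subset_rows <- function(x, i, ignore.row.names = FALSE) {
  if (!ignore.row.names) {
    # Default: regular data.frame subsetting, row names made unique
    return(x[i, , drop = FALSE])
  }
  # Resolve logical or negative indices to row positions
  idx <- seq_len(nrow(x))[i]
  # Subset column by column; matrix-like columns keep their 2-D shape
  out <- lapply(x, function(z) {
    if (length(dim(z)) != 2L) z[idx] else z[idx, , drop = FALSE]
  })
  # Assign compact automatic row names 1..length(idx)
  attr(out, "row.names") <- .set_row_names(length(idx))
  class(out) <- "data.frame"
  out
}

# Usage: repeated rows, row names simply renumbered
res <- subset_rows(iris[1:4], rep(1:3, each = 2), ignore.row.names = TRUE)
rownames(res)
# "1" "2" "3" "4" "5" "6"
```

Note that this sketch drops any extra classes or attributes on `x`, so it is only a drop-in replacement for plain data.frames.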