RcppCore / RcppEigen

Rcpp integration for the Eigen templated linear algebra library
Other
110 stars 40 forks source link

Patch integer overflow in 'Rcpp::wrap(<Eigen::Matrix>)' #105

Closed jaganmn closed 2 years ago

jaganmn commented 2 years ago

Fixes integer overflow bug detected here so that Rcpp::wrap(<Eigen::Matrix>) actually supports long vectors.

eddelbuettel commented 2 years ago

Nice. I'll give this as a good look.

eddelbuettel commented 2 years ago

Do you think you can cook up an example or test ? (bad and wrong math deleted)

jaganmn commented 2 years ago

Sorry about the whitespace. Probably an Emacs glitch on my end, since it looked fine in my buffer. Here is a basic test - I only have 16 GB RAM myself...

Rcpp::sourceCpp(code = '
#include <RcppEigen.h>
// [[Rcpp::depends(RcppEigen)]]
// [[Rcpp::export]]
Rcpp::IntegerVector vector_wrap(R_xlen_t n) {
    Eigen::VectorXi x(n, 1);
    for (R_xlen_t i = 0; i < n; ++i) {
        x(i) = (int) (i % 10);
    }
    return Rcpp::wrap(x);
}
// [[Rcpp::export]]
Rcpp::IntegerMatrix matrix_wrap(R_xlen_t n) {
    Eigen::MatrixXi x(n, 1);
    for (R_xlen_t i = 0; i < n; ++i) {
        x(i, 0) = (int) (i % 10);
    }
    return Rcpp::wrap(x);
}
')

gc(FALSE)
n <- floor(2^31.5)
x <- vector_wrap(n)
stopifnot(is.vector(x, "integer"),
          length(x) == n,
          identical(x[seq_len(2^10)], rep_len(0:9, 2^10)))
rm(x)
gc(FALSE)
h <- function(cond) grepl("INT_MAX", conditionMessage(cond))
stopifnot(tryCatch({matrix_wrap(n); FALSE}, error = h))
eddelbuettel commented 2 years ago

Hehe, I use Emacs too. See the (old) first line in the file; these days I prefer .editorconf files.

Maybe my math was wrong but I try to convince myself we need more. Oh, but now I know what my error was. Better way:

eddelbuettel commented 2 years ago

Yep, works well here too. Matrix case does not error. Was it supposed to?

jaganmn commented 2 years ago

I expect matrix_wrap(n) to throw the error array dimensions cannot exceed INT_MAX, and stopifnot to not throw an error as a result, due to the error handler. That is what I observe on my machine...

eddelbuettel commented 2 years ago

Well played -- because that error-throwing was not in your code I didn't run it so I didn't see it. Concur on what the test script does.

And maybe today isn't just my day -- but a vector of size N should have the same memory requirement as an N x 1 matrix. What am I missing?

jaganmn commented 2 years ago

Memory isn't really the problem. The dim attribute of an R array is by definition an integer vector, so no element can exceed INT_MAX even if it is possible to allocate a prod(dim)-length vector. You throw an error only to avoid creating an invalid R object.

The constraint on dim is mentioned in ?`Memory-limits`:

The number of bytes in a character string is limited to 2^31 - 1 ~ 2*10^9, which is also the limit on each dimension of an array.

and also discussed in R-ints here.

eddelbuettel commented 2 years ago

You are spot on, and I once knew that by heart too. With R_xlen_t we got larger matrices and vectors (double as index counter) but the dims is the remaining constraint even if total size no longer is. I had plainly forgotten about the dim attribute type constraint, and that also explains why you had to add the check.

That is actually a really nice on. We could add that the unit tests, possibly behind another opt-in boolean toggle (just how Rcpp has by default most of its tests off as they take moment). So if, say, RcppEigen_Large_Vector_Test is TRUE we run them just as you have it here with one expect_equal or expect_true and one expect_error. I may toss that in tomorrow.