RcppCore / Rcpp

Seamless R and C++ Integration
https://www.rcpp.org
GNU General Public License v2.0
742 stars 211 forks source link

Bugs with Rcpp==1.0.13 #1334

Closed Zilong-Li closed 1 month ago

Zilong-Li commented 1 month ago

Problem: update matrix passed by reference from R

Code:

 Rcpp::cppFunction("
 void test_arma(arma::mat& A) {
 A.fill(1);
 return;
 }", depends = "RcppArmadillo")

 Rcpp::cppFunction("
 void test_eigen(Eigen::Map<Eigen::MatrixXd> A) {
 A.fill(1);
 return;
 }", depends = "RcppEigen")

 Rcpp::cppFunction("
 void test_rcpp(Rcpp::NumericMatrix A) {
 A.fill(1);
 return;
 }")

 f <- function(M = 1e6, N = 1e4, init = "array", test = "arma") {
   if (init == "array") {
     A <- array(0, c(M, N))
   } else {
     A <- matrix(0, nrow = M, ncol = N)
   }
   label <- paste0("start, ", init, ", ", test)
   before <- paste0("A[1, 1] = ", A[1, 1], ", A[M, N] = ", A[M, N])
   x <- A[M, N]
   if (test == "arma") {
     test_arma(A)
   } else if (test == "rcpp") {
     test_rcpp(A)
   } else if (test == "eigen") {
     test_eigen(A)
   }
   after <- paste0("A[1, 1] = ", A[1, 1], ", A[M, N] = ", A[M, N])
   y <- A[M, N]
   worked <- c("worked", "not worked")[1 + as.integer(x == y)]
   message(paste0(label, " before ", before, " after ", after, " ", worked))
 }

 for (init in c("array", "matrix")) {
   for (test in c("arma", "rcpp", "eigen")) {
     system.time(f(M = 1e6, N = 1e4, init = init, test = test))
     gc()
   }
 }

Results and Sessioninfo:

image

Reproducible conda env:

mamba create -n r44 r-base=4.4.0 r-rcpp=1.0.13 r-rcpparmadillo r-rcppeigen

Btw, with Rcpp=1.0.12, the above code reported:

start, array, arma before A[1, 1] = 0, A[M, N] = 0 after A[1, 1] = 1, A[M, N] = 0 not worked
start, array, rcpp before A[1, 1] = 0, A[M, N] = 0 after A[1, 1] = 1, A[M, N] = 1 worked
start, array, eigen before A[1, 1] = 0, A[M, N] = 0 after A[1, 1] = 1, A[M, N] = 1 worked
start, matrix, arma before A[1, 1] = 0, A[M, N] = 0 after A[1, 1] = 1, A[M, N] = 0 not worked
start, matrix, rcpp before A[1, 1] = 0, A[M, N] = 0 after A[1, 1] = 1, A[M, N] = 1 worked
start, matrix, eigen before A[1, 1] = 0, A[M, N] = 0 after A[1, 1] = 1, A[M, N] = 1 worked
> sessionInfo()
R version 4.4.0 (2024-04-24)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Copenhagen
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_4.4.0           RcppEigen_0.3.4.0.0      tools_4.4.0              RcppArmadillo_0.12.8.4.0 Rcpp_1.0.12             
eddelbuettel commented 1 month ago

Thanks for taking the time to write up a report. However, this has been clearly documented for many years so no there is no expectation to change it. I think we should just close this.

PS One of the places where it is discussed is Question 5.1 in the Rcpp FAQ.

Zilong-Li commented 1 month ago

All good. Just so people are aware.

eddelbuettel commented 1 month ago

The underlying SEXP is a pointer so this is hard to close. We did it, if memory serves, to some extent in RcppArmadillo where you can declare a signature arma_test(const arma:::mat& m) at which point the compiler stops you. But it's a lot of work to carry it across the other two packages, plus the interface is 'out there' as it is so ... status quo remains.

Enchufa2 commented 1 month ago

Mmmh, my understanding is that @Zilong-Li does want the side effect to happen. The issue is that not all the elements in the 1e6 x 1e4 input matrix are being filled with 1's. More specifically, only 1410065408 elements are filled with 1's, i.e. a 14%, whereas the previous version filled 100%. Maybe it has something to do with the size_t change we did?

Enchufa2 commented 1 month ago

Here's a reprex that doesn't involve any side effect. I get:

Rcpp::cppFunction("
Rcpp::NumericMatrix test_rcpp(double M, double N) {
  Rcpp::NumericMatrix A(M, N);
  A.fill(1);
  return A;
}")

sum(test_rcpp(1e5, 1e4))
#> [1] 1e+09
sum(test_rcpp(1e6, 1e4)) # requires 80 GB of memory
#> [1] 1410065408
Enchufa2 commented 1 month ago

And removing this static_cast solves the issue here.

@eddelbuettel Do you remember why it was needed?

eddelbuettel commented 1 month ago

My apologies. I completely misread the bug report about being on the old, known side effect when it is not.

And I verified that this passes on 1.0.12, and fails on 1.0.13. The RcppArmadillo issue is seperate (but to be looked at too) and RcppEigen seems to work.

eddelbuettel commented 1 month ago

I think we had compiler warnings there. Which no longer appear but now we seem to be smaller int size only. From glancing at it it seems to be #1307.

eddelbuettel commented 1 month ago

So the better fix, it seems, would have been

modified   inst/include/Rcpp/vector/Vector.h
@@ -331,7 +331,7 @@ public:
     }

     inline iterator begin() { return cache.get() ; }
-    inline iterator end() { return cache.get() + static_cast<int>(size()) ; }
+    inline iterator end() { return cache.get() + static_cast<R_xlen_t>(size()) ; }
     inline const_iterator begin() const{ return cache.get_const() ; }
     inline const_iterator end() const{ return cache.get_const() + size() ; }
     inline const_iterator cbegin() const{ return cache.get_const() ; }

It passes with it, and I am not getting compilation warnings. I still have the same stanza in ~/.R/Makevars (comment-out, now re-activated) (and also re-using CCACHE etc defined above)

CLANGVER=-17
# #CLANGLIB=-stdlib=libc++
CXX=$(CCACHE) clang++$(CLANGVER) $(CLANGLIB)
CXX11=$(CCACHE) clang++$(CLANGVER) $(CLANGLIB)
CXX14=$(CCACHE) clang++$(CLANGVER) $(CLANGLIB)
CXX17=$(CCACHE) clang++$(CLANGVER) $(CLANGLIB)
CXX20=$(CCACHE) clang++$(CLANGVER) $(CLANGLIB)
CC=$(CCACHE) clang$(CLANGVER)
SHLIB_CXXLD=clang++$(CLANGVER) $(CLANGLIB)
CLANG_CXX_FLAGS=-Wconversion -Wno-sign-conversion #-Wno-unused-but-set-variable -Wno-delee-non-abstract-non-virtual-dtor
#CLANG_FLAGS=-Wconversion -Wno-sign-conversion -Wno-float-conversion -Wno-implicit-int-conversion -Wno-implicit-int-float-conversion -Wno-implicit-float-conversion #-Wno-unused-but-set-variable -Wno-delee-non-abstract-non-virtual-dtor
CXXFLAGS=-Wall -O3 -pedantic $(CLANG_CXX_FLAGS)
CXX11FLAGS=-Wall -O3 -pedantic $(CLANG_CXX_FLAGS)
CXX14FLAGS=-Wall -O3 -pedantic $(CLANG_CXX_FLAGS)
CXX17FLAGS=-Wall -O3 -pedantic $(CLANG_CXX_FLAGS)
CXX20FLAGS=-Wall -O3 -pedantic $(CLANG_CXX_FLAGS)
Zilong-Li commented 1 month ago

Thanks for looking into it!

Few R projects I contributed to failed to pass CI when using Rcpp==1.0.13. I was too lazy to look into them, and simply disabled Rcpp==1.0.13 in the conda envrionment. Once you fixed it, I'll re-try Rcpp=1.0.13.

eddelbuettel commented 1 month ago

@Zilong-Li It's a really helpful bug report, but I got distracted by some aspects. Maybe next time stress the preferred aspect of a minimally complete reproducible example: returning the large matrix and showing sum() not being equal to the number of elements would have been more concise. Which helps readers like myself with attention deficit disorder :wink:

eddelbuettel commented 1 month ago

R package development happens mostly outside of Conda, and you will be able to get a new minor release from the Rcpp drat and/or the r-universe builds soon. Because Rcpp has such a tail of dependencies we may wait until the scheduled 1.0.14 release in January to get this to CRAN.

eddelbuettel commented 1 month ago

I also double-checked RcppArmadillo. We're good as long as ARMA_64BIT_WORD is defined (which we may want to consider as a default now):

Code


#define ARMA_64BIT_WORD 1
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
arma::mat test_arma(int n, int k) { 
    arma::mat m(n,k); 
    m.fill(1); 
    return m; 
}

/*** R
sum(test_arma(1e6,1e4))
*/

Demo

$ Rscript -e 'Rcpp::sourceCpp("armacheck.cpp")'

> sum(test_arma(1e+06, 10000))
[1] 1e+10
$