RcppCore / RcppArmadillo

Rcpp integration for the Armadillo templated linear algebra library
193 stars 56 forks source link

[8.100] Thread-safety of the read access to elements of sparse matrix #179

Closed dselivanov closed 7 years ago

dselivanov commented 7 years ago

I have this chunk of code where I read elements of arma::sp_mat sparse matrix from many threads. With Armadillo 7.* series it worked fine, with latest 8.100 it crashes with some weird traceback. Any thoughts?

longjmp causes uninitialized stack frame : /usr/lib64/R/bin/exec/R terminated ======= Backtrace: ========= /lib64/libc.so.6(__fortify_fail+0x37)[0x7fb6a6844d87] /lib64/libc.so.6(+0x10fcad)[0x7fb6a6844cad] /lib64/libc.so.6(longjmp_chk+0x29)[0x7fb6a6844c09] /usr/lib64/R/lib/libR.so(+0x2111c6)[0x7fb6a961b1c6] /lib64/libc.so.6(+0x35270)[0x7fb6a676a270] /lib64/libpthread.so.0(+0xe42b)[0x7fb6a6b0642b] /lib64/libpthread.so.0(+0x9dcb)[0x7fb6a6b01dcb] /lib64/libpthread.so.0(pthread_mutex_lock+0x68)[0x7fb6a6b01c98] /usr/lib64/libjemalloc.so.1(+0x267c2)[0x7fb6a9a587c2] /usr/lib64/libjemalloc.so.1(free+0x350)[0x7fb6a9a36640] /home/dselivanov/R/x86_64-redhat-linux-gnu-library/3.4/reco/libs/reco.so(_ZNSt8_Rb_treeIySt4pairIKydESt10_Select1stIS2_ESt4lessIyESaIS2_EE8_M_eraseEPSt13_Rb_tree_nodeIS2_E+0x13a)[0x7fb68b7242aa] /home/dselivanov/R/x86_64-redhat-linux-gnu-library/3.4/reco/libs/reco.so(_ZN4arma6MapMatIdEaSERKNS_5SpMatIdEE+0x10f)[0x7fb68b73afdf] /home/dselivanov/R/x86_64-redhat-linux-gnu-library/3.4/reco/libs/reco.so(+0x306fb)[0x7fb68b7396fb] /home/dselivanov/R/x86_64-redhat-linux-gnu-library/3.4/reco/libs/reco.so(_Z13dotprod_top_kRKN4arma3MatIdEES3_jjRN4Rcpp8NullableIKNS_5SpMatIdEEEE+0x4f3)[0x7fb68b73a5d3] /home/dselivanov/R/x86_64-redhat-linux-gnu-library/3.4/reco/libs/reco.so(_reco_dotprod_top_k+0x2b5)[0x7fb68b71e095] /usr/lib64/R/lib/libR.so(+0xdcedd)[0x7fb6a94e6edd] /usr/lib64/R/lib/libR.so(+0x1106bd)[0x7fb6a951a6bd] /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fb6a952a218] /usr/lib64/R/lib/libR.so(+0x12228f)[0x7fb6a952c28f] /usr/lib64/R/lib/libR.so(Rf_eval+0x354)[0x7fb6a952a3d4] /usr/lib64/R/lib/libR.so(+0x12409e)[0x7fb6a952e09e] /usr/lib64/R/lib/libR.so(Rf_eval+0x585)[0x7fb6a952a605] /usr/lib64/R/lib/libR.so(Rf_ReplIteration+0x232)[0x7fb6a9553e42] /usr/lib64/R/lib/libR.so(+0x14a221)[0x7fb6a9554221] /usr/lib64/R/lib/libR.so(run_Rmainloop+0x4f)[0x7fb6a95542df] /usr/lib64/R/bin/exec/R(main+0x1b)[0x40080b] /lib64/libc.so.6(libc_start_main+0xf5)[0x7fb6a6756c05]

dselivanov commented 7 years ago

Thanks, tested, problem gone.

eddelbuettel commented 7 years ago

Good report, and helpful and very timely answer.

Any idea how we could document the need for const better on our side? Time to start a FAQ vignette for RcppArmadillo just as we do for Rcpp?

coatless commented 7 years ago

'Tis always a good time to start an FAQ ;-)

eddelbuettel commented 7 years ago

Point 1: Don't buy a mac.

Point 2: Seriously, just don't.

Point 3: Reference to

dselivanov commented 7 years ago

Unfortunately I'm going to reopen this ticket because of 2 problems for random access to sparse matrix elements (not sure what happened since last time, but I remember everything worked fine!):

  1. Significant performance degradation for single thread (0.7.960 -> 0.8.100) - 10x and more
  2. Crash of R session when multiple threads try to have read access (0.8.100)

Minimal reproducible example on gist and here:

#include <RcppArmadillo.h>
#include <queue>
#include <iostream>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

#define GRAIN_SIZE 10

using namespace Rcpp;
using namespace RcppArmadillo;
using namespace arma;

// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
double test_spmat(const arma::sp_mat &x, IntegerVector I, IntegerVector J, int n_threads) {
  int *i_ptr = I.begin();
  int *j_ptr = J.begin();

  double sum = 0;
  #ifdef _OPENMP
  #pragma omp parallel for num_threads(n_threads) schedule(dynamic, GRAIN_SIZE) reduction(+:sum)
  #endif
  for(int k = 0; k < J.size(); k++) {
    //adjust to 0-based indexes
    int i = i_ptr[k] - 1;
    int j = j_ptr[k] - 1;
    sum += x(i, j);
  }
  return(sum);
}
library(Rcpp)
library(Matrix)

n = 100000
m = 10000
nnz = 0.001 * n * m
set.seed(1)

x = sparseMatrix(i = sample(n, nnz, T), j = sample(m, nnz, T), x = 1, dims = c(n, m))
i = sample(n, nnz * 10, T)
j = sample(m, nnz * 10, T) 

install.packages("~/Downloads/RcppArmadillo_0.7.960.1.2.tar.gz", repos = NULL, type = "source")
sourceCpp("~/Downloads/tst-arma.cpp", rebuild = T)

system.time(temp <- test_spmat(x, i, j, 1))
# user  system elapsed 
# 0.568   0.003   0.572
temp
# 9830
system.time(temp <- test_spmat(x, i, j, 4))
# user  system elapsed 
# 0.636   0.004   0.164 
temp
# 9830

install.packages("~/Downloads/RcppArmadillo_0.8.100.1.0.tar.gz", repos = NULL, type = "source")
sourceCpp("~/Downloads/tst-arma.cpp", rebuild = T)

system.time(temp <- test_spmat(x, i, j, 1))
# user  system elapsed 
# 6.199   0.037   6.253 
temp
# 9830

# this one crash R session
system.time(temp <- test_spmat(x, i, j, 4))
eddelbuettel commented 7 years ago

Can you force a deep copy via clone() or an alike into a new non-R allocated variable, and then try again?

That would essentially be the lesson from RcppParallel -- any R-owned memory object may get gc()-ed.

dselivanov commented 7 years ago

Same crash with const arma::sp_mat x2 = arma::sp_mat(x). And this doesn't explain slow-down... so my guess is that this is not related to R's gc(). I have feeling that something goes wrong in Armadillo/RcppArmadillo.

PS I realize that in general element-by-element access to sparse matrix should be avoided, but in my case according to benchmark it wasn't bottleneck (initially I've panned to convert in to hash map of triplets).

eddelbuettel commented 7 years ago

I can't help you here. There is nothing as far as I can see that the package does to get in the way.

"If it doesn't work, it doesn't work." Use an older (Rcpp)Armadillo or do something. Multithreading and R require a lot of care.

I suggest we close this, and I would propose you work out if a plain C++ example (no R) also crashes. In which case you need to talk Conrad.

dselivanov commented 7 years ago

I will try to narrow down the problem and create pure c++ example.

8 нояб. 2017 г. 17:23 пользователь "Dirk Eddelbuettel" < notifications@github.com> написал:

I can't help you here. There is nothing as far as I can see that the package does to get in the way.

"If it doesn't work, it doesn't work." Use an older (Rcpp)Armadillo or do something. Multithreading and R require a lot of care.

I suggest we close this, and I would propose you work out if a plain C++ example (no R) also crashes. In which case you need to talk Conrad.

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/RcppCore/RcppArmadillo/issues/179#issuecomment-342815869, or mute the thread https://github.com/notifications/unsubscribe-auth/AE4u3SiTmiW4HrCOwBbM8MLG8xYO5FgRks5s0as4gaJpZM4P2lQ6 .

eddelbuettel commented 7 years ago

I was also thinking that ... maybe the fact that you use threading, and that Conrad switched to more OpenMP use can get into each others way?

dselivanov commented 7 years ago

In a package I have PKG_CXXFLAGS = -DARMA_DONT_USE_OPENMP

8 нояб. 2017 г. 19:54 пользователь "Dirk Eddelbuettel" < notifications@github.com> написал:

I was also thinking that ... maybe the fact that you use threading, and that Conrad switched to more OpenMP use can get into each others way?

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/RcppCore/RcppArmadillo/issues/179#issuecomment-342860809, or mute the thread https://github.com/notifications/unsubscribe-auth/AE4u3R79YwyAr-zajk5dGJFNfhg7smmTks5s0c6ggaJpZM4P2lQ6 .

kevinushey commented 7 years ago

FWIW it all runs fine on my macOS machine, with clang-5.0 + sanitizers. I do have OpenMP disabled though, and I'm not using or linking to the R-LLVM toolchain.

> system.time(temp <- test_spmat(x, i, j, 4))
   user  system elapsed 
  2.043   0.017   2.072 
> sessionInfo() R Under development (unstable) (2017-10-12 r73548) Platform: x86_64-apple-darwin17.0.0 (64-bit) Running under: macOS High Sierra 10.13.1 Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /usr/local/Cellar/lapack/3.7.1/lib/liblapack.3.7.1.dylib locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] Matrix_1.2-11 Rcpp_0.12.13.2 testthat_1.0.2.9000 rmarkdown_1.6 knitr_1.17 roxygen2_6.0.1 devtools_1.13.3 loaded via a namespace (and not attached): [1] xml2_1.1.1 magrittr_1.5 lattice_0.20-35 R6_2.2.2 RcppArmadillo_0.8.200.1.1 [6] rlang_0.1.2 stringr_1.2.0 tools_3.5.0 grid_3.5.0 withr_2.0.0 [11] htmltools_0.3.6 commonmark_1.4 yaml_2.1.14 rprojroot_1.2 digest_0.6.12 [16] crayon_1.3.4 memoise_1.1.0 evaluate_0.10.1 stringi_1.1.5 compiler_3.5.0 [21] backports_1.1.1
dselivanov commented 7 years ago

I think that problem is in Armadillo 0.8.100 because code works fine with latest 0.8.200. Here is pure c++ code which works fine with arma 8.200 and produce segmentation fault with 0.8.100

#include "include/armadillo"
#include <stdio.h>
#define GRAIN_SIZE 10

int main() {
  int n_threads = 4;
  int n = 1000000;
  int m = 10000;
  double nnz_prop = 0.001;
  int nnz = n * m * nnz_prop;
  const arma::sp_mat x = arma::sprandu(n, m, nnz_prop);
  arma::ivec i = arma::randi<arma::ivec>(nnz, arma::distr_param(0, n - 1));
  arma::ivec j = arma::randi<arma::ivec>(nnz, arma::distr_param(0, m - 1));

  double sum = 0;
  #pragma omp parallel for num_threads(n_threads) schedule(dynamic, GRAIN_SIZE) reduction(+:sum)
  for(int k = 0; k < nnz; k++) {
    sum += x.at(i[k], j[k]);
  }
  printf("%f\n", sum);
  return(0);
}

But it is still ~ 5-10x slower than 0.7.960

FYI @conradsnicta

eddelbuettel commented 7 years ago

That can happen. I'll get to 0.8.200.* when I have a moment.

eddelbuettel commented 7 years ago

@dselivanov I just pushed a new branch with 0.8.200.2.0 -- untested as of now -- but with small changes. I may have time to put it through the test harness tomorrow and then merge to master.

Feel free to experiment in the interim.

dselivanov commented 7 years ago

@eddelbuettel thank you, I can confirm that code works fine with 0.8.200.2.0 branch So I think issue related to RcppArmadillo can be closed.

@conradsnicta I checked on 2 different code chunks:

From last pure C++ example:

On my system (OS X) and single thread: g++-7:

# 8.200.2 (43c38c2e99952aaa99fba2c27fa0e06795de1f7c)
real    0m7.398s
user    0m7.276s
sys 0m0.114s

# 7.960 (057c1e84a64476f2cb9388733e0a4972c904614e)
real    0m7.206s
user    0m7.087s
sys 0m0.114s

clang 4:

# 8.200.2 (43c38c2e99952aaa99fba2c27fa0e06795de1f7c)
real    0m9.685s
user    0m9.383s
sys 0m0.295s

# 7.960 (057c1e84a64476f2cb9388733e0a4972c904614e)
real    0m5.773s
user    0m5.647s
sys 0m0.118s

From my initial example

clang 4:

# 8.200.2 (43c38c2e99952aaa99fba2c27fa0e06795de1f7c)
1 CORE
user  system elapsed 
5.838   0.050   5.892 

4 CORE
user  system elapsed 
6.223   0.052   5.573

# 7.960 (057c1e84a64476f2cb9388733e0a4972c904614e)
1 CORE
user  system elapsed 
0.677   0.004   0.682 

4 CORE
user  system elapsed 
1.842   0.006   0.466 

@conradsnicta I'm not sure why difference is so huge in second case. I can provide sparse matrix in market matrix triplet format and subsetting indices if it can help. How can I help in investigation?

dselivanov commented 7 years ago

But for certain cases single thread performance also suffer. I will prepare small self contained example and test on my Ubuntu machine.

9 нояб. 2017 г. 18:33 пользователь "Conrad Sanderson" < notifications@github.com> написал:

@dselivanov https://github.com/dselivanov - I don't know what would be causing this. It works properly under gcc, so I suspect it's an issue with the openmp implementation in clang and/or macOS. Apple has been a bit iffy about providing openmp as part of the standard compiler on macOS, which would also suggest that either openmp in clang and/or its interaction with macOS is problematic.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/RcppCore/RcppArmadillo/issues/179#issuecomment-343172019, or mute the thread https://github.com/notifications/unsubscribe-auth/AE4u3cF6zmQ82ccIT-d8gAFbaMwU2yapks5s0w0igaJpZM4P2lQ6 .

eddelbuettel commented 7 years ago

The 0.8.200.2.0 release candidate looks good otherwise and I will merge that into master later, and probably prepare a drat release too.

eddelbuettel commented 7 years ago

FWIW the 0.8.200.2.0 tarball is now in the drat repo of the RcppCore organization so you can install it via your preferred R package tool just like prior pre-releases.