Closed: dselivanov closed this issue 7 years ago
Thanks, tested, problem gone.
Good report, and helpful and very timely answer.
Any idea how we could document the need for const better on our side? Time to start a FAQ vignette for RcppArmadillo, just as we do for Rcpp?
'Tis always a good time to start an FAQ ;-)
Point 1: Don't buy a mac.
Point 2: Seriously, just don't.
Point 3: Reference to
Unfortunately I'm going to reopen this ticket because of 2 problems with random access to sparse matrix elements (not sure what has changed since last time, but I remember everything worked fine!):
Minimal reproducible example on gist and here:
#include <RcppArmadillo.h>
#include <queue>
#include <iostream>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif
#define GRAIN_SIZE 10
using namespace Rcpp;
using namespace RcppArmadillo;
using namespace arma;
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
double test_spmat(const arma::sp_mat &x, IntegerVector I, IntegerVector J, int n_threads) {
  int *i_ptr = I.begin();
  int *j_ptr = J.begin();
  double sum = 0;
  #ifdef _OPENMP
  #pragma omp parallel for num_threads(n_threads) schedule(dynamic, GRAIN_SIZE) reduction(+:sum)
  #endif
  for(int k = 0; k < J.size(); k++) {
    // adjust to 0-based indexes
    int i = i_ptr[k] - 1;
    int j = j_ptr[k] - 1;
    sum += x(i, j);
  }
  return(sum);
}
library(Rcpp)
library(Matrix)
n = 100000
m = 10000
nnz = 0.001 * n * m
set.seed(1)
x = sparseMatrix(i = sample(n, nnz, T), j = sample(m, nnz, T), x = 1, dims = c(n, m))
i = sample(n, nnz * 10, T)
j = sample(m, nnz * 10, T)
install.packages("~/Downloads/RcppArmadillo_0.7.960.1.2.tar.gz", repos = NULL, type = "source")
sourceCpp("~/Downloads/tst-arma.cpp", rebuild = T)
system.time(temp <- test_spmat(x, i, j, 1))
# user system elapsed
# 0.568 0.003 0.572
temp
# 9830
system.time(temp <- test_spmat(x, i, j, 4))
# user system elapsed
# 0.636 0.004 0.164
temp
# 9830
install.packages("~/Downloads/RcppArmadillo_0.8.100.1.0.tar.gz", repos = NULL, type = "source")
sourceCpp("~/Downloads/tst-arma.cpp", rebuild = T)
system.time(temp <- test_spmat(x, i, j, 1))
# user system elapsed
# 6.199 0.037 6.253
temp
# 9830
# this one crashes the R session
system.time(temp <- test_spmat(x, i, j, 4))
Can you force a deep copy via clone() or the like into a new non-R allocated variable, and then try again? That would essentially be the lesson from RcppParallel -- any R-owned memory object may get gc()-ed.
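For readers following along, the deep-copy suggestion can be sketched in plain C++ without R or Armadillo. This is only an illustration under stated assumptions: `sum_selected` is a hypothetical helper, and the raw pointers stand in for buffers owned by another runtime such as R. The inputs are copied into `std::vector`s first, so the parallel loop only touches memory the function owns.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sums values[idx[k] - 1] over all k (1-based indices, as in R).
// `values` and `idx` stand in for buffers owned by another runtime (e.g. R).
double sum_selected(const double* values, std::size_t n_values,
                    const int* idx, std::size_t n_idx, int n_threads) {
    assert(n_threads > 0);
    // Deep copies: from here on the loop reads only memory owned by this function.
    const std::vector<double> vals(values, values + n_values);
    const std::vector<int> index(idx, idx + n_idx);

    double sum = 0.0;
    const long n = static_cast<long>(index.size());
#ifdef _OPENMP
#pragma omp parallel for num_threads(n_threads) reduction(+:sum)
#endif
    for (long k = 0; k < n; k++) {
        const int pos = index[static_cast<std::size_t>(k)] - 1;  // to 0-based
        sum += vals[static_cast<std::size_t>(pos)];
    }
    return sum;
}
```

Without OpenMP the pragma is skipped and the loop runs serially, so the result is the same either way.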
Same crash with const arma::sp_mat x2 = arma::sp_mat(x). And this doesn't explain the slow-down... so my guess is that this is not related to R's gc(). I have a feeling that something goes wrong in Armadillo/RcppArmadillo.
PS I realize that in general element-by-element access to a sparse matrix should be avoided, but in my case, according to benchmarks, it wasn't the bottleneck (initially I had planned to convert it into a hash map of triplets).
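The hash-map-of-triplets alternative mentioned here might look roughly like the sketch below. `TripletMap` and its packed 64-bit key are hypothetical names chosen for illustration, not anything from Armadillo or Rcpp. It assumes non-negative 0-based indices and sums duplicate (i, j) entries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical container: maps (row, col) to a stored value; absent pairs read as 0.
class TripletMap {
public:
    TripletMap(const std::vector<int>& rows, const std::vector<int>& cols,
               const std::vector<double>& vals) {
        assert(rows.size() == cols.size() && cols.size() == vals.size());
        map_.reserve(rows.size());
        for (std::size_t k = 0; k < rows.size(); k++) {
            map_[key(rows[k], cols[k])] += vals[k];  // duplicates are summed (assumption)
        }
    }

    // Read-only lookup; find() never inserts, unlike operator[].
    double at(int i, int j) const {
        const auto it = map_.find(key(i, j));
        return it == map_.end() ? 0.0 : it->second;
    }

private:
    // Pack two non-negative 32-bit indices into one 64-bit key.
    static std::uint64_t key(int i, int j) {
        return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(i)) << 32) |
               static_cast<std::uint64_t>(static_cast<std::uint32_t>(j));
    }

    std::unordered_map<std::uint64_t, double> map_;
};
```

Because `at()` is const and only calls `find()`, concurrent lookups from several threads do not modify the map, provided nothing writes to it at the same time.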
I can't help you here. There is nothing as far as I can see that the package does to get in the way.
"If it doesn't work, it doesn't work." Use an older (Rcpp)Armadillo or do something. Multithreading and R require a lot of care.
I suggest we close this, and I would propose you work out whether a plain C++ example (no R) also crashes. In which case you need to talk to Conrad.
I will try to narrow down the problem and create a pure C++ example.
I was also thinking that ... maybe the fact that you use threading, and that Conrad switched to more OpenMP use, means the two could get in each other's way?
In a package I have PKG_CXXFLAGS = -DARMA_DONT_USE_OPENMP
FWIW it all runs fine on my macOS machine, with clang-5.0 + sanitizers. I do have OpenMP disabled though, and I'm not using or linking to the R-LLVM toolchain.
> system.time(temp <- test_spmat(x, i, j, 4))
user system elapsed
2.043 0.017 2.072
I think the problem is in Armadillo 0.8.100, because the code works fine with the latest 0.8.200. Here is pure C++ code which works fine with arma 8.200 and produces a segmentation fault with 0.8.100:
#include "include/armadillo"
#include <stdio.h>
#define GRAIN_SIZE 10
int main() {
  int n_threads = 4;
  int n = 1000000;
  int m = 10000;
  double nnz_prop = 0.001;
  // compute in double: n * m (1e10) does not fit in a 32-bit int
  int nnz = static_cast<int>(static_cast<double>(n) * m * nnz_prop);
  const arma::sp_mat x = arma::sprandu(n, m, nnz_prop);
  arma::ivec i = arma::randi<arma::ivec>(nnz, arma::distr_param(0, n - 1));
  arma::ivec j = arma::randi<arma::ivec>(nnz, arma::distr_param(0, m - 1));
  double sum = 0;
  #pragma omp parallel for num_threads(n_threads) schedule(dynamic, GRAIN_SIZE) reduction(+:sum)
  for(int k = 0; k < nnz; k++) {
    // read-only random access from several threads
    sum += x.at(i[k], j[k]);
  }
  printf("%f\n", sum);
  return(0);
}
But it is still ~ 5-10x slower than 0.7.960
FYI @conradsnicta
That can happen. I'll get to 0.8.200.* when I have a moment.
@dselivanov I just pushed a new branch with 0.8.200.2.0 -- untested as of now -- but with small changes. I may have time to put it through the test harness tomorrow and then merge to master.
Feel free to experiment in the interim.
@eddelbuettel thank you, I can confirm that the code works fine with the 0.8.200.2.0 branch, so I think the issue as it relates to RcppArmadillo can be closed.
@conradsnicta I checked on 2 different code chunks:
On my system (OS X), single thread:
g++-7:
# 8.200.2 (43c38c2e99952aaa99fba2c27fa0e06795de1f7c)
real 0m7.398s
user 0m7.276s
sys 0m0.114s
# 7.960 (057c1e84a64476f2cb9388733e0a4972c904614e)
real 0m7.206s
user 0m7.087s
sys 0m0.114s
clang 4:
# 8.200.2 (43c38c2e99952aaa99fba2c27fa0e06795de1f7c)
real 0m9.685s
user 0m9.383s
sys 0m0.295s
# 7.960 (057c1e84a64476f2cb9388733e0a4972c904614e)
real 0m5.773s
user 0m5.647s
sys 0m0.118s
clang 4:
# 8.200.2 (43c38c2e99952aaa99fba2c27fa0e06795de1f7c)
1 CORE
user system elapsed
5.838 0.050 5.892
4 CORE
user system elapsed
6.223 0.052 5.573
# 7.960 (057c1e84a64476f2cb9388733e0a4972c904614e)
1 CORE
user system elapsed
0.677 0.004 0.682
4 CORE
user system elapsed
1.842 0.006 0.466
@conradsnicta I'm not sure why the difference is so huge in the second case. I can provide the sparse matrix in Matrix Market triplet format and the subsetting indices if that helps. How can I help with the investigation?
But in certain cases single-thread performance also suffers. I will prepare a small self-contained example and test it on my Ubuntu machine.
On 9 Nov 2017 at 18:33, "Conrad Sanderson" <notifications@github.com> wrote:
@dselivanov - I don't know what would be causing this. It works properly under gcc, so I suspect it's an issue with the OpenMP implementation in clang and/or macOS. Apple has been a bit iffy about providing OpenMP as part of the standard compiler on macOS, which would also suggest that either OpenMP in clang and/or its interaction with macOS is problematic.
The 0.8.200.2.0 release candidate looks good otherwise and I will merge that into master later, and probably prepare a drat release too.
FWIW the 0.8.200.2.0 tarball is now in the drat repo of the RcppCore organization, so you can install it via your preferred R package tool just like prior pre-releases.
I have this chunk of code where I read elements of an arma::sp_mat sparse matrix from many threads. With the Armadillo 7.* series it worked fine; with the latest 8.100 it crashes with some weird traceback. Any thoughts?
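To make the access pattern concrete without Armadillo, here is a minimal sketch of many threads reading one shared sparse matrix. `CscMatrix` and `sum_at` are hypothetical names, and this is not how arma::sp_mat is implemented internally. It assumes a compressed sparse column layout with row indices sorted within each column, and a lookup that never mutates the matrix.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal CSC (compressed sparse column) matrix; not Armadillo's sp_mat.
struct CscMatrix {
    std::vector<int> col_ptr;    // size n_cols + 1; column j spans [col_ptr[j], col_ptr[j + 1])
    std::vector<int> row_idx;    // row indices, sorted within each column
    std::vector<double> values;  // non-zero values, aligned with row_idx

    // Read-only lookup by binary search within column j; changes no state.
    double at(int i, int j) const {
        assert(j >= 0 && static_cast<std::size_t>(j) + 1 < col_ptr.size());
        const auto first = row_idx.begin() + col_ptr[static_cast<std::size_t>(j)];
        const auto last = row_idx.begin() + col_ptr[static_cast<std::size_t>(j) + 1];
        const auto it = std::lower_bound(first, last, i);
        if (it == last || *it != i) return 0.0;  // element not stored, so it is zero
        return values[static_cast<std::size_t>(it - row_idx.begin())];
    }
};

// Sums x(I[k], J[k]) over all k (0-based indices), optionally in parallel.
double sum_at(const CscMatrix& x, const std::vector<int>& I,
              const std::vector<int>& J, int n_threads) {
    assert(I.size() == J.size() && n_threads > 0);
    double sum = 0.0;
    const long n = static_cast<long>(I.size());
#ifdef _OPENMP
#pragma omp parallel for num_threads(n_threads) reduction(+:sum)
#endif
    for (long k = 0; k < n; k++) {
        const std::size_t kk = static_cast<std::size_t>(k);
        sum += x.at(I[kk], J[kk]);
    }
    return sum;
}
```

The point of the sketch is that concurrent reads are only safe when the lookup itself touches no shared mutable state; whether a given library's element accessor meets that condition depends on its implementation and version.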