daqana / dqrng

Fast Pseudo Random Number Generators for R
https://daqana.github.io/dqrng/
Other
42 stars 8 forks source link

How to set random seed using RcppParallel or OpenMP to be repeatable #83

Closed xiangpin closed 6 months ago

xiangpin commented 6 months ago

Good Day

Thanks for developing the package, I using shuffle of Armadillo with RcppParallel or OpenMP , but the result is not repeatable. I've been searching on Google for some time yet haven't come across a solution. someone suggest this package, but I can't find the solution. Could you help me? thank you.

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
#include <omp.h>

using namespace arma;

arma::rowvec calkld(arma::rowvec z, arma::rowvec bg){
    arma::rowvec x = z % log(z / bg);
    return x;

}

arma::rowvec calmeankld(arma::rowvec x, arma::rowvec bg, int permutation = 100){

    arma::vec bootkld(permutation);

    for (int j = 0; j < permutation; j++){
        arma::rowvec z = arma::shuffle(x);
        arma::rowvec k = calkld(z, bg);
        bootkld[j] = log(sum(k));
    }

    double bmean = arma::mean(bootkld);
    double bsd = arma::stddev(bootkld);
    arma::rowvec res = {bmean, bsd};

    return res;
}

//[[Rcpp::export]]
arma::mat testparallel(arma::mat m, int permutation = 100){
  m = m + 1e-300;
  int n = m.n_rows;
  arma::mat out(m.n_rows, 2);

  arma::rowvec bg = mean(m, 0);
  bg = bg + 1e-300;

  // Parallelize the Loop
  #pragma omp parallel for schedule(static)
  for (unsigned int i = 0; i < n; i++){
    out.row(i) = calmeankld(m.row(i), bg, permutation);
  }

  return out;
}
> da <- matrix(abs(rnorm(200)), ncol=10)
> withr::with_seed(123, testparallel(da, permutation=100)) -> t1
> withr::with_seed(123, testparallel(da, permutation=100)) -> t2
> identical(t1, t2)
[1] FALSE
> RhpcBLASctl::omp_set_num_threads(1)
> withr::with_seed(123, testparallel(da, permutation=100)) -> t3
> withr::with_seed(123, testparallel(da, permutation=100)) -> t4
> identical(t3, t4)
[1] TRUE
rstub commented 6 months ago

The fundamental problem is that arma::shuffle makes use of R's RNG when used via RcppArmadillo, which must not be used from multi-threaded code. Here, dqrng can indeed help. You can make use of the second example on https://cran.r-project.org/web/packages/dqrng/vignettes/parallel.html. The idea is the following:

// [[Rcpp::depends(RcppArmadillo,dqrng)]]
#include <algorithm>
#include <RcppArmadillo.h>
#include <xoshiro.h>
#include <convert_seed.h>
#include <R_randgen.h>
#include <omp.h>

using namespace arma;

arma::rowvec calkld(arma::rowvec z, arma::rowvec bg){
  arma::rowvec x = z % log(z / bg);
  return x;

}

arma::rowvec calmeankld(arma::rowvec x, arma::rowvec bg, dqrng::xoshiro256plus rng, int permutation = 100){

  arma::vec bootkld(permutation);

  for (int j = 0; j < permutation; j++){
    arma::rowvec z = x;
    std::shuffle(std::begin(z), std::end(z), rng);
    arma::rowvec k = calkld(z, bg);
    bootkld[j] = log(sum(k));
  }

  double bmean = arma::mean(bootkld);
  double bsd = arma::stddev(bootkld);
  arma::rowvec res = {bmean, bsd};

  return res;
}

//[[Rcpp::export]]
arma::mat testparallel(arma::mat m, int permutation = 100){
  m = m + 1e-300;
  int n = m.n_rows;
  arma::mat out(m.n_rows, 2);

  arma::rowvec bg = mean(m, 0);
  bg = bg + 1e-300;

  // seed RNG from R's RNG state
  Rcpp::IntegerVector seed(2, dqrng::R_random_int);
  dqrng::xoshiro256plus rng(dqrng::convert_seed<uint64_t>(seed));

  // Parallelize the Loop
#pragma omp parallel for schedule(static)
  for (unsigned int i = 0; i < n; i++){
    dqrng::xoshiro256plus lrng(rng);      // make thread local copy of rng
    lrng.long_jump(omp_get_thread_num() + 1);  // advance rng by 1 ... jumps
    out.row(i) = calmeankld(m.row(i), bg, lrng, permutation);
  }

  return out;
}

/*** R
da <- matrix(abs(rnorm(200)), ncol=10)
withr::with_seed(123, testparallel(da, permutation=100)) -> t1
withr::with_seed(123, testparallel(da, permutation=100)) -> t2
identical(t1, t2)
*/

Note: I am restricting myself to functionality that is available in the current CRAN release of dqrng.

xiangpin commented 6 months ago

Thank you very much, I believe I have a solution to this problem.

rstub commented 6 months ago

Great! Can you tell me what kid of documentation would have helped you finding this?

xiangpin commented 6 months ago

The codes which you modified and the idea greatly assisted me. But I want to using dqrng in an R package, how should I using it. I have added it to Linking to in DESCRIPTION, and added #<dqrng.h> in the cpp file, but it does not work.

error: ‘dqrng’ has not been declared
rstub commented 6 months ago

The code I posted should work unchanged in a package with LinkingTo: RcppArmadillo, dqrng. With the current CRAN of dqrng version, there is no point to include dqrng.h for parallel work, but if you still want to do it, you have to use #include <dqrng.h>.

xiangpin commented 6 months ago

I found the error might be introduced by exporting the function containing dqrng::xoshiro256plus rng argument.

for the example code

// [[Rcpp::export]]
arma::rowvec calmeankld(arma::rowvec x, arma::rowvec bg, dqrng::xoshiro256plus rng, int permutation = 100)

I removed the // [[Rcpp::export]], then it works

rstub commented 6 months ago

That makes sense. In your original code calmeankld() is meant to be called within the parallel code. It does not make sense to make that callable from R via // [[Rcpp::export]].

I am closing this issue now since questions seem to be answered.