TommyJones / tmsamples

R package for simulating corpora from topic model parameters.

Multithreaded sampler safety #2

Open TommyJones opened 4 years ago

TommyJones commented 4 years ago
  1. Is the sampler thread safe?
  2. Does it respect R's random seed?
TommyJones commented 4 years ago

Confirmed that this does not respect R's random seed.

Running the code below several times gives different results, even with the same seed.

library(tmsamples)

set.seed(1234)

Nk <- 4
Nd <- 50
Nv <- 1000
alpha <- rgamma(Nk, 0.5)

beta <- generate_zipf(vocab_size = Nv, magnitude = 500, zipf_par = 1.1)

pars <- sample_parameters(alpha, beta, Nd)

doc_lengths <- rpois(Nd, 50)

dtm <- sample_documents(
  theta = pars$theta,
  phi = pars$phi,
  doc_lengths = doc_lengths,
  threads = 7 ## threads controls parallel computation
)

colnames(dtm) <- colnames(pars$phi)

head(sort(colSums(dtm), decreasing = TRUE))
TommyJones commented 4 years ago

  1. Rcpp guide to seed setting
  2. Set the random seed using Rcpp Armadillo
  3. Example of parallel seed setting
  4. Different seeding algorithms
  5. R package dqrng example
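
One possible direction, sketched in plain C++ rather than against the package's actual sampler: give every document its own engine, seeded from a single base seed plus the document index. The output then depends only on the base seed, not on the number of threads or on scheduling. Everything here is an assumption for illustration: `sample_parallel`, `base_seed`, and the one-draw-per-document "sampler" are hypothetical and not part of tmsamples.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <thread>
#include <vector>

// Hypothetical helper, not part of tmsamples: draw one token index per
// document so that the result depends only on base_seed, never on the
// number of threads or the order in which they run.
std::vector<int> sample_parallel(std::uint32_t base_seed,
                                 std::size_t n_docs,
                                 std::size_t n_threads,
                                 int vocab_size = 1000) {
  std::vector<int> out(n_docs);
  if (n_threads == 0) n_threads = 1;

  auto worker = [&](std::size_t tid) {
    // Thread tid handles documents tid, tid + n_threads, tid + 2 * n_threads, ...
    for (std::size_t d = tid; d < n_docs; d += n_threads) {
      // Private engine seeded from (base_seed, document index): no RNG
      // state is shared between threads, so there is nothing to race on.
      std::seed_seq seq{base_seed, static_cast<std::uint32_t>(d)};
      std::mt19937 engine(seq);
      // Plain modulo keeps the sketch short; it has a slight bias that a
      // real sampler would avoid.
      out[d] = static_cast<int>(engine() % static_cast<std::uint32_t>(vocab_size));
    }
  };

  std::vector<std::thread> pool;
  pool.reserve(n_threads);
  for (std::size_t t = 0; t < n_threads; ++t) pool.emplace_back(worker, t);
  for (auto& th : pool) th.join();  // each thread wrote distinct elements of out
  return out;
}
```

With this layout, `sample_parallel(1234, 50, 1)` and `sample_parallel(1234, 50, 7)` should return the same vector. To respect `set.seed()`, the base seed would presumably be drawn from R's RNG on the main thread before any worker starts; that step is assumed here and not shown.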