lda-hp: memory issue in storing and passing \theta, \beta, and \z

There were a couple of R system crashes when I ran the LDA Gibbs sampling algorithm on a large set of documents (test_lda_c.R). They happened at the function call that handles the C++ to R transfer of objects such as the betas (K x V x G array), the thetas (K x D x G array), and Z (N x G matrix), where K is the number of topics, V is the vocabulary size, D is the number of documents in the corpus, G is the number of saved MCMC iterations, and N is the number of word instances in the corpus.
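To get a feel for the sizes involved, here is a minimal sketch that estimates the raw storage for the three saved objects. It assumes betas and thetas are stored as 8-byte doubles and Z as 4-byte integers; the struct and function names are hypothetical and not part of the package.

```cpp
#include <cstdint>

// Raw storage (in bytes) for the saved samples.
// Assumption: beta and theta entries are doubles, Z entries are 32-bit integers.
struct SavedSampleBytes {
    std::uint64_t beta;   // K x V x G
    std::uint64_t theta;  // K x D x G
    std::uint64_t z;      // N x G
};

SavedSampleBytes saved_sample_bytes(std::uint64_t K, std::uint64_t V,
                                    std::uint64_t D, std::uint64_t N,
                                    std::uint64_t G) {
    return {K * V * G * sizeof(double),
            K * D * G * sizeof(double),
            N * G * sizeof(std::int32_t)};
}
```

With purely illustrative values K = 50, V = 20000, D = 10000, N = 1000000, and G = 1000, this gives about 8 GB for the betas, 4 GB for the thetas, and 4 GB for Z, so the betas alone can dominate the transfer.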

One solution would be to keep only the z values (the Z matrix) and compute the betas and thetas on demand, i.e., when we compute the likelihood ratios. For example, the function

compute_thetas <- function(did, Z, K, D, base.alpha.v)

in utils.R computes thetas from the stored Z matrix.
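As a rough illustration of the on-demand idea, the sketch below recomputes the thetas from a single saved column of Z. It assumes the usual estimate theta[d][k] = (n_dk + alpha[k]) / (n_d + sum(alpha)), where n_dk counts the words in document d assigned to topic k; whether compute_thetas in utils.R uses exactly this formula is an assumption. The function name thetas_from_z is hypothetical, and indices are 0-based here, unlike in R.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Recompute the D x K theta matrix from one saved Gibbs sample.
//   did[i] : document index (0-based) of word instance i
//   z[i]   : topic index (0-based) assigned to word instance i
//   alpha  : Dirichlet hyperparameter vector of length K
std::vector<std::vector<double>> thetas_from_z(
    const std::vector<int>& did, const std::vector<int>& z,
    std::size_t K, std::size_t D, const std::vector<double>& alpha) {
    if (alpha.size() != K || did.size() != z.size()) {
        throw std::invalid_argument("inconsistent input sizes");
    }

    double alpha_sum = 0.0;
    for (double a : alpha) alpha_sum += a;

    // Start every row at alpha, then add the topic counts n_dk.
    std::vector<std::vector<double>> theta(D, alpha);
    std::vector<double> doc_len(D, 0.0);
    for (std::size_t i = 0; i < z.size(); ++i) {
        theta[did[i]][z[i]] += 1.0;
        doc_len[did[i]] += 1.0;
    }

    // Normalize each row so that it sums to one.
    for (std::size_t d = 0; d < D; ++d) {
        for (std::size_t k = 0; k < K; ++k) {
            theta[d][k] /= (doc_len[d] + alpha_sum);
        }
    }
    return theta;
}
```

Only one D x K matrix is alive at a time with this approach, instead of G of them.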

Note: This could be an issue with the way RcppArmadillo handles the _cube_ data structure and the way Rcpp transfers it to the R environment as an _array_ data structure.

Frequency: rare
