Open · clintpgeorge opened this issue 11 years ago
I tried to avoid passing the cube data structures. However, we need the theta and beta cubes for the likelihood-ratio computation, so I added the following functions to utils.R to compute them from _Z_:
```r
compute_thetas_betas <- function(did, wid, Z, K, D, V, base.alpha.v, base.eta)
compute_thetas <- function(did, Z, K, D, base.alpha.v)
```
It seems like these functions are not computationally efficient.
[TODO]: Need to find a better solution.
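Since only the signatures are given above, here is a minimal sketch of what such an on-demand recomputation of the thetas could look like, assuming `did` holds the document index (1..D) of each word instance and each column of `Z` holds the topic assignments (1..K) for one saved draw. This is an illustration, not the actual utils.R implementation.

```r
compute_thetas_sketch <- function(did, Z, K, D, base.alpha.v) {
  G <- ncol(Z)
  thetas <- array(0, dim = c(K, D, G))
  for (g in seq_len(G)) {
    # K x D table of per-document topic counts for saved draw g
    counts <- table(factor(Z[, g], levels = 1:K), factor(did, levels = 1:D))
    num <- counts + base.alpha.v                       # adds alpha_k to row k (column-wise recycling)
    thetas[, , g] <- sweep(num, 2, colSums(num), "/")  # normalize each document column
  }
  thetas
}
```

The per-draw tabulation is probably where the inefficiency noted above comes from; vectorizing the counting across draws, or doing it on the C++ side, would be the obvious next step.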
There were a couple of R system crashes when I ran the LDA Gibbs sampling algorithm on a large set of documents (test_lda_c.R). The crashes happened at the function call that handles the C++-to-R transfer of objects such as the betas (a K x V x G array), the thetas (a K x D x G array), and Z (an N x G matrix), where K is the number of topics, V is the vocabulary size, D is the number of documents in the corpus, G is the number of saved MCMC iterations, and N is the number of word instances in the corpus.
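For a sense of why this transfer can exhaust memory, a quick back-of-the-envelope check (the corpus sizes below are made up for illustration, not taken from this issue):

```r
## Each stored double takes 8 bytes, so the saved cubes grow linearly in G.
## K, V, D, and G below are purely illustrative values.
K <- 100; V <- 50000; D <- 5000; G <- 1000
c(beta_GB  = K * V * G * 8 / 2^30,   # ~37 GB for the K x V x G beta cube
  theta_GB = K * D * G * 8 / 2^30)   # ~3.7 GB for the K x D x G theta cube
```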
One solution would be to keep only the z values (the Z matrix) and compute the betas and thetas on demand, i.e., when we compute the likelihood ratios. For example, the compute_thetas function shown above computes the thetas from the stored Z matrix.
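As a hedged illustration of the on-demand idea, a likelihood-ratio loop could pull one K x D slice at a time instead of materializing the full cube (using the `compute_thetas_sketch` helper above; the loop body is a placeholder, not the actual likelihood-ratio code):

```r
for (g in seq_len(ncol(Z))) {
  # Only one K x D theta slice is held in memory at any moment.
  theta_g <- compute_thetas_sketch(did, Z[, g, drop = FALSE], K, D, base.alpha.v)[, , 1]
  # ... use theta_g in the likelihood-ratio computation for saved draw g ...
}
```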
Note: This could be an issue with the way RcppArmadillo handles the _cube_ data structure and Rcpp transfers it to the R environment as an _array_.
Frequency: rare