Closed TommyJones closed 4 years ago
The problem seems to come from entries of phi_initial[k, v]
that are exactly zero. This happens when gtools::rdirichlet is called with a very sparse (small) Dirichlet parameter:
some of the draws underflow to zero.
I've reproduced this behavior with three different libraries.
set.seed(90210)
gt_dir <- gtools::rdirichlet(n = 1000, alpha = rep(0.01, 14843))
summary(rowSums(gt_dir == 0))
set.seed(90210)
mc_dir <- MCMCpack::rdirichlet(n = 1000, alpha = rep(0.01, 14843))
summary(rowSums(mc_dir == 0))
set.seed(90210)
dr_dir <- DirichletReg::rdirichlet(n = 1000, alpha = rep(0.01, 14843))
summary(rowSums(dr_dir == 0))
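For what it's worth, here is a rough base-R sketch of where the zeros probably come from. It assumes the three samplers above build a Dirichlet draw by normalizing independent gamma draws (I believe that's what gtools::rdirichlet does, but treat it as an assumption). With shape 0.01, a small share of the gamma draws land below the smallest representable double and come back as exactly zero, before any normalization happens. The names g and d and the smaller n are just for illustration.

```r
# Assumption: Dirichlet draws are built by normalizing independent gamma draws.
set.seed(90210)
alpha <- rep(0.01, 14843)
n <- 100  # fewer rows than above, just to keep this quick

# One gamma draw per matrix cell, using shape alpha[v] for column v
g <- matrix(
  rgamma(n * length(alpha), shape = rep(alpha, each = n)),
  nrow = n
)

# Exact zeros already show up at the gamma stage
sum(g == 0)

# Dividing each row by its sum cannot turn a zero into a nonzero value
d <- g / rowSums(g)
summary(rowSums(d == 0))
```

If that assumption holds, the zeros are a floating-point limit rather than a bug in any one package, which would explain why all three libraries behave the same way.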
A possible patch is to add .Machine$double.eps
to each draw. Will explore...
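A minimal sketch of that patch idea, in base R only. rdirichlet_base is a hypothetical stand-in for the package samplers (normalized gamma draws), not the code that went into tidylda, and the renormalization step at the end is my own optional addition rather than part of the idea as stated.

```r
# Hypothetical helper: base-R Dirichlet sampler via normalized gamma draws
rdirichlet_base <- function(n, alpha) {
  g <- matrix(
    rgamma(n * length(alpha), shape = rep(alpha, each = n)),
    nrow = n
  )
  g / rowSums(g)
}

set.seed(90210)
phi_initial <- rdirichlet_base(n = 10, alpha = rep(0.01, 14843))

# Some entries are exactly zero before patching
any(phi_initial == 0)

# The patch idea: shift every draw by machine epsilon
phi_patched <- phi_initial + .Machine$double.eps

# Optional (my addition): renormalize so each row sums to 1 again
phi_patched <- phi_patched / rowSums(phi_patched)

# No exact zeros remain
all(phi_patched > 0)
```

Adding epsilon changes each row sum by only about 14843 * 2.2e-16, so the renormalization barely moves the values. It just keeps each row a proper probability vector.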
Unit test:
library(tidylda)

m <- tidylda(
  dtm = textmineR::nih_sample_dtm,
  k = 10,
  iterations = 20,
  burnin = 15,
  alpha = 0.05,
  beta = 0.01,
  optimize_alpha = FALSE,
  calc_likelihood = TRUE,
  calc_r2 = FALSE,
  return_data = FALSE
)
The above fails, specifically because beta = 0.01
is too sparse: some of the initial Dirichlet draws come out exactly zero.
Fixed by adding machine epsilon to dirichlet draws for initialization: a42d38e531cb0324151c51aacea506088f9b645e
It seems like sparsity may be the issue, but I need to track down what's causing this.
While I'm at it, it would be good to put together a reprex.