forcedotcom / distributions

Low-level primitives for collapsed Gibbs sampling in python and C++
BSD 3-Clause "New" or "Revised" License

sparse dirichlet discrete #83

Open stephentu opened 10 years ago

stephentu commented 10 years ago

I think Fritz and I had a conversation related to this topic, but I forgot the outcome. If I have a DD whose dim is very large, and I expect the non-zero entries of the suffstats (i.e. the counts) to be very sparse, what's the right way to handle this in distributions?

Essentially I want a DD where counts[] is a Sparse<> instead of a float[]. Would it be worth creating a separate SparseDD model?

fritzo commented 10 years ago

The DPD datatype degenerates to DD when shared.beta0 = 0. In this case, you'll get a dense shared.betas analogous to DD shared.alphas, and you'll get a sparse group.counts. Would that work for you?

stephentu commented 10 years ago

Yes, this will be sufficient for now. Thanks!

fritzo commented 10 years ago

A slight correction: the DPD with shared.beta0=0 is equivalent to the DD via the equation

dpd.shared.betas[i] * dpd.shared.alpha = dd.shared.alphas[i]
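
To make the mapping concrete, here is a minimal standalone sketch in plain Python. It does not use the distributions API. The function names are hypothetical, and the DPD predictive formula is an assumption (the usual Dirichlet-multinomial form with base weights alpha * betas[i], where the betas sum to 1 because beta0 = 0), not something taken from the library source. It only checks that dense DD counts and sparse DPD-style counts give the same predictive probabilities when the equation above holds.

```python
def dd_predictive(alphas, counts, value):
    """Posterior predictive of a dense Dirichlet-discrete model."""
    return (alphas[value] + counts[value]) / (sum(alphas) + sum(counts))


def dpd_predictive(alpha, betas, sparse_counts, value):
    """Assumed DPD-style predictive with beta0 = 0 (betas sum to 1).

    sparse_counts is a dict holding only the non-zero counts.
    """
    total = sum(sparse_counts.values())
    return (alpha * betas[value] + sparse_counts.get(value, 0)) / (alpha + total)


def dd_alphas_from_dpd(alpha, betas):
    """Apply dd.alphas[i] = dpd.betas[i] * dpd.alpha for every i."""
    return [alpha * b for b in betas]


# Hypothetical example values: 4 categories, only category 1 observed.
alpha = 2.0
betas = [0.5, 0.25, 0.125, 0.125]   # dense, sums to 1 since beta0 = 0
sparse_counts = {1: 3}              # sparse group counts
dense_counts = [0, 3, 0, 0]         # the same counts stored densely

alphas = dd_alphas_from_dpd(alpha, betas)
dd_probs = [dd_predictive(alphas, dense_counts, v) for v in range(4)]
dpd_probs = [dpd_predictive(alpha, betas, sparse_counts, v) for v in range(4)]
```

Under these assumptions the two lists agree entry by entry, while the sparse version stores a single count instead of one per category.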