cpsievert / LDAvis

R package for web-based interactive topic model visualization.

Quick fix for calculation of widths of red bars #41

Closed kshirley closed 9 years ago

kshirley commented 9 years ago

This branch implements a quick fix for calculating the widths of the red bars: createJSON() now computes the term frequencies internally instead of using the user-supplied term.frequency.

The details are described in Issue #32.
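For readers who want the gist without opening the issue, here is a minimal sketch of what "computing the frequencies internally" could look like. It is written in plain Python for illustration and is not the package's actual code. The inputs (`phi` as a topic-by-term probability matrix, `theta` as a document-by-topic matrix, `doc_length` as token counts per document) and the function name `term_frequencies` are assumptions made for this example.

```python
def term_frequencies(phi, theta, doc_length):
    """Derive topic, term-topic, and overall term frequencies from a fitted model.

    phi:        K x W list of lists, phi[k][w] = P(term w | topic k)   (assumed input)
    theta:      D x K list of lists, theta[d][k] = P(topic k | doc d)  (assumed input)
    doc_length: list of D token counts                                  (assumed input)
    """
    n_docs, n_topics, n_terms = len(theta), len(phi), len(phi[0])

    # Expected number of tokens assigned to each topic across the corpus.
    topic_frequency = [
        sum(theta[d][k] * doc_length[d] for d in range(n_docs))
        for k in range(n_topics)
    ]

    # Expected count of term w within topic k (what a red bar would show).
    term_topic_frequency = [
        [phi[k][w] * topic_frequency[k] for w in range(n_terms)]
        for k in range(n_topics)
    ]

    # Overall term frequency implied by the model (what a blue bar would show).
    term_frequency = [
        sum(term_topic_frequency[k][w] for k in range(n_topics))
        for w in range(n_terms)
    ]
    return topic_frequency, term_topic_frequency, term_frequency


# Toy model: 2 topics, 3 terms, 2 documents (made-up numbers).
phi = [[0.5, 0.5, 0.0],
       [0.0, 0.5, 0.5]]
theta = [[1.0, 0.0],
         [0.5, 0.5]]
doc_length = [10, 20]

topic_freq, term_topic_freq, term_freq = term_frequencies(phi, theta, doc_length)
print(topic_freq)  # [20.0, 10.0]
print(term_freq)   # [10.0, 15.0, 5.0]
```

In this sketch the overall frequency of a term is the sum of its per-topic frequencies, so a per-topic value can never exceed the overall one. The trade-off is that the overall values come from the model rather than from the raw counts, so they need not equal a user-supplied term.frequency.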

The good news is that the red bar widths are now correct. The bad news is that the blue bar widths, which represent the overall frequencies of words, are not necessarily correct: they won't exactly match the term frequencies supplied by the user. Most of the differences are small, but this is a lingering issue.

As mentioned in Issue #32, the solution may be to require users to specify the priors they used in fitting the model, so that the red and blue bars properly account for the influence of the priors as well as the raw data. This would require a few additional calculations, and so far it has only worked for a model I fit with the collapsed Gibbs sampler. It failed to correctly visualize a model fit with gensim, which uses variational Bayes to fit the LDA model. I'm not sure whether the difference in fitting method is the source of the error or just a coincidence.
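To illustrate why the priors matter, here is a rough Python sketch of one way they could enter the calculation. It is not what this branch does. It assumes a symmetric Dirichlet prior `eta` on the topic-term distributions and assumes that phi was estimated as the smoothed ratio `(count + eta) / (topic_total + W * eta)`, as is common with collapsed Gibbs output. The function names are hypothetical.

```python
def smoothed_phi(counts, eta):
    """Topic-term probabilities from raw counts under a symmetric prior eta (assumed form)."""
    total = sum(counts)
    n_terms = len(counts)
    return [(c + eta) / (total + n_terms * eta) for c in counts]


def counts_from_phi(phi_k, topic_total, eta):
    """Invert the smoothing above to recover the raw term counts for one topic."""
    n_terms = len(phi_k)
    return [p * (topic_total + n_terms * eta) - eta for p in phi_k]


# Made-up counts for a single topic over a 3-term vocabulary.
raw_counts = [8, 2, 0]
eta = 0.5
phi_k = smoothed_phi(raw_counts, eta)

# Ignoring the prior: scale phi by the topic total. This drifts from the raw counts.
naive = [p * sum(raw_counts) for p in phi_k]

# Accounting for the prior: the raw counts come back (up to floating-point error).
recovered = counts_from_phi(phi_k, sum(raw_counts), eta)
```

Under these assumptions, `naive` gives a nonzero frequency to a term that never occurred in the topic, while `recovered` matches the raw counts. The inversion only holds if phi really was computed with this formula; estimates produced another way need not satisfy it.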

Anyway, in this branch I've also updated the vignette called "details" with an explanation similar to the one written here.

I think this is an improvement on the previous version, so we should merge it, and maybe a bit down the line we can solve the problem for good.

cpsievert commented 9 years ago

Go ahead and merge if you like @kshirley :+1: