This branch implements a quick fix for calculating the widths of the red bars. Here, we calculate the term frequencies internally within createJSON() rather than using the user-supplied term.frequency.
The details are described in Issue #32.
The good news is that the red bar widths are now correct. The bad news is that the blue bar widths, which represent the overall frequencies of words, are not necessarily correct: they won't match the actual term frequencies provided by the user. Most of the differences are small, but this remains an open issue.
As mentioned in Issue #32, the solution may be to require the user to specify the priors they used in fitting the model, so that the red and blue bars can properly account for the influence of the priors as well as the raw data. This would require a few additional calculations, and so far it has only worked for models I fit using the collapsed Gibbs sampler. It failed to correctly visualize a model fit using gensim, which implements variational Bayes to fit the LDA model. I'm not sure whether the difference in fitting method is the source of the error or just a coincidence.
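To make the mismatch concrete, here is a rough sketch of how frequencies can be estimated from a fitted model alone. It is plain Python for illustration, not the R code in createJSON(), and the names `estimate_frequencies`, `phi`, `theta`, and `doc_length` are assumptions chosen for this example. It assumes the red bars come from expected topic-term counts and the blue bars from their sum over topics.

```python
def estimate_frequencies(phi, theta, doc_length):
    """Estimate bar widths from a fitted topic model (illustrative sketch).

    phi:        K x W list of lists, topic-term probabilities (rows sum to 1)
    theta:      D x K list of lists, document-topic probabilities (rows sum to 1)
    doc_length: list of D token counts, one per document
    """
    n_docs = len(theta)
    n_topics = len(phi)
    n_terms = len(phi[0])

    # Expected number of tokens assigned to each topic across the corpus.
    topic_freq = [
        sum(theta[d][k] * doc_length[d] for d in range(n_docs))
        for k in range(n_topics)
    ]

    # Expected count of each term within each topic (red bar widths).
    term_topic_freq = [
        [phi[k][w] * topic_freq[k] for w in range(n_terms)]
        for k in range(n_topics)
    ]

    # Summing over topics gives an estimated overall term frequency
    # (blue bar widths). This need not equal the observed counts.
    term_freq = [
        sum(term_topic_freq[k][w] for k in range(n_topics))
        for w in range(n_terms)
    ]
    return term_topic_freq, term_freq


# Toy model: 2 topics, 3 terms, 2 documents of 4 tokens each.
phi = [[0.5, 0.5, 0.0],
       [0.0, 0.5, 0.5]]
theta = [[1.0, 0.0],
         [0.5, 0.5]]
doc_length = [4, 4]

term_topic_freq, term_freq = estimate_frequencies(phi, theta, doc_length)

# Hypothetical user-supplied counts for the same corpus.
user_term_frequency = [3, 3, 2]
differences = [est - obs for est, obs in zip(term_freq, user_term_frequency)]
```

In this toy case the estimated frequencies sum to the total token count (8), but the individual terms differ from the hypothetical user-supplied counts. That is the kind of small discrepancy the blue bars can show.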
In this branch I've also updated the "details" vignette with an explanation similar to the one written here.
I think this is an improvement on the previous version, so we should merge it, and maybe a bit down the line we can solve the problem for good.