Using LDAvis is still a bummer

kaneplusplus commented 10 years ago

A little while ago I opened an issue about runVis not working, and code is now provided that shows how to use it. However, there are still two big difficulties with using the package:

1. runVis relies on global variables. Is there a good reason for this? It is generally considered bad form for a function to assume that variables are available when they aren't explicitly passed as parameters.
2. It's still not clear to me how to use the package, other than to show the AP example. Can you provide an example that starts with a corpus, fits an LDA model, and visualizes the topics with runVis?

The visualization itself looks awesome and I'd like to incorporate it into a project I'm working on now with PubMed. However, the package's usability is still a big issue for me.

kshirley commented 10 years ago

Hi Mike,

Point 1 (above) only deals with the Shiny application -- you can still create a raw HTML visualization without any dependence on global variables in the workspace. Check out the createJSON() function, for example. For the Shiny app -- agreed, we should try to avoid requiring global variables. We just didn't know how at the time. I'll defer to Carson, our Shiny expert, for further thoughts on this one.

For point 2, of course the package is designed for more than "showing the AP example" :) The key is, though, that you must fit your LDA model to your data somewhere else (using MALLET, another R package, etc.), and then our package really only takes the set of topic-term distributions as the main argument (along with the vocabulary, prevalence of each topic, and a couple other things...) and visualizes it. We need to write up an example soon -- point taken.

But overall, I suppose an important thing to realize is that this package doesn't allow you to fit an LDA model; it only reads in the output from a fitted LDA model and visualizes it.
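To make those inputs concrete, here is a minimal base-R sketch of what the output of a model fitted elsewhere might look like before it is handed to a visualization step. It does not call any LDAvis function. The variable names, the toy counts, and the row-per-topic orientation are all assumptions for illustration, so check the package documentation for the layout it actually expects.

```r
# Toy stand-in for the output of an LDA fit done elsewhere (e.g. MALLET or
# another R package). All names and numbers here are illustrative assumptions.
vocab <- c("gene", "protein", "cell", "trial", "patient")

# Topic-term counts: one row per topic, one column per vocabulary term
# (this orientation is assumed, not taken from the package).
topic_term_counts <- matrix(
  c(40, 30, 25,  3,  2,
     2,  3, 10, 45, 40),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("topic1", "topic2"), vocab)
)

# Topic-term distributions: normalize each row so it sums to 1.
phi <- topic_term_counts / rowSums(topic_term_counts)

# Prevalence of each topic: share of all tokens assigned to that topic.
topic_prevalence <- rowSums(topic_term_counts) / sum(topic_term_counts)

# Overall frequency of each term across the corpus.
term_frequency <- colSums(topic_term_counts)

# Basic consistency checks before passing anything to a visualization.
stopifnot(
  all(abs(rowSums(phi) - 1) < 1e-12),
  abs(sum(topic_prevalence) - 1) < 1e-12,
  ncol(phi) == length(vocab),
  length(term_frequency) == length(vocab)
)
```

Checks like the ones at the end are cheap and tend to catch a transposed matrix or a vocabulary mismatch early.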

kaneplusplus commented 10 years ago

Hi Kenny,

Thanks, I was not aware of createJSON. I was using a slightly dated version. I'll take a closer look.

A short write-up would be really helpful. I'm having some difficulty figuring out what runVis needs and what the naming convention is when it looks in the global namespace.

Thanks, Mike

cpsievert commented 10 years ago

Hi @kaneplusplus,

Thanks for raising some very good points. I'm aware of the poor practice of using global variables to drive the application. To be honest, I wasn't aware of a better way to pass objects to the Shiny app. See here for a good discussion on why this is a difficult problem.

For now, I would suggest looking into createJSON. This will provide a nice alternative for those who don't want to rely on Shiny. I just created a (rough) draft of a vignette using knitr and rmarkdown. See here for the HTML page and here for the source.

cpsievert commented 10 years ago

By the way, keep an eye on this examples folder as I plan on adding demonstrations of doing a "complete" analysis.

cpsievert commented 10 years ago

Oh, and to answer your question more directly regarding the use of runVis, you need the following objects in the global workspace: phi, term.frequency, vocab, and topic.proportion. That is the point of this line of code in the documentation (to assign elements of the Newsgroupdata list to appropriate objects in the global workspace).

Hopefully this write-up makes it a bit more clear what those things are, but I'm happy to answer more of your questions if you have them.
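As a rough sketch of that assignment step (not the actual line from the documentation), the base-R snippet below copies the four named objects from a list into the global workspace. The list `model_output` and its toy values are hypothetical. Only the four element names come from the description above.

```r
# Hypothetical list standing in for a fitted model's output; the element
# names follow the four objects listed above, the values are toy data.
model_output <- list(
  phi = matrix(c(0.7, 0.2, 0.1,
                 0.1, 0.3, 0.6), nrow = 2, byrow = TRUE),
  term.frequency = c(40, 25, 35),
  vocab = c("gene", "cell", "trial"),
  topic.proportion = c(0.5, 0.5)
)

required <- c("phi", "term.frequency", "vocab", "topic.proportion")

# Fail early with a readable message if any required element is missing.
missing_objs <- setdiff(required, names(model_output))
if (length(missing_objs) > 0) {
  stop("missing objects: ", paste(missing_objs, collapse = ", "))
}

# Copy each required list element into the global workspace under its own name.
list2env(model_output[required], envir = globalenv())
```

Note that this overwrites any existing global objects with the same names, which is one of the practical downsides of the global-variable approach.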

kaneplusplus commented 10 years ago

Awesome! Thanks for such a complete and clear response.

cpsievert / LDAvis

Using LDAvis is still a bummer #6