cpsievert / LDAvis

R package for web-based interactive topic model visualization.

New #19

Closed cpsievert closed 9 years ago

cpsievert commented 10 years ago

I've noticed a couple of minor problems (checked items are no longer a problem):

cpsievert commented 10 years ago

Commit e5ebfac makes some drastic changes on the JS side, but they come with some benefits:

  1. Wrapping the vis in a function means global variables will no longer be floating around. It also makes it much easier for users to embed LDAvis in their own pages and to write shiny bindings (if we ever want to).
  2. The new topic entry and lambda slider seem more intuitive.
kshirley commented 10 years ago

Holy lambda slider! Looks very nice. That input type="range" is great.

A couple small things:

  1. Let's put back the buttons for "next topic" and "previous topic". I think right now you only see the up and down arrows when you put the mouse over the "enter topic #" input box. I think it's a bit nicer to see that these exist ahead of time (and let's make the buttons big and easy to click).
  2. For the slider, can we tell it not to change the value of lambda until it's been in place for some fraction of a second? I like the current way that the bar transitions are broken down into two parts: (i) the vertical movement, and then (ii) the horizontal movement. With the lambda slider, these movements are getting sort of smushed together. I might even like putting back the +0.1 and -0.1 buttons in case users prefer that, but we can think about that one...

Last, let's add a legend to the lambda slider that says something like "high lift" at lambda = 0, and "high probability" near lambda = 1.
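To make the two ends of that legend concrete, here is a minimal R sketch. It assumes the term-relevance measure LDAvis uses to rank terms, relevance(w | k) = lambda * log(phi[k, w]) + (1 - lambda) * log(phi[k, w] / p[w]), where p[w] is the term's marginal probability across the corpus: at lambda = 0 terms are ranked purely by lift, and at lambda = 1 purely by within-topic probability. The `relevance` function and the toy `phi` matrix are hypothetical, for illustration only.

```r
# Hypothetical helper, assuming relevance = lambda * log(phi) + (1 - lambda) * log(lift).
# phi:    K x W matrix of topic-term probabilities (each row sums to 1)
# p_w:    length-W vector of marginal term probabilities
# lambda: weight in [0, 1]; 0 ranks terms by lift, 1 by within-topic probability
relevance <- function(phi, p_w, lambda) {
  stopifnot(lambda >= 0, lambda <= 1, ncol(phi) == length(p_w))
  lift <- sweep(phi, 2, p_w, "/")  # lift[k, w] = phi[k, w] / p_w[w]
  lambda * log(phi) + (1 - lambda) * log(lift)
}

# Toy model: 2 topics over 3 terms, equal topic weights (made-up numbers)
phi <- rbind(c(0.5, 0.30, 0.20),
             c(0.8, 0.15, 0.05))
p_w <- drop(c(0.5, 0.5) %*% phi)  # marginal term probabilities: 0.65, 0.225, 0.125

# Ranking of terms for topic 1 at each end of the slider
order(relevance(phi, p_w, 1)[1, ], decreasing = TRUE)  # 1 2 3 (high probability)
order(relevance(phi, p_w, 0)[1, ], decreasing = TRUE)  # 3 2 1 (high lift)
```

Term 1 is the most probable word in topic 1, but it is even more common in topic 2, so it drops to last place when ranked by lift. Intermediate lambda values blend the two rankings, which is what the slider lets users explore.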

Should I take a stab at these changes?

cpsievert commented 10 years ago

Feel free to make changes, but just as a warning, it might not be easy to re-implement the "next" & "previous" buttons (the new approach doesn't use onSubmit & is based on this example). It might be easier to find a way to style the input so that the arrows are more visible. As for the lambda slider, I think I see what you are referring to when you wiggle the slider back and forth, but I'm not immediately sure how to "delay" the event trigger.

In addition to the bugfixes in the checklist above, these additions will be high on my priority list over the next few days:

cpsievert commented 10 years ago

Note: I was able to fix the lambda slider event handling by simply using the "mouseup" event trigger instead of "input" (see 7a589df).

cpsievert commented 10 years ago

BTW, let me know what you think about the new dynamic circle size guide.

kaneplusplus commented 10 years ago

The new version looks really good, and I like how easy the example is to use. I'm noticing three things that might be worth discussing, though they may already be addressed and I missed them in the documentation.

  1. If I mouse over a topic and select it, the selected topic may differ from the topic number shown in the upper-left corner.
  2. I'm not sure what is being shown in the Marginal Topic Frequency. Is there a description?
  3. Is there a way to specify the state of the vis? Let's say I find something noteworthy and I want to share it with someone else. I would potentially want them to see the visualization with a topic selected rather than a user needing to find the topic themselves.
kshirley commented 10 years ago

Thx for the thoughts, Mike.

  1. Correct, and this is a bug. We'll fix it so that the selected topic is always displayed by number in the box in the upper-left corner. (Note: the hovered topic is not necessarily selected. Selected means that the bars will be "frozen" to the most relevant words for that topic, and this can only be done by clicking on a topic circle or entering the topic number in the box.)
  2. We'll add a description in a vignette, I think. Basically, in LDA the topics themselves have a marginal distribution. Each token is modeled as having a latent topic, so the collection of estimated latent topics across all the tokens in the corpus determines the estimated marginal distribution over topics. Sometimes it's quite far from uniform, which can be interesting. We order topics by default in decreasing order of marginal frequency to guide users in topic interpretation.
  3. Not yet, but that's a great idea. Carlos S helped us do this with a baseball-related viz a while back, so maybe we can figure out how to implement it here.
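A minimal R sketch of the marginal topic distribution described in point 2, assuming per-document topic proportions `theta` (D x K) and token counts `doc.length`, the same inputs passed to `createJSON` later in this thread. Weighting each document's topic proportions by its length gives the expected number of tokens assigned to each topic; normalizing gives the marginal distribution. The toy values are made up, and this is an illustration rather than the exact code inside `createJSON`.

```r
# theta:      D x K matrix of per-document topic proportions (rows sum to 1)
# doc.length: length-D vector of token counts per document
theta <- rbind(c(0.9, 0.1, 0.0),
               c(0.2, 0.5, 0.3),
               c(0.1, 0.1, 0.8))
doc.length <- c(10, 50, 100)

# theta * doc.length scales row d by doc.length[d] (recycling runs down columns),
# so each column sum is the expected number of tokens assigned to that topic
topic.frequency <- colSums(theta * doc.length)   # 29 36 95
# Normalize to get the estimated marginal distribution over topics
topic.proportion <- topic.frequency / sum(topic.frequency)

# Default ordering: decreasing marginal frequency
topic.order <- order(topic.proportion, decreasing = TRUE)  # 3 2 1
```

Note that topic 1 dominates the first document, yet it ends up smallest overall because that document is short; this is why the marginal distribution can be far from uniform.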
kshirley commented 10 years ago

Hey Carson, here are some thoughts (I can tackle many of these if you want):

(1) Topic selection box:

(2) Lambda slider

(3) The topic circle scale -- this is super cool.

(4) Overall layout.

cpsievert commented 10 years ago

It turns out the width of the lambda slider wasn't getting smaller at 0 and 1; instead, the extra decimal place (for numbers between 0 and 1) was shifting the slider to the right, so I just decided to put the text to the right of the slider.

cpsievert commented 10 years ago

Note to self:

cpsievert commented 10 years ago

The implementation of createJSON has changed quite a bit in the last few commits, but there is a big speed improvement (especially when using the cluster argument):

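library("LDAvis")    # provides createJSON()
library("parallel")  # provides makeCluster(), detectCores(), stopCluster()
data(AP, package = "LDAvis")
# leave one core free for the rest of the system
cl <- makeCluster(detectCores() - 1)
cl # socket cluster with 7 nodes on host ‘localhost’
system.time(
  json <- with(AP, createJSON(phi, theta, alpha, beta, doc.length,
                    vocab, term.frequency, cluster = cl))
)
#   user  system elapsed 
#  1.696   0.281   4.895
stopCluster(cl)  # shut down the worker processes when done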

I'm pretty sure all the computations are correct, but let me know if you see anything that looks wrong/weird @kshirley.

cpsievert commented 10 years ago

More TODOs:

cpsievert commented 10 years ago

Every .Rmd file under the example directory will now be automatically compiled to .html every time we push to the repo. The movie reviews example can be seen here, and the resulting vis itself is here.

cpsievert commented 9 years ago

Hey, @kshirley, do you have anything else you'd like to do here? If not, I think I'm ready to merge!

kshirley commented 9 years ago

Ha - I was just writing you an email. I think there's a fair amount of documentation/vignette/data updates to do, unfortunately. I posed a few questions in my email. I'm off on other stuff right now, but will be back online tonight. k