datalab-dev / quintessence_web_app

Repository for the Quintessence Web Project applying Topic Models and Word Embeddings to EEBO-TCP
http://quintessence.ds.lib.ucdavis.edu/

add term conditional probabilities on hover of term in top terms plot #65

Closed avkoehl closed 3 years ago

avkoehl commented 3 years ago

Screenshot from https://nlp.stanford.edu/events/illvi2014/papers/sievert-illvi2014.pdf (image omitted)

Based on my read of the paper and the source code of pyLDAvis and the LDAvis R package, a term's conditional topic probabilities are, for each topic, the proportion of the term's occurrences in the corpus that fall in that topic. As I see it, this can be done in two ways: using the topicwords object in the lda.model object, or using the term_topic_freq calculation that was used for the relevance slider. I will go with the second option, as it's what they do in LDAvis.

Code would look like

topic_freq = doctopics.multiply(doclens, axis=0).sum(axis=0)
term_topic_freq = topicterms.multiply(topic_freq.values, axis=0)

topicterms should be smoothed and normalized first. The result is an estimate of how often each term occurs in each topic, which is what we need.
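To make the calculation concrete, here is a minimal pandas sketch on toy data. The shapes (doctopics as documents x topics, topicterms as topics x terms), the sample values, and the final normalization step named term_cond_prob are assumptions for illustration, not the project's actual data or code.

```python
import pandas as pd

# Toy inputs; shapes, labels, and values are assumptions for illustration only.
# doctopics: documents x topics, each row sums to 1.
doctopics = pd.DataFrame([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]], columns=["t0", "t1"])
# doclens: number of tokens in each document.
doclens = pd.Series([100, 200, 100])
# topicterms: topics x terms, already smoothed and normalized (rows sum to 1).
topicterms = pd.DataFrame(
    [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]],
    index=["t0", "t1"],
    columns=["king", "god", "law"],
)

# Estimated number of tokens assigned to each topic across the corpus.
topic_freq = doctopics.multiply(doclens, axis=0).sum(axis=0)
# Estimated count of each term within each topic.
term_topic_freq = topicterms.multiply(topic_freq.values, axis=0)
# For each term (column), the share of its occurrences that falls in each topic.
term_cond_prob = term_topic_freq.div(term_topic_freq.sum(axis=0), axis=1)

print(term_cond_prob.round(3))
```

Each column of term_cond_prob sums to 1, so hovering over a term could read its column to size the topic circles.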

Since we ultimately want to add a search bar for terms (#67), I will add term_topic_freq as its own collection covering all terms, rather than just the top terms (which is what LDAvis does).
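One way to shape that collection is one record per term, so a search or hover can fetch a single term's values directly. This is only a sketch: the helper term_records, the field names term and topic_probs, and the toy matrix are hypothetical, and the storage backend is left out entirely.

```python
import pandas as pd

# Toy term_topic_freq (topics x terms); values are made up for illustration.
term_topic_freq = pd.DataFrame(
    [[126.0, 36.0, 18.0], [22.0, 66.0, 132.0]],
    index=[0, 1],
    columns=["king", "god", "law"],
)

def term_records(freq):
    """Return one dict per term with its conditional topic probabilities.

    The field names ("term", "topic_probs") are hypothetical.
    """
    # Normalize each term's column so its values sum to 1 across topics.
    cond = freq.div(freq.sum(axis=0), axis=1)
    return [{"term": term, "topic_probs": cond[term].tolist()} for term in cond.columns]

records = term_records(term_topic_freq)
# Index by term for quick lookup, as a term search would need.
by_term = {r["term"]: r for r in records}
print(by_term["law"])
```

Keeping every term, rather than only the top terms per topic, costs more storage but means any searched term has values available.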

avkoehl commented 3 years ago

this works: https://stackoverflow.com/a/47400462

avkoehl commented 3 years ago

Instead of on click, I am using mouseover and mouseleave events to match LDAvis.

avkoehl commented 3 years ago

Overall, we may want to switch away from d3 event listeners. This isn't built into plotly, so it has become detached from the framework and instead relies on doing everything with d3/jQuery trickery.

Some problems this introduces: