Open bdewilde opened 8 years ago
pyLDAvis is pretty simple w.r.t. input. I came up with the following for the prepare method:
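import pyLDAvis
import textacy.tm
import textacy.vsm

# doc_term_matrix, id2term, and documents come from an earlier vectorization step
model = textacy.tm.TopicModel('lda', n_topics=30)
model.fit(doc_term_matrix)
doc_topic_matrix = model.transform(doc_term_matrix)
# topic-term weights from the underlying scikit-learn estimator
top_term_matrix = model.model.components_
doc_lengths = [len(d) for d in documents]
# assumes id2term yields terms in the same order as the matrix columns
vocab = list(id2term.values())
term_frequency = textacy.vsm.get_term_freqs(doc_term_matrix)

vis_data = pyLDAvis.prepare(
    top_term_matrix,
    doc_topic_matrix,
    doc_lengths,
    vocab,
    term_frequency,
)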
One thing: pyLDAvis asserts that every row of the document-topic matrix sums to one. That holds for LDA, but I noticed that NMF doesn't normalize its output this way. I don't know about LSA.
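For models whose document-topic rows don't already sum to one, a possible workaround is to rescale each row before handing the matrix to pyLDAvis. This is only a minimal sketch using NumPy: normalize_rows is a hypothetical helper (not part of textacy or pyLDAvis), the input values are made up, and it assumes non-negative weights such as those from NMF. LSA output can contain negative values, so this would not make sense there.

```python
import numpy as np


def normalize_rows(matrix):
    """Scale each row of a non-negative 2-D array so that it sums to 1."""
    matrix = np.asarray(matrix, dtype=float)
    row_sums = matrix.sum(axis=1, keepdims=True)
    # Leave all-zero rows untouched instead of dividing by zero.
    safe_sums = np.where(row_sums == 0, 1.0, row_sums)
    return matrix / safe_sums


# Toy stand-in for an unnormalized document-topic matrix (hypothetical values).
doc_topic_weights = np.array([
    [2.0, 1.0, 1.0],
    [0.5, 0.0, 1.5],
    [0.0, 0.0, 0.0],
])
doc_topic_dists = normalize_rows(doc_topic_weights)
print(doc_topic_dists.sum(axis=1))
```

Rows that are entirely zero stay zero here, so they would still fail a sum-to-one check and would probably need to be dropped (along with their entries in doc_lengths) before visualizing.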
Hello @bdewilde - we've been working on a machine learning visualization library called Yellowbrick, to provide custom Matplotlib visualizers for Scikit-Learn estimators. The project is still young, but is growing, and we've recently added a few new features for visualization to support modeling on text. We're big fans of your work and we think the list of ideas in this issue is very interesting. Not sure if you're still interested in pursuing the text viz stuff or have moved on to other things, but let us know if you have any additional thoughts or suggestions!
textacy currently has two visualizations: draw_semantic_network(), for visualizing documents as networks of terms with edges given by, say, term co-occurrence; and draw_termite_plot(), for visualizing the relationship between topics and terms in a topic model. Both of these could be improved!
There are also tons of other visualizations that textacy users could benefit from:
I should stop listing these out and just point people to this site, which contains tons of possibilities.
implementation in textacy