chartbeat-labs / textacy

NLP, before and after spaCy
https://textacy.readthedocs.io

more, better, and interactive(?) data viz #28

Open bdewilde opened 8 years ago

bdewilde commented 8 years ago

textacy currently has two visualizations: draw_semantic_network() for visualizing documents as networks of terms with edges given by, say, term co-occurrence; and draw_termite_plot() for visualizing the relationship between topics and terms in a topic model. Both of these could be improved!

There are also tons of other visualizations that textacy users could benefit from:

I should stop listing these out and just point people to this site, which contains tons of possibilities.

implementation in textacy

paul-english commented 7 years ago

pyLDAvis is pretty simple w.r.t. input. I came up with the following for the prepare method:

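import pyLDAvis
import textacy.tm
import textacy.vsm

# doc_term_matrix, id2term, and documents are assumed to exist already
model = textacy.tm.TopicModel('lda', n_topics=30)
model.fit(doc_term_matrix)
doc_topic_matrix = model.transform(doc_term_matrix)

# topic-term weights from the underlying scikit-learn estimator
top_term_matrix = model.model.components_
doc_lengths = [len(d) for d in documents]
# sort by term id so the vocab lines up with the matrix columns
vocab = [term for _, term in sorted(id2term.items())]
term_frequency = textacy.vsm.get_term_freqs(doc_term_matrix)

vis_data = pyLDAvis.prepare(
    top_term_matrix,
    doc_topic_matrix,
    doc_lengths,
    vocab,
    term_frequency,
)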

One caveat: pyLDAvis asserts that every row of the document-topic matrix sums to one. That holds for LDA, but I noticed that NMF doesn't normalize its output this way. I don't know about LSA.
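For anyone hitting that assertion with NMF, here is a minimal sketch of one possible workaround: rescale each row of the document-topic matrix so it sums to one before handing it to pyLDAvis. The helper name `normalize_rows` and the sample matrix are made up for illustration and are not part of textacy or pyLDAvis. This only makes sense for non-negative weights such as NMF output; LSA can produce negative values, so row scaling would not give a valid distribution there.

```python
import numpy as np


def normalize_rows(matrix):
    """Scale each row of a non-negative matrix so that it sums to 1.

    Hypothetical helper, not part of textacy or pyLDAvis.
    """
    matrix = np.asarray(matrix, dtype=float)
    row_sums = matrix.sum(axis=1, keepdims=True)
    # Avoid dividing by zero: all-zero rows are left as they are
    # (they would still need separate handling before visualization).
    row_sums[row_sums == 0] = 1.0
    return matrix / row_sums


# Made-up stand-in for an unnormalized NMF document-topic matrix
doc_topic = np.array([[2.0, 1.0, 1.0],
                      [0.0, 3.0, 1.0]])
normalized = normalize_rows(doc_topic)
```

The normalized array could then be passed in place of `doc_topic_matrix` in the snippet above, assuming no document has all-zero topic weights.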

rebeccabilbro commented 7 years ago

Hello @bdewilde - we've been working on a machine learning visualization library called Yellowbrick, to provide custom Matplotlib visualizers for Scikit-Learn estimators. The project is still young, but is growing, and we've recently added a few new features for visualization to support modeling on text. We're big fans of your work and we think the list of ideas in this issue is very interesting. Not sure if you're still interested in pursuing the text viz stuff or have moved on to other things, but let us know if you have any additional thoughts or suggestions!