Open polm opened 1 year ago
Thanks for pointing this out and including Scattertext in the spaCy Universe. I'm preparing to deprecate the produce_scattertext_html function, and I think it would be best if the spaCy Universe page included an example of Scattertext usage that uses more of the available features and renders a more interactive UI. For example:
import scattertext as st
import spacy

# Blank English pipeline with only a sentence splitter
nlp = spacy.blank('en')
nlp.add_pipe('sentencizer')

# Parse each convention speech into a spaCy Doc
df = st.SampleCorpora.ConventionData2012.get_data().assign(
    parse=lambda df: df.text.apply(nlp)
)

# Build the corpus, drop stop words, keep unigrams, and compact to 2000 terms
corpus = st.CorpusFromParsedDocuments(
    df,
    category_col='party',
    parsed_col='parse'
).build().get_stoplisted_unigram_corpus().compact(st.AssociationCompactor(2000))

# Render the interactive explorer as an HTML string
html = st.produce_scattertext_explorer(
    corpus,
    category='democrat',
    category_name='Democratic',
    not_category_name='Republican',
    minimum_term_frequency=0,
    pmi_threshold_coefficient=0,
    width_in_pixels=1000,
    metadata=lambda corpus: corpus.get_df()['speaker'],
    transform=st.Scalers.dense_rank
)

with open('./demo_compact.html', 'w') as of:
    of.write(html)
Regardless, I'll update the package to ensure the pmi_filter_thresold argument still works.
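For readers wondering how a legacy keyword can be kept working during a deprecation, here is a minimal sketch of one common pattern. It is not Scattertext's actual implementation: `produce_html_sketch` is a hypothetical stand-in, and mapping `pmi_filter_thresold` onto `pmi_threshold_coefficient` is an assumption made purely for illustration.

```python
import warnings


def produce_html_sketch(corpus, pmi_threshold_coefficient=8, **legacy_kwargs):
    """Hypothetical stand-in for a plotting function; not the Scattertext API."""
    # Assumption: the legacy keyword maps onto the current parameter.
    if 'pmi_filter_thresold' in legacy_kwargs:
        warnings.warn(
            "'pmi_filter_thresold' is deprecated; "
            "use 'pmi_threshold_coefficient' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        pmi_threshold_coefficient = legacy_kwargs.pop('pmi_filter_thresold')

    # Anything else that is unrecognised is still an error.
    if legacy_kwargs:
        raise TypeError(
            f"unexpected keyword arguments: {sorted(legacy_kwargs)}"
        )

    # Return the resolved settings instead of real HTML, to keep the sketch small.
    return {
        'corpus': corpus,
        'pmi_threshold_coefficient': pmi_threshold_coefficient,
    }
```

Called with the old keyword, the sketch emits a `DeprecationWarning` and carries on with the supplied value, so existing scripts keep running while users are nudged toward the new name.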
Ah, thanks for the info about the example! We've already merged the PR I linked to, but if you'd like to update the Universe entry we'd be happy to look at a PR any time. (That said, we're currently working on our website backend, so any updates in the immediate future won't go live for a bit.)
Thanks for working on this package. I'm updating the entry in the spaCy Universe (https://github.com/explosion/spaCy/pull/11937#pullrequestreview-1208010525), and we noticed the sample here uses an argument that doesn't seem to work with the latest release.
https://github.com/JasonKessler/scattertext/blob/8ddff82f670aa2ed40312b2cdd077e7f0a98a873/simple.py#L19