JasonKessler / scattertext

Beautiful visualizations of how language differs among document types.
Apache License 2.0

How to create a Scattertext plot without showing collocations? #106

Closed. Lipahak closed this issue 2 years ago.

Lipahak commented 2 years ago

Steps to Reproduce

I use the following steps to generate a Scattertext plot from my data:

```python
# Here is the code snippet
import scattertext as st
import spacy

nlp = spacy.load("en_core_web_sm")

# Build the corpus.
# df is a pandas DataFrame (not shown) with 'tag', 'text' and 'text_remove' columns.
corpus = st.CorpusFromPandas(df,
                             category_col='tag',
                             text_col='text',
                             nlp=nlp).build()

# Scattertext plot
path1 = "xxx"
html = st.produce_scattertext_explorer(corpus,
                                       category='Y',
                                       category_name='A',
                                       not_category_name='B',
                                       width_in_pixels=1000,
                                       metadata=df['text_remove'])

# Write the interactive plot to an HTML file.
with open(path1, 'w') as f:
    f.write(html)
```

Expected behavior

A couple of the terms in the plot are collocations (multi-word phrases). I expected the plot to show individual words only, without collocations. How can I avoid this? I would deeply appreciate your answer.
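The collocations appear because the corpus counts both single words and two-word phrases (the maintainer's reply calls these bigrams). The stdlib-only sketch below illustrates what such a mixed term count looks like. `count_terms` is a hypothetical helper, and lowercase whitespace splitting is an assumption standing in for the spaCy tokenization that scattertext actually uses.

```python
from collections import Counter


def count_terms(text: str) -> Counter:
    """Count unigrams and adjacent-word bigrams in one document.

    Hypothetical illustration: lowercase whitespace tokenization stands in
    for spaCy, and each bigram is stored as a "word1 word2" string.
    """
    tokens = text.lower().split()
    counts = Counter(tokens)  # single words (unigrams)
    # Adjacent word pairs (bigrams), joined with a space.
    counts.update(f"{a} {b}" for a, b in zip(tokens, tokens[1:]))
    return counts


if __name__ == "__main__":
    counts = count_terms("machine learning makes machine learning easier")
    print(counts["machine learning"])  # 2
    print(counts["machine"])  # 2
```

In a plot built from counts like these, "machine learning" competes with "machine" and "learning" as a separate point, which is the kind of collocation described above.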


JasonKessler commented 2 years ago

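Run `corpus = corpus.get_unigram_corpus()` after building the corpus and before calling `produce_scattertext_explorer`. This removes the bigrams, so only single words are plotted.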
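Conceptually, keeping only unigrams means dropping every term that spans more than one word. The sketch below illustrates that filter on a plain `Counter`. The function name `unigrams_only` is hypothetical, and treating "contains a space" as the multi-word test is an assumption about how phrases are stored, not a description of scattertext's internals. With a real corpus, use `corpus.get_unigram_corpus()` as suggested above.

```python
from collections import Counter


def unigrams_only(term_counts: Counter) -> Counter:
    """Keep only single-word terms.

    Illustrative only (hypothetical helper): any term containing a space is
    treated as a multi-word phrase (bigram/collocation) and dropped.
    """
    return Counter({term: n for term, n in term_counts.items() if " " not in term})


if __name__ == "__main__":
    counts = Counter({"machine": 2, "learning": 2, "machine learning": 2, "easier": 1})
    print(sorted(unigrams_only(counts)))  # ['easier', 'learning', 'machine']
```

Returning a new `Counter` rather than mutating the input mirrors the reply's usage, where `get_unigram_corpus()` is assigned back to `corpus` instead of modifying it in place.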