JasonKessler / scattertext

Beautiful visualizations of how language differs among document types.
Apache License 2.0

How to list the words according to the color or the range of the scores? #107

Closed: Lipahak closed this issue 2 years ago

Lipahak commented 2 years ago
# Here is the code snippet
import scattertext as st
import spacy

nlp = spacy.load("en_core_web_sm")

# Build the corpus. df is a pandas DataFrame defined elsewhere, with a 'tag'
# column holding the category labels and a 'text' column holding the documents.
corpus = st.CorpusFromPandas(df,
                             category_col='tag',
                             text_col='text',
                             nlp=nlp).build()

# Scattertext plot: documents tagged 'Y' form group A, all others form group B
path1 = "xxx"
html = st.produce_scattertext_explorer(corpus,
                                       category='Y',
                                       category_name='A',
                                       not_category_name='B',
                                       width_in_pixels=1000,
                                       metadata=df['text_remove'])  # shown alongside each document

# Write the interactive explorer to an HTML file at path1
with open(path1, 'w', encoding='utf-8') as f:
    f.write(html)

Expected behavior

To see how groups A and B differ, I'd like to list the words associated with each class (shown in blue and red in the plot). Is it possible to list those words by color or by score? Thank you so much for answering.

JasonKessler commented 2 years ago

The colors correspond to term scores. Please post questions like this as discussions rather than issues.
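To make the point that colors track term scores concrete, here is a minimal sketch of listing terms by which side of the score scale they fall on. It does not call scattertext: it assumes you already have per-term counts for the category and for everything else, and it uses a simple swapped-in score (difference in relative frequency), not whatever scorer the explorer uses for its colors. Positive scores lean toward the category (the blue side, as described in the question) and negative scores toward the other group (the red side). The helper name `rank_terms_by_class` and the toy counts are hypothetical.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def rank_terms_by_class(
    cat_counts: Dict[str, int],
    other_counts: Dict[str, int],
    top_n: int = 10,
) -> Tuple[Ranked, Ranked]:
    """Hypothetical helper: split terms into category-leaning and other-leaning lists.

    Score (an illustrative stand-in, not scattertext's scorer):
        relative frequency in the category - relative frequency elsewhere.
    Terms with a score of exactly 0 appear in neither list.
    """
    cat_total = sum(cat_counts.values()) or 1  # avoid division by zero on empty input
    other_total = sum(other_counts.values()) or 1
    vocab = set(cat_counts) | set(other_counts)
    scores = {
        term: cat_counts.get(term, 0) / cat_total - other_counts.get(term, 0) / other_total
        for term in vocab
    }
    # Most category-leaning first; ties broken alphabetically for stable output.
    positive = sorted(
        ((t, s) for t, s in scores.items() if s > 0), key=lambda ts: (-ts[1], ts[0])
    )
    # Most other-leaning (most negative score) first.
    negative = sorted(
        ((t, s) for t, s in scores.items() if s < 0), key=lambda ts: (ts[1], ts[0])
    )
    return positive[:top_n], negative[:top_n]


if __name__ == "__main__":
    # Toy counts for illustration only.
    a_counts = {"jobs": 5, "taxes": 3, "health": 2}
    b_counts = {"health": 6, "care": 3, "jobs": 1}
    pos, neg = rank_terms_by_class(a_counts, b_counts)
    print("Leaning A (blue):", [t for t, _ in pos])  # ['jobs', 'taxes']
    print("Leaning B (red):", [t for t, _ in neg])  # ['health', 'care']
```

With a scattertext corpus, the counts could plausibly come from `corpus.get_term_freq_df()`, whose columns follow the `'<category> freq'` pattern (`'Y freq'` for the code above): for example `cat_counts = tf['Y freq'].to_dict()` and `other_counts = tf.drop(columns='Y freq').sum(axis=1).to_dict()`. To rank by the library's own scaled F-score instead, the scattertext README adds `corpus.get_scaled_f_scores('Y')` as a column of that frame and sorts by it; whether that matches the explorer's default coloring is an assumption worth checking.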