JasonKessler / scattertext

Beautiful visualizations of how language differs among document types.
Apache License 2.0

Visualization is not grouping documents by metadata value. #119

Closed: Luakika closed this issue 1 year ago

Luakika commented 1 year ago

Hi,

The HTML visualization is not grouping documents by their metadata values. Is there a way to fix this?

Steps to Reproduce

I'm working with a pandas DataFrame of 2233 rows, each holding a short text document, plus a categorical variable with 12 values.

Run:

    html = st.produce_scattertext_explorer(
        corpus,
        category='positive',
        category_name='Positive',
        not_category_name='Negative',
        width_in_pixels=1000,
        metadata=corpus.get_df()['reasonFactor'],
    )

Metadata: Name: reasonFactor, Length: 2233, dtype: category Categories (12, object): [...]

Expected behavior

Error example: the blue values repeat the categorical value for every document instead of grouping the documents under it. (screenshot: scatterTextError)

Correct behavior example from the scattertext README.md. (screenshot: scatterTextCorrect)

JasonKessler commented 1 year ago

There's no functionality to group multiple documents together in the context display. The example from the README shows different matches for the word "we" within the same document.

If you'd like Scattertext to display results differently, I'm happy to review a PR with the change.
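
Since the context display shows one entry per document, one possible workaround (not a Scattertext feature, just a preprocessing idea) is to merge documents that share a metadata value before building the corpus, so that each metadata value corresponds to a single document. The sketch below uses plain Python and makes several assumptions: the field names `label` and `text` are hypothetical stand-ins for the real column names, `reasonFactor` is taken from the issue, and `merge_by_metadata` is an illustrative helper, not part of any library.

```python
from collections import defaultdict


def merge_by_metadata(records, label_key="label", meta_key="reasonFactor",
                      text_key="text"):
    """Merge documents sharing the same (label, metadata) pair into one.

    `records` is a list of dicts; the key names are assumptions made for
    this sketch and should be replaced with the real column names.
    """
    grouped = defaultdict(list)
    for rec in records:
        # Group on both the category label and the metadata value so that
        # positive and negative documents are never merged together.
        grouped[(rec[label_key], rec[meta_key])].append(rec[text_key])
    return [
        {label_key: label, meta_key: meta, text_key: "\n".join(texts)}
        for (label, meta), texts in grouped.items()
    ]


# Toy data standing in for rows of the DataFrame described in the issue.
records = [
    {"label": "positive", "reasonFactor": "price", "text": "Great value."},
    {"label": "positive", "reasonFactor": "price", "text": "Cheap and good."},
    {"label": "negative", "reasonFactor": "service", "text": "Slow support."},
]
merged = merge_by_metadata(records)
```

The merged records could then be loaded into a DataFrame and used to build the corpus as before. Note that merging reduces the number of documents, so any document-level statistics will differ from those computed on the original rows.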