JasonKessler / scattertext

Beautiful visualizations of how language differs among document types.
Apache License 2.0

Visualization from Token Counts or Frequencies #23

Closed by morden96 6 years ago

morden96 commented 6 years ago

Recommendation: MapReduce is typically used to return just word counts on large datasets. While much simpler than what scattertext provides, it is very fast at that scale. Your method of visualizing the data is stellar, though, so it would be ideal if we could leverage scattertext_explorer using word count or frequency tables.

gabefair commented 6 years ago

Are you suggesting the ability to use a dataset that already has the word frequencies counted, instead of having produce_scattertext_explorer compute the word counts?

morden96 commented 6 years ago

Yes. Allow the user to create their own frequency or word counts and then leverage scattertext for the analysis/visualization. Otherwise, scattertext is unusable with large datasets, particularly ones that cannot fit in memory.
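
For illustration, here is a minimal sketch of the kind of precomputed counts being requested, written in a map/reduce style with only the Python standard library. The tokenizer, the `(category, text)` chunk layout, and every function name are assumptions made for this example. None of them come from scattertext.

```python
from collections import Counter
from functools import reduce
import re

# Assumption: a deliberately crude tokenizer (lowercase letters and apostrophes).
TOKEN_RE = re.compile(r"[a-z']+")


def map_chunk(chunk):
    """Map step: count tokens per category for one chunk of (category, text) pairs."""
    counts = {}
    for category, text in chunk:
        tokens = TOKEN_RE.findall(text.lower())
        counts.setdefault(category, Counter()).update(tokens)
    return counts


def merge_counts(left, right):
    """Reduce step: merge two partial {category: Counter} tables into a new one."""
    merged = {category: Counter(counter) for category, counter in left.items()}
    for category, counter in right.items():
        merged.setdefault(category, Counter()).update(counter)
    return merged


def term_category_table(chunks):
    """Fold the per-chunk counts together; only one chunk is tokenized at a time."""
    return reduce(merge_counts, map(map_chunk, chunks), {})


# Hypothetical toy data standing in for chunks streamed from disk.
chunks = [
    [("spoken", "well you know it is fine"),
     ("academic", "the results indicate it is significant")],
    [("spoken", "you know what I mean"),
     ("academic", "the analysis is robust")],
]
table = term_category_table(chunks)
print(table["spoken"].most_common(2))
```

Because `chunks` can be any iterable, a generator that reads documents lazily would keep only the running counts in memory, not the corpus itself.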

gabefair commented 6 years ago

This is exactly what I need! I've started investigating the available functions to see if this is an undocumented feature.

JasonKessler commented 6 years ago

Thanks for the feedback. Since it seems like there's a lot of demand for this, I'll put together something to make plots based on a term-category frequency dataframe.

JasonKessler commented 6 years ago

I just updated the package to include this functionality. Please see "Visualizing differences based on only term frequencies."
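
As a rough sketch of how precomputed counts might be handed to this functionality: the helper below flattens a nested count table into rows, and `explorer_html` wraps them in a term-by-category DataFrame for `st.TermCategoryFrequencies` and `st.produce_scattertext_explorer`. The toy counts, the helper names, and the choice of keyword arguments are assumptions for this example. Depending on the installed version, further arguments may be needed, so check the referenced documentation section for the exact call.

```python
def rows_from_counts(table, categories):
    """Flatten {category: {term: count}} into (term, count_1, count_2, ...) rows."""
    terms = sorted(set().union(*(table[c] for c in categories)))
    return [(term, *(table[c].get(term, 0) for c in categories)) for term in terms]


def explorer_html(table, category, other):
    """Build the explorer HTML from counts alone (assumed usage, not verified here)."""
    # Imported lazily so rows_from_counts works without pandas/scattertext installed.
    import pandas as pd
    import scattertext as st

    rows = rows_from_counts(table, [category, other])
    # Terms as the index, one count column per category.
    df = pd.DataFrame(rows, columns=["term", category, other]).set_index("term")
    term_cat_freq = st.TermCategoryFrequencies(df)
    return st.produce_scattertext_explorer(
        term_cat_freq,
        category=category,
        category_name=category,
        not_category_name=other,
    )


# Hypothetical counts, e.g. the output of an external word-count job.
toy_table = {
    "SPOKEN": {"you": 2, "know": 2, "is": 1},
    "ACADEMIC": {"the": 2, "is": 2},
}
rows = rows_from_counts(toy_table, ["SPOKEN", "ACADEMIC"])
print(rows)
```

Writing the returned HTML string to a file and opening it in a browser would then show the plot.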