JasonKessler / scattertext

Beautiful visualizations of how language differs among document types.
Apache License 2.0

# of mentions displayed is off #100

Closed LeoPAllen closed 3 years ago

LeoPAllen commented 3 years ago

I'm building a Scattertext document from a fairly small dataset (~100 responses). Everything seems to work except the mention count ("Not found in any ..." or "Some of the N mentions: ..."), which is clearly incorrect. Any idea how I can debug this? I've inspected the corpus, and nothing in the metadata (corpus.get_metadata_freq_df('')) looks off. However, calling corpus.get_term_count_df() raises ValueError: arrays must all be same length.
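The ValueError text matches the message pandas has used when a DataFrame is built from a dict of columns whose lengths differ, which suggests that two of the arrays being combined disagree in size. The helper below is a hypothetical, stdlib-only sketch (mismatched_lengths is a name made up for illustration, not a Scattertext or pandas API) for checking which columns disagree before they are handed to a DataFrame constructor.

```python
from typing import Dict, Sequence


def mismatched_lengths(columns: Dict[str, Sequence]) -> Dict[str, int]:
    """Return each column's length if the lengths disagree, else an empty dict.

    Hypothetical debugging helper: pandas refuses to build a DataFrame from a
    dict of columns like this one when the lengths are not all equal.
    """
    lengths = {name: len(values) for name, values in columns.items()}
    # Zero or one distinct length means every column lines up.
    if len(set(lengths.values())) <= 1:
        return {}
    return lengths


cols = {"term": ["a", "b", "c"], "count": [3, 1]}
print(mismatched_lengths(cols))  # {'term': 3, 'count': 2}
```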

The number of mentions shown in the Scattertext document does not match the number of mentions that actually appear when I search for a specific term (screenshot: Screen Shot 2021-05-21 at 4.59.37 PM).
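One way to get an independent baseline is to count, outside Scattertext, both how many documents mention a term and how many times it occurs in total, since a displayed "mention" count could correspond to either. This is a minimal sketch assuming a case-insensitive whole-word regex match; Scattertext tokenizes with its own NLP pipeline, so its notion of a term can legitimately differ (for example, "services" does not match "service" here). The function names are hypothetical.

```python
import re
from typing import Iterable


def _term_pattern(term: str) -> "re.Pattern[str]":
    # Whole-word, case-insensitive match; re.escape guards punctuation in the term.
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)


def count_documents_mentioning(texts: Iterable[str], term: str) -> int:
    """Number of documents containing the term at least once."""
    pattern = _term_pattern(term)
    return sum(1 for text in texts if pattern.search(text))


def count_occurrences(texts: Iterable[str], term: str) -> int:
    """Total number of times the term occurs across all documents."""
    pattern = _term_pattern(term)
    return sum(len(pattern.findall(text)) for text in texts)


texts = ["Great service.", "The service was slow, service staff rude.", "No complaints."]
print(count_documents_mentioning(texts, "service"))  # 2
print(count_occurrences(texts, "service"))  # 3
```

If the displayed N matches neither number, that points to a bug or a tokenization difference rather than a documents-versus-occurrences mix-up.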

The data is sensitive, so I'd prefer not to expose the text in a screenshot or share the code directly.


JasonKessler commented 3 years ago

I'd recommend making sure you're using the latest version of Scattertext.

If that doesn't solve your issue, please include both runnable code and a dataset that replicates this term miscount, along with an example of what you expected to see for a given term and what is actually happening.
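When the data can't be shared as-is, one option is to replace every distinct word with a stable placeholder before building the reproduction dataset. This is a sketch under stated assumptions: words are split on the regex \w+ and compared case-insensitively, so per-term counts stay the same only if Scattertext's tokenizer splits the text the same way. Stopword filtering will also behave differently on placeholders. The pseudonymize function is hypothetical, and the returned mapping should stay private.

```python
import re
from typing import Dict, List, Tuple

WORD_RE = re.compile(r"\w+")


def pseudonymize(texts: List[str]) -> Tuple[List[str], Dict[str, str]]:
    """Replace each distinct lowercase word with a placeholder w0, w1, ...

    The same word always maps to the same placeholder, so which documents
    share a term (and how often) is preserved while the wording is hidden.
    Punctuation and whitespace are left untouched.
    """
    mapping: Dict[str, str] = {}

    def replace(match: "re.Match[str]") -> str:
        word = match.group(0).lower()
        if word not in mapping:
            mapping[word] = f"w{len(mapping)}"
        return mapping[word]

    return [WORD_RE.sub(replace, text) for text in texts], mapping


anonymized, mapping = pseudonymize(["Service was slow.", "slow service"])
print(anonymized)  # ['w0 w1 w2.', 'w2 w0']
```

The private mapping lets the reporter say which placeholder corresponds to the miscounted term without revealing the original text.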

JasonKessler commented 3 years ago

Closing due to inactivity