# of mentions displayed is off

LeoPAllen commented 3 years ago

I'm building a Scattertext document from a fairly small dataset (~100 responses). Everything seems to be working, except that the mention count ("Not found in any " or "Some of the N mentions:...") is clearly incorrect. Any idea how I can debug the issue? I've investigated the corpus, and nothing about the metadata (corpus.get_metadata_freq_df('')) seems off. When I call corpus.get_term_count_df(), it raises a ValueError: arrays must all be same length.

The number of mentions reported by the Scattertext document does not agree with the number of mentions that actually appear when I search for a specific term (screenshot: Screen Shot 2021-05-21 at 4 59 37 PM).

The data is sensitive, so I'd prefer not to expose the text in my screenshot or share the code explicitly.

Environment

Scattertext version: 0.0.2.71
OS: OSX
How you installed Scattertext: conda
Build command you used (if compiling from source):
Python version: 3.9

JasonKessler commented 3 years ago

I'd recommend making sure you're using the latest version of Scattertext.

If that doesn't solve your issue, please include both runnable code and a data set that replicates this term miscount along with an example of what you'd expected to see for a given term and what's happening.
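Since the original data is sensitive, one way to produce such a replication is to swap in a few synthetic responses and state the expected count for a term independently of Scattertext. The following is only a sketch under stated assumptions: the `responses` data, the category names, and the `count_mentions` helper are hypothetical, and the lowercase, word-boundary matching used here may not match the tokenizer that Scattertext applies, which could itself account for some differences.

```python
import re
from collections import Counter

# Hypothetical stand-in data: (category, text) pairs replacing the sensitive responses.
responses = [
    ("promoter", "The support team was quick and the support docs helped."),
    ("promoter", "Quick setup, friendly support."),
    ("detractor", "Support never answered my ticket."),
    ("detractor", "Setup was slow and confusing."),
]


def count_mentions(docs, term):
    """Return (documents containing the term, total occurrences), each keyed by category.

    Assumption: case-insensitive, whole-word matching. This is a baseline for
    comparison, not a reimplementation of Scattertext's counting.
    """
    pattern = re.compile(r"\b" + re.escape(term.lower()) + r"\b")
    doc_counts, occurrence_counts = Counter(), Counter()
    for category, text in docs:
        hits = len(pattern.findall(text.lower()))
        if hits:
            doc_counts[category] += 1
        occurrence_counts[category] += hits
    return dict(doc_counts), dict(occurrence_counts)


doc_counts, occurrence_counts = count_mentions(responses, "support")
print("documents mentioning the term:", doc_counts)
print("total occurrences of the term:", occurrence_counts)
```

Reporting both numbers is deliberate: a count of documents and a count of occurrences can legitimately differ, so a report that says which of the two was expected for a given term, next to the number the plot displays, is easier to act on.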

JasonKessler commented 3 years ago

Closing due to inactivity

JasonKessler / scattertext

# of mentions displayed is off #100
