JasonKessler / scattertext

Beautiful visualizations of how language differs among document types.
Apache License 2.0

Issue with number of mentions on Topics view #80

Closed mastafaMicrosoft closed 3 years ago

mastafaMicrosoft commented 3 years ago

Hey,

There is an issue with the number of mentions when the user assigns the same keyword to a Topic several times.

For example, let's assume the dictionary of topics/keywords is: {"Democrats": ["democratic", "democratic"]}. With the current implementation, it seems like the mentions will be counted twice as often as they should be.

Will follow up with an example. But what we can do is just take the unique list of keywords for each topic in the backend, to make sure duplicates don't get extra weight.
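A minimal sketch of the deduplication idea, in plain Python. The names `dedupe_topic_keywords` and `count_mentions` are hypothetical and are not part of scattertext's API; the naive counter only illustrates how a repeated keyword could double a count, and is an assumption about the cause rather than a description of the library's actual code.

```python
def dedupe_topic_keywords(topic_model):
    """Return a copy of the topic dictionary with duplicate keywords removed.

    dict.fromkeys keeps the first occurrence of each keyword, so the
    original keyword order is preserved.
    """
    return {
        topic: list(dict.fromkeys(keywords))
        for topic, keywords in topic_model.items()
    }


def count_mentions(tokens, keywords):
    """Hypothetical naive counter: sums token matches once per keyword entry,
    so a keyword listed twice is counted twice."""
    return sum(tokens.count(keyword) for keyword in keywords)


topic_model = {"Democrats": ["democratic", "democratic"]}
tokens = "the democratic party held a democratic convention".split()

# Count with the duplicated keyword list, then with the deduplicated one.
raw_count = count_mentions(tokens, topic_model["Democrats"])
clean_model = dedupe_topic_keywords(topic_model)
clean_count = count_mentions(tokens, clean_model["Democrats"])
```

Under these assumptions, `raw_count` is double `clean_count`, which is the kind of expected/actual discrepancy a reproducible example would need to show.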

Mastafa,

JasonKessler commented 3 years ago

I'm unable to duplicate this. Could you provide a runnable example on a public dataset, and a topic's expected/actual frequency?

Until then, I'm closing this issue.