DavidBruant / Twitter-Assistant

Firefox add-on adding metrics to Twitter Web
MIT License

Most used words #28

Open DavidBruant opened 10 years ago

DavidBruant commented 10 years ago

Massive thanks to @gmarty.

Take a look at natural (https://github.com/NaturalNode/natural): it has tokenizers (to split a sentence into words) for multiple languages.
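As a rough illustration of the tokenize-then-count idea, here is a minimal dependency-free sketch. It does not use natural's API: `tokenizeWords`, `countWords` and `mostUsedWords` are hypothetical names, and the ASCII-only regex is an assumption standing in for a real language-aware tokenizer (it would also split URLs and drop the `@`/`#` prefixes).

```typescript
// Hypothetical helper: split text into lowercase word tokens.
// ASCII-only regex; a stand-in for a real language-aware tokenizer.
function tokenizeWords(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9']+/g) || [];
}

// Count occurrences of each token across all tweets, skipping stop words.
function countWords(tweets: string[], stopWords: string[] = []): Record<string, number> {
  // Object.create(null) avoids collisions with inherited keys like "constructor".
  const counts: Record<string, number> = Object.create(null);
  for (const tweet of tweets) {
    for (const word of tokenizeWords(tweet)) {
      if (stopWords.indexOf(word) !== -1) continue;
      counts[word] = (counts[word] || 0) + 1;
    }
  }
  return counts;
}

// Return the `limit` most frequent words, most used first
// (ties are broken alphabetically so the order is stable).
function mostUsedWords(counts: Record<string, number>, limit: number): [string, number][] {
  return Object.keys(counts)
    .map((word): [string, number] => [word, counts[word] || 0])
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit);
}
```

Something like `mostUsedWords(countWords(tweetTexts, ["the", "a"]), 20)` would then give the top 20 words, where `tweetTexts` is whatever array of tweet strings the add-on already has.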

To detect word groups (like "social network"), look at n-grams. Also look at inflectors (singular/plural variations).
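A minimal sketch of the n-gram idea, assuming tweets are plain strings and using the same kind of ASCII-only tokenization as a stand-in for a real tokenizer. `ngrams` and `countNgrams` are hypothetical names, not functions from natural.

```typescript
// Build all n-grams (runs of n consecutive tokens) from a token list.
function ngrams(tokens: string[], n: number): string[][] {
  const result: string[][] = [];
  if (n < 1) return result;
  for (let i = 0; i + n <= tokens.length; i++) {
    result.push(tokens.slice(i, i + n));
  }
  return result;
}

// Count n-grams per tweet, so that a word group never spans two tweets.
function countNgrams(tweets: string[], n: number): Record<string, number> {
  const counts: Record<string, number> = Object.create(null);
  for (const tweet of tweets) {
    // ASCII-only tokenization; an assumption made for this sketch.
    const tokens: string[] = tweet.toLowerCase().match(/[a-z0-9']+/g) || [];
    for (const gram of ngrams(tokens, n)) {
      const key = gram.join(" ");
      counts[key] = (counts[key] || 0) + 1;
    }
  }
  return counts;
}
```

With `n = 2`, a group like "social network" gets its own counter and can be ranked alongside single words.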

For common roots, use a stemmer like https://github.com/fortnightlabs/snowball-js
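To show how stemming could merge counts, here is a sketch that groups word counts by stem. The `naiveStem` function is only a placeholder that strips a few English suffixes; it is not the Snowball algorithm, and a real stemmer would be passed in its place. All names here are hypothetical.

```typescript
type Stemmer = (word: string) => string;

// Naive stand-in that strips a few English suffixes. NOT the Snowball algorithm;
// it will mis-stem many words (for example "news" becomes "new").
const naiveStem: Stemmer = (word) => {
  for (const suffix of ["ing", "es", "s"]) {
    if (word.length > suffix.length + 2 && word.slice(-suffix.length) === suffix) {
      return word.slice(0, word.length - suffix.length);
    }
  }
  return word;
};

interface StemGroup {
  total: number;   // summed count of every form sharing the stem
  forms: string[]; // the original words that were merged
}

// Merge per-word counts into per-stem groups using the given stemmer.
function groupByStem(counts: Record<string, number>, stem: Stemmer): Record<string, StemGroup> {
  const groups: Record<string, StemGroup> = Object.create(null);
  for (const word of Object.keys(counts)) {
    const key = stem(word);
    const group = groups[key] || (groups[key] = { total: 0, forms: [] });
    group.total += counts[word] || 0;
    group.forms.push(word);
  }
  return groups;
}
```

Keeping the stemmer as a parameter means the grouping logic stays the same whichever stemming library ends up being used.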

DavidBruant commented 10 years ago

Maybe display the result like http://static4.businessinsider.com/image/4fb3df97ecad04ef32000003-610-/android-fragmentation-chart.png

DavidBruant commented 9 years ago

Depends on #68