braindef opened this issue 8 years ago
Do you mean grouping certain words by type and then reporting on those totals?
Hello angus-c, once again thank you for your tool; I use it daily.
The hacker(?) and linguist Josch (http://www.scharloth.com/) from the German hacker club CCC (Chaos Computer Club) explains in his well-known talk at 30C3 (https://www.youtube.com/watch?v=2Bkpitdl95I) that language can be used for a kind of fingerprinting, which is quite complex. In my opinion, with a bit less effort you could infer, e.g., the customer segment of an article: if most of the topic words fall into the group "car", the article would belong to the "car" customer segment...
Pseudo-code:

```python
# Topic IDs
CAR = 1
GARDENING = 2
HACKING = 3
POPULISM = 4
# ...

# groups[topic_id] -> keywords that belong to that topic
groups = {
    CAR: ["car", "street", "clutch", "speed", "acceleration"],  # ...
    GARDENING: ["gardening", "mowing machine", "tree", "chainsaw", "flower", "vegetables"],  # ...
    HACKING: ["CPU", "hacking", "GPU", "RAM", "source", "exploit"],  # ...
    # the stupid topics people talk about, finding common sense,
    # feeling superior without ever having written a line of code...
    POPULISM: ["beer", "meat", "BBQ", "terrorism", "ISIS", "jew", "Nazi"],  # ...
    # ... groups[n] = [...]
}
```
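A minimal sketch of the grouping idea: count how many words of a text fall into each keyword group, then take the group with the highest total as the "customer segment". The keyword sets and the functions `group_totals` and `dominant_segment` are hypothetical illustrations, not part of the existing tool. Matching is on single lowercase words only, so multi-word entries such as "mowing machine" are left out here.

```python
import re
from collections import Counter

# Hypothetical example groups (single lowercase words only).
GROUPS = {
    "car": {"car", "street", "clutch", "speed", "acceleration"},
    "gardening": {"gardening", "tree", "chainsaw", "flower", "vegetables"},
    "hacking": {"cpu", "hacking", "gpu", "ram", "source", "exploit"},
}


def group_totals(text: str, groups: dict) -> Counter:
    """Count how many words of `text` belong to each group.

    A word listed in several groups is counted once per group.
    """
    words = re.findall(r"\w+", text.lower())
    totals = Counter()
    for word in words:
        for name, keywords in groups.items():
            if word in keywords:
                totals[name] += 1
    return totals


def dominant_segment(text: str, groups: dict):
    """Return the group with the most matching words, or None if nothing matched."""
    totals = group_totals(text, groups)
    if not totals:
        return None
    return totals.most_common(1)[0][0]


text = ("The car had a new clutch, and its acceleration on the street was great. "
        "I also planted a tree.")
print(group_totals(text, GROUPS))
print(dominant_segment(text, GROUPS))
```

Here four words match "car" and one matches "gardening", so the text would be assigned to the "car" segment.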
And (just an idea) the results could then be displayed graphically, like the WordPress plugin Word Stats (https://de.wordpress.org/plugins/word-stats/). That way you could say a lot about a text or the person who wrote it, without any complex linguistic analysis...
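As a stand-in for a real graphical view, per-group totals can be shown as a plain-text bar chart. This is a minimal sketch and not how the Word Stats plugin renders anything; `bar_chart` and its `width` parameter are hypothetical.

```python
def bar_chart(scores: dict, width: int = 20) -> str:
    """Render {label: value} as text bars, scaled so the largest value gets `width` marks."""
    if not scores:
        return ""
    top = max(scores.values()) or 1  # avoid division by zero when all values are 0
    label_width = max(len(label) for label in scores)
    lines = []
    # Largest value first
    for label, value in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        bar = "#" * round(width * value / top)
        lines.append(f"{label.ljust(label_width)} | {bar} {value}")
    return "\n".join(lines)


print(bar_chart({"car": 4, "gardening": 1, "hacking": 0}, width=8))
```

Scaling to the largest value keeps the chart readable regardless of text length; a real UI would replace the `#` marks with drawn bars.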
German original: http://marclandolt.ch/ml_buzzernet/2015/11/22/einfache-wort-haufigkeitsanalyse/#sendung
English machine translation: https://translate.google.com/translate?sl=de&tl=en&js=y&prev=_t&hl=de&ie=UTF-8&u=http%3A%2F%2Fmarclandolt.ch%2Fml_buzzernet%2F2015%2F11%2F22%2Feinfache-wort-haufigkeitsanalyse%2F%23sendung&edit-text=
It would probably be easy to also calculate and display something like a populism quotient, a nerd quotient, or a housewife quotient...
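One possible definition of such a quotient, assumed here since none is given: the percentage of all words in a text that belong to a group's keyword list. The function `quotients` and the example keyword sets are hypothetical.

```python
import re

# Hypothetical example keyword sets (single lowercase words).
EXAMPLE_GROUPS = {
    "nerd": {"cpu", "gpu", "ram", "source", "exploit", "hacking"},
    "gardening": {"gardening", "tree", "chainsaw", "flower", "vegetables"},
}


def quotients(text: str, groups: dict) -> dict:
    """Return, per group, the percentage of words in `text` that are group keywords."""
    words = re.findall(r"\w+", text.lower())
    if not words:
        return {name: 0.0 for name in groups}
    return {
        name: 100.0 * sum(1 for w in words if w in keywords) / len(words)
        for name, keywords in groups.items()
    }


print(quotients("The GPU and the CPU need more RAM", EXAMPLE_GROUPS))
```

Dividing by the total word count (rather than reporting raw counts) makes a short comment and a long article comparable; here 3 of 8 words are "nerd" keywords, giving a nerd quotient of 37.5.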
And of course I could do this on my own, since I have already forked this repo, but I think it's only fair to post it in the creator's repo...