butterscotchstallion / guacbot

Asynchronous IRC bot written in JavaScript

Wordcount for realsies #95

Open butterscotchstallion opened 10 years ago

butterscotchstallion commented 10 years ago

How to find each user's most commonly used words, counting every logged line

New table: logs_meta_words

- id
- word
- occurrences
- nick
- channel

Index: a unique index on (word, nick, channel). Inserts use "on duplicate key update occurrences = occurrences + 1", so a repeated word bumps its counter instead of adding a new row.
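
A rough sketch of what the table and upsert could look like, assuming MySQL (the "on duplicate key update" wording suggests it). The column types and sizes, the index name `uniq_word_nick_channel`, and the names `CREATE_TABLE_SQL`, `UPSERT_WORD_SQL` and `buildUpsertParams` are all made up for illustration and are not taken from the bot's code.

```javascript
// Hypothetical DDL for the proposed table; column types and sizes are guesses.
const CREATE_TABLE_SQL = `
CREATE TABLE IF NOT EXISTS logs_meta_words (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    word VARCHAR(64) NOT NULL,
    occurrences INT UNSIGNED NOT NULL DEFAULT 1,
    nick VARCHAR(64) NOT NULL,
    channel VARCHAR(64) NOT NULL,
    UNIQUE KEY uniq_word_nick_channel (word, nick, channel)
)`;

// One statement per word: insert a new row, or increment the counter when
// the (word, nick, channel) combination already exists.
const UPSERT_WORD_SQL = `
INSERT INTO logs_meta_words (word, nick, channel, occurrences)
VALUES (?, ?, ?, 1)
ON DUPLICATE KEY UPDATE occurrences = occurrences + 1`;

// Builds one [word, nick, channel] parameter list per word in a message,
// in the order the placeholders appear in UPSERT_WORD_SQL.
function buildUpsertParams(words, nick, channel) {
    return words.map((word) => [word, nick, channel]);
}
```

Each parameter list would be passed to whatever query function the bot's database layer exposes, together with `UPSERT_WORD_SQL`.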

When a new message is logged, it is parsed the same way and added to the logs table. This way the backlog isn't constantly growing.

peed commented 10 years ago

This is beautiful. The only thing is maybe we don't have to limit it to three characters; let everything be a word if it's surrounded by spaces.
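
The "everything surrounded by spaces is a word" rule might look something like this. It is only a sketch: `splitWords` and `countWords` are hypothetical names, and treating tabs and repeated spaces as a single separator is an assumption.

```javascript
// Split a message on runs of whitespace. Every non-empty token counts as a
// word, with no minimum length.
function splitWords(message) {
    return message
        .trim()
        .split(/\s+/)
        .filter((word) => word.length > 0);
}

// Count how often each word appears in a single message.
// Returns a Map of word -> occurrences.
function countWords(message) {
    const counts = new Map();
    for (const word of splitWords(message)) {
        counts.set(word, (counts.get(word) || 0) + 1);
    }
    return counts;
}
```

Words are kept case-sensitive here; whether "The" and "the" should share a counter is left open.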

butterscotchstallion commented 10 years ago

Got any other cool ideas for this? I was thinking we could store average words per line as well (this calculation would consider all words, regardless of length).
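
Average words per line could be tracked with two running totals, roughly like this. The in-memory `stats` object is a hypothetical stand-in for wherever the bot would actually store these numbers, and the function names are invented for the example.

```javascript
// Running totals for one nick in one channel.
function createLineStats() {
    return { lines: 0, words: 0 };
}

// Record one logged line. Every whitespace-separated token counts as a
// word, regardless of length.
function recordLine(stats, message) {
    const trimmed = message.trim();
    const wordCount = trimmed === "" ? 0 : trimmed.split(/\s+/).length;
    stats.lines += 1;
    stats.words += wordCount;
    return stats;
}

// Average words per line; 0 when nothing has been recorded yet.
function averageWordsPerLine(stats) {
    return stats.lines === 0 ? 0 : stats.words / stats.lines;
}
```

Storing the totals rather than the average itself means the figure can be updated one message at a time without rescanning old logs.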