ARMmaster17 / JeffBot

(Yet another) comical and extensible chat bot.
https://jeffchatbot.herokuapp.com
MIT License
9 stars 2 forks

Scaling n-gram algorithm #12

Closed. ARMmaster17 closed this issue 8 years ago

ARMmaster17 commented 8 years ago

Right now, JeffBot uses bigrams (N2) to do language processing. Should it be moved up to N3, at the cost of purging the database and needing more training data?

ARMmaster17 commented 8 years ago

Going to change from N2 to N5, then scale down to smaller n-grams if there are no matches. Will terminate on a missing N2 link.
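
A minimal sketch of that scale-down idea, not JeffBot's actual code. It assumes a hypothetical in-memory `table` hash that maps a context (an array of preceding words) to a hash of next word => count; the real bot reads these from its database. The method name `next_word_candidates` and the `max_n` parameter are made up for illustration.

```ruby
# Hypothetical stand-in for the n-gram table:
#   { ["the", "quick"] => { "fox" => 3 }, ["quick"] => { "dog" => 1 } }
# Returns the candidate hash for the longest matching context, or nil.
def next_word_candidates(table, words, max_n: 5)
  # An N-gram uses N-1 words of context, so N5 looks at the last 4 words.
  max_context = [max_n - 1, words.length].min

  # Try the longest context first, then scale down one word at a time.
  max_context.downto(1) do |size|
    candidates = table[words.last(size)]
    return candidates if candidates && !candidates.empty?
  end

  nil # no N2 link either, so the caller should stop generating
end
```

Longer contexts give more specific replies but need much more training data to get any match at all, which is why the lookup falls back instead of failing outright.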

ARMmaster17 commented 8 years ago

Due to a database issue, changed from N2 to N3 in the latest commit. Verifying that there are no bugs before closing the issue.

FreekingDean commented 8 years ago

I'm not 100% sure, but could this be related to #29?

ARMmaster17 commented 8 years ago

Yes, this is related to #29. Investigating.

FreekingDean commented 8 years ago

So I figured out the issue: it was in the formulator.rb file. It was using the old-style bigram call to Wordchain.

FreekingDean commented 8 years ago

JeffBot is now very specific in his responses. I think we should add some randomness to his next choice so he doesn't always choose the most popular word.

ARMmaster17 commented 8 years ago

You may be missing some code. I believe on Saturday (EST) I committed code that randomly picks a word to add based on the count field (higher count, higher probability of being picked).
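
For reference, a count-weighted pick can be sketched roughly like this. This is an illustration under assumptions, not the code from the commit: `weighted_pick` and the word => count hash shape are hypothetical, and counts are assumed to be non-negative integers.

```ruby
# candidates: hash of word => count (hypothetical shape for the database rows).
# Each word is picked with probability count / total.
def weighted_pick(candidates, rng: Random.new)
  total = candidates.values.sum
  return nil if total <= 0

  target = rng.rand(total) # integer in 0...total
  candidates.each do |word, count|
    return word if target < count
    target -= count
  end

  nil # not reached when counts are non-negative integers
end

weighted_pick({ "hello" => 9, "howdy" => 1 }) # "hello" about 90% of the time
```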

FreekingDean commented 8 years ago

I had just started playing with this Sunday, so I should be up to date. I'll look deeper into it.

ARMmaster17 commented 8 years ago

See commit 8390e6a095c91130cf4855ea678379de2242a8a1

FreekingDean commented 8 years ago

I see! Totally glossed over that. I "Rubified" it in #31 to make it a bit clearer and closer to idiomatic Ruby.

FreekingDean commented 8 years ago

It now uses more of a popularity-biased random sort.

FreekingDean commented 8 years ago

So it will get the group of words and, starting with the most popular choice, it's got a 90% chance of using the first one, same for the second (if the first is skipped), and so on. Meaning the first 2 options together have a (.90 + (.10 * .90)) = .99 chance, and so on for SUM(k=0..n)(.90 * .10^k).
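
A rough sketch of that kind of pick, assuming the same hypothetical word => count hash; it is not the code from #31. The name `popular_random_pick`, the `keep_chance` parameter, and the fallback to the most popular word when every roll fails are all assumptions for illustration. Under this sketch, the k-th most popular word (counting from 0) is chosen with probability roughly 0.9 * 0.1^k.

```ruby
# Walk the words from most to least popular; each one is accepted with
# probability keep_chance, otherwise move on to the next.
def popular_random_pick(candidates, keep_chance: 0.9, rng: Random.new)
  ranked = candidates.sort_by { |_word, count| -count }.map(&:first)
  return nil if ranked.empty?

  ranked.each do |word|
    return word if rng.rand < keep_chance
  end

  ranked.first # assumption: if every roll fails, fall back to the top word
end
```

Compared with weighting by raw count, this depends only on rank, so a word with a huge count cannot completely drown out the runner-up.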

FreekingDean commented 8 years ago

I think you can close this if this is for N3 Ngram! :)