aichaos / rivescript-js

A RiveScript interpreter for JavaScript. RiveScript is a scripting language for chatterbots.
https://www.rivescript.com/
MIT License
377 stars, 145 forks

Feature Request Parse POS data along with Plain text #6

Closed silentrob closed 9 years ago

silentrob commented 10 years ago

This might seem pretty advanced, but I think it could really open the engine up to some cool concepts.

Given:

I know you can sorta do this with arrays, but this approach would be far more powerful and involve less code (fewer rules).

I'm still exploring this. I got the engine able to parse both full POS tags and plain text, but I have not gotten them both working together.

The other gotcha here is that the formatting of msg changes the inflection of the POS data. It would be ideal to keep both a formatted msg and the raw message.

kirsle commented 10 years ago

I've never heard of POS (the only acronym I know by that name is "Point of Sale") but I'm assuming it's similar to NLTK? And that "NNP" means "proper noun"?

In your example you could use arrays of proper nouns and use them there, as you mentioned. I assume POS goes far beyond that in complexity, though.

Somebody once asked about integrating NLTK support with the Python version of RiveScript, and I don't see how it could be implemented cleanly without entirely changing RiveScript. It would probably be easier to create an entirely new bot engine that uses these technologies than to try to make RiveScript work with them.

silentrob commented 10 years ago

Sorry for not being more clear. Yes, POS = Parts of Speech. I wrote a lib (2 years ago), https://github.com/NaturalNode/node-nltools, that tags words with their appropriate part of speech. The tags are the Penn Treebank ones: http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html

I'm actually using this to split the incoming message into separate sentences and handle multiple sentences at once on the reply.

So "My name is Bill. What is your name?"
Replies with "Nice to meet you Bill. My name is <bot name>."

I also use this approach to check first whether the sentence is a math expression, then tag it and pass it into RiveScript.

// Count tokens that look like part of a math expression: cardinal numbers (CD),
// symbols (SYM), or one of the math terms below. Three or more such tokens
// (e.g. two numbers and an operator) mark the input as an expression.
var mathTerms = ["add", "plus", "+", "-", "minus", "times", "divide", "subtract", "multiply"];
var numCount = 0;
for (var i = 0; i < posObj.tags.length; i++) {
    var tag = posObj.tags[i];
    if (tag.pos === 'CD' || tag.pos === 'SYM' || mathTerms.indexOf(tag.token) !== -1) {
        numCount++;
    }
}
if (numCount >= 3) {
    input = "eval_expression " + input;
}

var reply = bot.reply(socket.name, input, bot);

Perhaps I will stick it in a branch.

kirsle commented 10 years ago

On the topic of sentence splitters, that's a feature I deliberately left out of the core RiveScript module. The most obvious competitors to RiveScript bots are Alicebots, which have built-in sentence splitting, but I decided to take a more "Unix-like" approach with RiveScript: it does one thing really well, instead of doing too much stuff that should be left up to the bots that use RiveScript. For example, if you just wanted to use the AIML parsing and replying system in your own code that isn't for an Alicebot, it's typically very difficult to separate the AIML part out from the rest of an existing Alicebot. That is, the implementations of AIML libraries are also implementations of entire AIML bots, with configuration XML files and a whole bunch of bloat that you don't necessarily want.

Sentence splitting in particular was left out because it's really easy to do it yourself, and RiveScript doesn't need the responsibility of deciding what a sentence is or how it should be split up. Perl example:

my @sentences = split(/[.;!?]/, $message);
my @replies;
foreach my $sentence (@sentences) {
   push @replies, $rs->reply($user, $sentence);
}
my $reply = join(" ", @replies);
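
For the JavaScript version, the same idea might look roughly like the sketch below. It is only an illustration: `replyToSentences` and `replyFn` are hypothetical names, not part of RiveScript, and `replyFn` stands in for whatever call your bot uses to fetch a reply. Unlike the Perl one-liner, it also trims whitespace and drops empty fragments.

```javascript
// Hypothetical helper (not part of RiveScript): split a message into
// sentences, fetch a reply for each one, and join the replies.
// `replyFn` is an assumption: any function that maps one sentence to a reply.
function replyToSentences(message, replyFn) {
  return message
    .split(/[.;!?]/)                                // same delimiters as the Perl example
    .map(function (s) { return s.trim(); })         // strip surrounding whitespace
    .filter(function (s) { return s.length > 0; })  // drop empty fragments
    .map(function (s) { return replyFn(s); })
    .join(" ");
}

// Usage with a stub reply function, so the sketch runs without a bot.
var echo = function (sentence) { return "[" + sentence + "]"; };
console.log(replyToSentences("My name is Bill. What is your name?", echo));
```

Taking the reply function as a parameter keeps the splitter independent of the bot, which is the point of leaving it out of the core module.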

silentrob commented 10 years ago

Yes, I completely agree that it should be left out, and with the Unix philosophy :) I just needed to add it to get through the Loebner screener questions :)

silentrob commented 10 years ago

On the note about formatting the message: it would be nice to be able to pass in additional processing. I would like to pass the message through a spell checker before doing the substitutions.

Actually I could do that before sending the message into the system too.
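
A minimal sketch of that "do it before sending the message in" approach, assuming a tiny hand-written correction table in place of a real spell checker. Every name here (`CORRECTIONS`, `correctSpelling`, `preprocess`) is hypothetical and not part of RiveScript.

```javascript
// Hypothetical correction table; a real bot would call a proper spell checker.
var CORRECTIONS = { "teh": "the", "wat": "what", "recieve": "receive" };

// Replace each known misspelling, leaving every other word untouched.
function correctSpelling(message) {
  return message.trim().split(/\s+/).map(function (word) {
    var key = word.toLowerCase();
    return Object.prototype.hasOwnProperty.call(CORRECTIONS, key)
      ? CORRECTIONS[key]
      : word;
  }).join(" ");
}

// Run the message through each pre-processing step in order.
function preprocess(message, steps) {
  return steps.reduce(function (text, step) { return step(text); }, message);
}

var cleaned = preprocess("wat is teh time", [correctSpelling]);
console.log(cleaned);
// `cleaned` is what would then be handed to the bot's reply call.
```

Because the steps are plain functions applied before the reply call, more of them (sentence splitting, POS tagging) can be added to the list without touching the interpreter.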

kirsle commented 10 years ago

Some spell checking can be done with substitutions within RiveScript. But yeah, that's the idea: the extra stuff outside the scope of "load replies and fetch one" is left to the bot's programming to implement.