aichaos / rivescript-js

A RiveScript interpreter for JavaScript. RiveScript is a scripting language for chatterbots.
https://www.rivescript.com/
MIT License

Request for Length Wildcard #337

Open I-am-Orion opened 4 years ago

I-am-Orion commented 4 years ago

I would like a length wildcard feature in RiveScript so that I have the option to match the user's input text more accurately. For example:

EXACT LENGTH WILDCARD:

+ hello *2

Matches Hello John Doe
Does not match Hello John

VARIABLE LENGTH WILDCARD:

+ hello *~2
- That is crazy!

Matches Hello!
Matches Hello John!
Matches Hello John Doe
Does not match Hello John George Doe

MIN-MAX WILDCARD:

+ hello *(2-4)

Matches Hello John Doe and Hello John Dorian Doe
Does not match Hello John

dcsan commented 4 years ago

RiveScript uses a subset of regex to make it easier for non-technical people to write scripts. There has been some discussion of using full regex syntax, but it's never been finalized.

Most of the discussion moved here: https://github.com/aichaos/rivescript-wd/issues/6

But there are other related issues, e.g. https://github.com/aichaos/rivescript-js/issues/253 and https://github.com/aichaos/rivescript-js/issues?q=is%3Aissue+regex+is%3Aclosed

dcsan commented 4 years ago

https://github.com/aichaos/rivescript-js/pull/256

Actually, there's a PR here where you can use the ~ trigger syntax for individual triggers. This doesn't provide full regex features, but perhaps you can extend it based on that PR.

gleuch commented 4 years ago

You can run it through an object and parse accordingly to get string length. Basic example, YMMV.

> object checkHelloWildcard javascript
  var [rs, [str]] = arguments;
  return rs.reply(rs.currentUser(), `reply hello with ${str.length}`);
< object

+ hello *
- <call>checkHelloWildcard "<star1>"</call>

+ reply hello with 2
- I said hello with 2 characters!

+ reply hello with *
- I said hello with <star1> characters!

You can get more advanced by referencing and returning from topics to handle these cases.
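Note that the example above counts characters, while the original request is about the number of words. A minimal sketch of the same workaround counting words instead is shown here. The helper `countWords` is hypothetical and not part of RiveScript; the subroutine uses the `(rs, args)` shape that rivescript-js passes to JavaScript object macros, and `bot` in the final comment is assumed to be an already loaded RiveScript instance.

```javascript
// Hypothetical helper (not part of RiveScript): count whitespace-separated words.
function countWords(text) {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Same idea as the object macro above, but redirecting on the word count.
// Joining args covers both <call>... "<star1>"</call> and the unquoted form.
function checkHelloWildcard(rs, args) {
  const n = countWords(args.join(" "));
  return rs.reply(rs.currentUser(), `reply hello with ${n}`);
}

// Assumed registration on a loaded bot instance:
// bot.setSubroutine("checkHelloWildcard", checkHelloWildcard);
```

With this in place, a trigger such as `+ reply hello with 2` would fire for "hello john doe", and a catch-all `+ reply hello with *` can handle every other count.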

kirsle commented 4 years ago

This has been asked for a few times and I haven't wanted to try and dig into what the regular expression for this would look like.

I know SuperScript.js (a fork of RiveScript) has syntax like that described in the OP and I went sleuthing through their code, but didn't find a regexp that I could take from that and add to RiveScript.

RiveScript's "simplified regexp" system for triggers ends up creating some rather gnarly raw regular expressions to support all the features RiveScript has. For example the "[optionals]" syntax in RiveScript expands out to a regexp that looks like:

+ what is your [phone|office|home] number
('^what is your(?:(?:\s|\b)+phone(?:\s|\b)+|(?:\s|\b)+office(?:\s|\b)+|(?:\s|\b)+home(?:\s|\b)+|(?:\s|\b))number$')

A lot is going on in this example. The regexp needs to behave like "what is your (phone|office|home) number", treating the optionals as a regular alternatives group, but it also needs to match messages that contain none of those words. So the spaces on either side need to be optional, so that you can just say "what is your number", while at least one space is still required, so that "what is yournumber" does not match (no space where the optionals would go). Additionally, it needs to support the optional being at the beginning or the end of the trigger, and in these cases the extra space characters on either side must not be required for matching. The word-boundary metacharacter \b helps it anchor on "word boundary (spaces) or start/end of string"... and this brings with it a new set of problems: Unicode symbols have difficulty matching, because \b only considers ASCII letters, digits and the underscore to be "word characters", not umlauts or other non-ASCII letters.
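The behaviour described here can be checked directly in plain JavaScript. The snippet below is only an illustration: it reuses the regexp quoted above verbatim and adds two small `\b` tests (the words "uber" and "über" are my own examples) to show the Unicode limitation.

```javascript
// Regexp copied from the comment above for
// "+ what is your [phone|office|home] number".
const optionals = /^what is your(?:(?:\s|\b)+phone(?:\s|\b)+|(?:\s|\b)+office(?:\s|\b)+|(?:\s|\b)+home(?:\s|\b)+|(?:\s|\b))number$/;

const results = {
  // One of the optional words is present.
  withOptional: optionals.test("what is your phone number"),      // true
  // No optional word; the single remaining space still matches.
  withoutOptional: optionals.test("what is your number"),         // true
  // No space at all where the optionals would go.
  missingSpace: optionals.test("what is yournumber"),             // false
  // \b sees a boundary before an ASCII letter...
  asciiBoundary: /\buber\b/.test("uber"),                         // true
  // ...but not before "ü", which is not a \w character.
  umlautBoundary: /\büber\b/.test("über"),                        // false
};

console.log(results);
```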

All this to say... extending the regexp engine further to support a "number of words wildcard" while making it compatible with all the existing complexity that RiveScript's triggers currently support may be a tricky task. If done incorrectly, RiveScript may assemble regular expressions that are invalid and have syntax errors, or that cause matching to fail in certain use cases, and introduce more bugs into the library than it solves.

If you want to take a stab at this and figure out a regexp and send me a pull request, feel free! I can then port that change to other editions of RiveScript, too (i.e. Python, Java and Go versions). But I personally haven't felt motivated enough to do this, and everyone who's asked me for this feature doesn't seem willing to try it themselves either.
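As a possible starting point for such a pull request, here is a minimal sketch of how the three wildcard forms from the original post might be translated into raw regexp source. The names `lengthWildcardToRegExp` and `words` are hypothetical and not part of RiveScript. The sketch assumes the message has already been lowercased and stripped of punctuation, that the wildcard always follows a word (it never starts the trigger), and that the rest of the trigger contains no regexp metacharacters. It deliberately ignores optionals, alternations and the Unicode caveats described above, which is exactly where the real difficulty lies.

```javascript
// Hypothetical sketch, not RiveScript's implementation.
// Supported forms: "*N" (exactly N words), "*~N" (0 to N words),
// "*(A-B)" (between A and B words).
function lengthWildcardToRegExp(trigger) {
  const word = "\\S+";

  // Regexp source for a captured run of min..max space-separated words,
  // including the space that separates it from the preceding word.
  function words(min, max) {
    if (max < 1 || min > max) {
      throw new RangeError(`unsupported word range ${min}-${max}`);
    }
    if (min === 0) {
      // The space and the words are optional together, so "hello" alone matches.
      return `(?: (${word}(?: ${word}){0,${max - 1}}))?`;
    }
    return ` (${word}(?: ${word}){${min - 1},${max - 1}})`;
  }

  const source = trigger
    .replace(/ \*\((\d+)-(\d+)\)/g, (_, a, b) => words(Number(a), Number(b)))
    .replace(/ \*~(\d+)/g, (_, n) => words(0, Number(n)))
    .replace(/ \*(\d+)/g, (_, n) => words(Number(n), Number(n)));

  return new RegExp(`^${source}$`);
}

console.log(lengthWildcardToRegExp("hello *2").test("hello john doe"));          // true
console.log(lengthWildcardToRegExp("hello *2").test("hello john"));              // false
console.log(lengthWildcardToRegExp("hello *~2").test("hello"));                  // true
console.log(lengthWildcardToRegExp("hello *~2").test("hello john george doe"));  // false
console.log(lengthWildcardToRegExp("hello *(2-4)").test("hello john dorian doe")); // true
```

The matched words are captured as a single group, so they could in principle feed `<star1>` the way the plain `*` wildcard does; whether that composes safely with the existing trigger-to-regexp pipeline is untested.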