MycroftAI / adapt

Adapt Intent Parser
Apache License 2.0
710 stars 154 forks source link

Regex entities with optional words #157

Open corus87 opened 2 years ago

corus87 commented 2 years ago

Not sure if this is a bug, a missing feature or that I just don't know how it works...

Is there a possibility to have an optional word what can or cannot be there, for example:

  1. search for moon
  2. search for the moon

both sentences should match only moon.

I can match both sentence with: engine.register_regex_entity("(for|about) (?P<search>.*)")

but with the second phrase, I will match the moon

I played around with (|the), (|the)?, (the)? and a lot of other regex patterns...I also looked into other mycroft-skills to find a hint, but no luck...

Isn't there some kind of wildcard for this case or a regex pattern I missed?

Thanks in advance.

forslund commented 2 years ago

Not sure if I've understood it correctly but it may be the way regexes work. They match as much as possible. (atleast from what I've gathered messing around on https://pythex.org/

A workaround may be to do something like: (for|about) ((the (?P<search>.+))|(?P<search2>.+)) But then you need to make your intent use .one_of('search', 'search2') so it's not terribly elegant. It feels like the intent should be expressible using regexes but I'm not enough of a guru to figure it out either :/

Edit: (literally 5 min later) (for|about) (the|)?(?P<search>.+$) should work as you expected (not sure what I did wrong yesterday) so something in adapt is causing this if it's not working for you. I'll do some more experimentation in adapt and see what I can see...

corus87 commented 2 years ago

Thanks @forslund for looking into my issue. Unfortunately it does not work on my end with register_regex_entity("(for|about) (|the)?(?P<search>.+$)")

search for the moon still matches the the article, "search": "the moon", but it should only match moon.

It is possible to do an or statement in the regex with ( for| a| an) but I can't believe there is no way to match just nothing. If no article matches, it should just match the whole rest of the sentence.

I played a bit around on regex101 and with this pattern (for|about)( (an|a|the)|\b) (?P<search>.*) it does match with both sentences only moon, but unfortunately not with adapt :(

I do have some workarounds in my project, like making a new intent for this case but it would be nice to have a cleaner solution.

forslund commented 2 years ago

Did some digging tonight, the issue is (I think) the iterative approach adapt uses with regexes... The sentence "tell me about the moon" will be tried in stages "tell" "tell me" "tell me about" "tell me about the" <- match found "tell me about the moon" <- match found

and adapt chooses to report back "the" as the match...

However using negative lookahead:

(for|about) (the|)?(?P<search>(?!the).+$)

should not match against "tell me about the", but should work with "tell me about the moon", however for some reason this doesn't result in any match...I see that "moon" is matched but somehow not used as a result. will need to dig some more...

forslund commented 2 years ago

"(for|about) (the |)?(?P<search>(?!the)\S+$)" Seems to work in my test case...there is an issue where an extra space is matched: " moon" causing a later match to fail.

corus87 commented 2 years ago

Indeed this works on my end too.

It also works with several words "(for|about) (the |a |an |)?(?P<search>(?!the|a|an).*)". But it does look a bit ugly though... Is it because of the design of adapt we have to set the optional words twice? Or is it because of regex works?

Something like (an |a |the |.* ) would be much cleaner, but I got the feeling its not possible because of regex...

With this solution I can hardcode something in my code to have an easier matching.

Anyway, thank you very much for your help!

forslund commented 2 years ago

Partially it's caused by how adapts runs multiple passes over subsets of the utterance but partially something seems slightly wrong (or I don't quite understand it). I will look into it some more and see if I can understand the internals better but I'm not sure I'll be able to improve things.