MycroftAI / adapt

Adapt Intent Parser
Apache License 2.0
709 stars 155 forks source link

Issues with previously registered vocab terms and regex #41

Closed ethanaward closed 8 years ago

ethanaward commented 8 years ago

@HeinzSchmidt was talking earlier about having troubles with properly triggering his shopping skill, so I started testing it. He registered regex that looks like (?P<ShopVerb>.*) (?P<ShopItem>.*) to (?P<ShopLocation>.*), and was trying to use it to match to add onions to my shopping list, with add as the ShopVerb, onions as the ShopItem, and my shopping list as the ShopLocation. However, he was getting back my shopping as the location rather than my shopping list. According to an online regex tester, this is incorrect.

I did some testing, and found out that the problem persists with other syntax - add onions to my shopping mouse, for example, returns my shopping as the location, chopping off mouse at the end. Other similar words such as terminal were also getting cut off, and I realized that mouse and keyboard were registered as vocab by the desktop launcher skill. When I added desktop launcher to the blacklisted skills, the problem with those two words stopped. I did a quick check, and list has been registered as vocab to scheduled skills. I added those to the blacklist, and I started getting my shopping list back again.

This means that there is some issue with Adapt parsing regex when there is another vocab term in the expression. From my testing, it also seems to be localized to be only at the end of sentences, although that may not be true.

clusterfudge commented 8 years ago

This is behavior precisely as expected. Adapt's strength lies in it's abilities as a KET (Known Entity Tagger). Regexes are present to represent information that cannot be known ahead of time (for example, the contents of a text message or web search query). Adapt attempts to compensate for the egregious nature of wildcard regexes (the 3 .*'s) by weighting known entities higher in parse results. The above will always be treated as "lesser" than most parse results containing known entities.

The proper way to model the above would be with a list of known (and probably static) list of ShowVerbs, a static or dynamic list of ShopLocations (is this all lists? can we come up with a better name?), and then a single wildcard regex to capture the items being added. perhaps something like (?P<ShopItem>.*) (to|onto|in), for the english regex. (I think that's valid syntax, don't quote me on it).

TL/DR; Modeling language with regular expressions is asking for disappointment. Use Adapt! It's better ™

ethanaward commented 8 years ago

Alright, gotcha. Thanks for the clarification.

And yeah, I realized the issues inherent in using that kind of regex, and was going to suggest that he take out at least the shop verb into a list of vocab, if not the location. I just didn't realize that was actually reflected in Adapt.