MycroftAI / mycroft-core

Mycroft Core, the Mycroft Artificial Intelligence platform.
https://mycroft.ai
Apache License 2.0

Regex entities are not localized #108

Closed clusterfudge closed 8 years ago

clusterfudge commented 8 years ago

There are currently several skills that use regular expression entities hardcoded into the skills themselves. These should be revised to use a localization system matching that in place for vocab and dialog.

ethanaward commented 8 years ago

How would you recommend we do this?

the7erm commented 8 years ago

How would you recommend we do this?

This one is going to be a challenge. The problem isn't just that there's a regex (that could be put in a file similar to .dialog and .voc), but that the code to process the regex patterns may be different for each language.

A per-language regex file like the sample below will handle every situation where there's a 1-to-1 match.

Sample mycroft/skills/<skill_name>/regex/<lang>/LocationRegex.rx
at (?P<Location>.*)
in (?P<Location>.*)

MycroftSkill.load_data_files(self) would then call mycroft.skill.core.load_regex(), like it does with mycroft.skill.core.load_vocabulary().
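A minimal sketch of what such a loader might do, assuming one pattern per line in each .rx file. The names read_rx_patterns and match_first are hypothetical and only illustrate the idea; this is not the signature or behavior of any existing Mycroft function.

```python
import os
import re
import tempfile


def read_rx_patterns(path):
    """Return the compiled patterns from a .rx file, one pattern per line.

    Hypothetical helper: the name and the file format are assumptions.
    """
    patterns = []
    with open(path) as rx_file:
        for line in rx_file:
            line = line.strip()
            if line:  # skip blank lines
                patterns.append(re.compile(line))
    return patterns


def match_first(patterns, utterance):
    """Return the named groups of the first pattern that matches, else None."""
    for pattern in patterns:
        match = pattern.search(utterance)
        if match:
            return match.groupdict()
    return None


# Write the sample LocationRegex.rx contents to a temporary file and load it.
with tempfile.NamedTemporaryFile("w", suffix=".rx", delete=False) as tmp:
    tmp.write("at (?P<Location>.*)\nin (?P<Location>.*)\n")
    rx_path = tmp.name

location_patterns = read_rx_patterns(rx_path)
os.remove(rx_path)

result = match_first(location_patterns, "weather in seattle")
```

Because the patterns are unanchored, "at " would also match inside a phrase such as "what is", so a real loader would probably want word boundaries or stricter patterns.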

For situations where the match isn't 1 to 1 we'll probably have to build a class.

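# location mycroft/skills/<skill_name>/regex_processors/<lang>/__init__.py
# Example WeatherSkillEnUsRegex
class SkillNameLangRegex(object):
    def __init__(self, skill):
        self.skill = skill
        # Register the language-specific patterns.
        self.skill.register_regex("at (?P<Location>.*)")
        self.skill.register_regex("in (?P<Location>.*)")
        # Rebuild the intents that use the patterns. Double-underscore names
        # are mangled per class, so self.skill.__build_current_intent()
        # written here would look up _SkillNameLangRegex__build_current_intent
        # and fail. Assumption: the skill exposes these builders under
        # single-underscore names.
        self.skill._build_current_intent()
        self.skill._build_next_hour_intent()
        self.skill._build_next_day_intent()


def create_skill_rx(skill):
    # Factory the skill loader would call for the active language.
    return SkillNameLangRegex(skill)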
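One way a skill could pick the processor for the configured language is a lookup with a fallback. The sketch below is only an illustration under assumptions: StubSkill, EnUsRegex, PROCESSORS, and load_regex_processor are hypothetical names, the fallback to "en-us" is a design guess, and the stub's register_regex only records patterns rather than talking to Mycroft.

```python
class StubSkill(object):
    """Stand-in for a skill; records the patterns it is given (hypothetical)."""

    def __init__(self):
        self.patterns = []

    def register_regex(self, pattern):
        self.patterns.append(pattern)


class EnUsRegex(object):
    """English processor: the location follows 'at' or 'in'."""

    def __init__(self, skill):
        skill.register_regex("at (?P<Location>.*)")
        skill.register_regex("in (?P<Location>.*)")


# Hypothetical registry mapping a language code to its processor class.
PROCESSORS = {"en-us": EnUsRegex}


def load_regex_processor(skill, lang, default="en-us"):
    """Build the processor for lang, falling back to the default language."""
    processor_cls = PROCESSORS.get(lang, PROCESSORS[default])
    return processor_cls(skill)


skill = StubSkill()
# No German processor is registered, so the English one is used.
processor = load_regex_processor(skill, "de-de")
```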

Keep in mind this is just one way to approach the situation. It's only an idea, meant to get a conversation started.

ryanleesipes commented 8 years ago

@ethanaward please review.

ethanaward commented 8 years ago

@the7erm That localization does seem like a good implementation. What do you mean by situations where the match isn't 1 to 1?

the7erm commented 8 years ago

It all comes down to sentence structure. In one language, what you're trying to capture will be one continuous string. In another language, you'll need to capture the beginning of a sentence and the end. In that scenario, the code that processes a regex match for English speakers may be completely different from the code for another language.
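To make the difference concrete, here is a contrived sketch. The split pattern is not taken from any real language; "MARK" is a placeholder word, and the names CONTIGUOUS, SPLIT, location_contiguous, and location_split are hypothetical. It only shows that a split capture needs extra code to reassemble the value, which a plain pattern file cannot express.

```python
import re

# Contiguous case: the location is a single run of text after "in".
CONTIGUOUS = re.compile(r"weather in (?P<Location>.*)")

# Split case (contrived): the location is wrapped around a fixed marker word.
SPLIT = re.compile(r"(?P<LocationHead>.+) MARK (?P<LocationTail>.+)")


def location_contiguous(utterance):
    """Return the single captured group, or None if there is no match."""
    match = CONTIGUOUS.search(utterance)
    return match.group("Location") if match else None


def location_split(utterance):
    """Join the two captured halves, or return None if there is no match."""
    match = SPLIT.search(utterance)
    if not match:
        return None
    return "%s %s" % (match.group("LocationHead"), match.group("LocationTail"))


one_piece = location_contiguous("weather in new york")
two_pieces = location_split("new MARK york")
```

Both calls yield the same location string, but only the second needs language-specific processing after the match.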

Keep in mind I'm not a linguist. I'm only guessing here. I have no idea about the structure of other languages; I just vaguely recall hearing that sentences in other languages aren't always [noun] -> [verb] -> [noun], but could very well be [noun] -> [noun] -> [verb] ... duck duck go, duck go duck ... go duck duck. I should try and sleep some time.

ethanaward commented 8 years ago

#222 is in place to solve this.