Voight Kampff should support testing multiple languages

codyreinold commented 4 years ago

Feature Description

While English is the only language currently officially supported by Mycroft, the codebase is mindful of localization and presumably support will continue to grow in the future.

In the creation of the Voight Kampff testing system, the main example given for setting context is 'Given an English speaking user'. However, upon reviewing the code, it looks as if this condition is hard-coded, and alternative languages are not currently possible (eg: a Spanish speaking user would cause the test to fail).

Given that developers (both in-house and community) seem to take advantage of localization / different languages often in skills, I would recommend localization conditions be implemented in Voight Kampff ASAP, to enable and encourage developers to test all aspects of their skills.

While official support may not come for a while, encouraging a strong base of well tested, highly localized skills should be a priority. This ensures that, upon adding official support for additional languages in the future, numerous pre-tested skills will be available.

Recommended Solution

There are two main items needed to support this:

Develop a list of languages likely to be supported in the future
- Standardize list of Voight Kampff supported languages (can be basic to start, as may be updated in future)
- Optional - Store in a commonly accessible location / documentation, as it may be used in future localization efforts (can provide examples of how it may be used in comments, if helpful)
- Note: Language support in testing suite may be different than for core functionality; this is expected. Typically you will want testing language support released first, with core following. As such, I do not think core continuing to support only en-us is an issue or inconsistent.
Modify Voight Kampff to accept a dynamic user speaking languages in each test

codyreinold commented 4 years ago

My recommendation re: standardized list would be to mirror the list found in mycroft-core/mycroft/res/text

ca-es
da-dk
de-de
en-us
es-es
fr-fr
hu-hu
it-it
nl-nl
pt-pt
ru-ru
sv-fi
sv-se
tr-tr

Does anyone have a recommendation re: storage location of this list? We need the list to do mapping of 'english' >> 'en-us', etc. We could store inline, but may make sense to store in a central location.

forslund commented 4 years ago

Hi @codyreinold indeed the idea is that eventually the voight kampff test should handle multiple languages on the fly. It is planned for stage two where voight kampff can take a more active control over Mycroft. (currently needed to change a basic setting such as the language)

codyreinold commented 4 years ago

I saw that noted in another issue. However, given that the "Given an English speaking user" example is often used in documentation, I thought it'd make sense to implement sooner rather than later.

I have yet to test this, so could be wrong, but I am thinking a solution like this should be both backwards compatible and able to be implemented immediately.

forslund commented 4 years ago

The main issue is Mycroft core doesn't support reloading language files on the fly (I guess there are ways around this but needs to be carefully considered) so doing an implementation at this point is in my opinion premature.

forslund commented 4 years ago

Technically it should be possible to unload all skills push the language setting to the config, trigger unload followed by a reload of all skills. It will be rather slow changing language.

Another approach could be to drop this given, and use the tags to tag test cases with different languages. That would group the languages and we could do a concentrated run on each language.

codyreinold commented 4 years ago

Thanks for the info--I may end up diving deeper into it, but will heed your thoughts re: it being too premature.

To clarify, this is mainly an issue with core loading up respective language files for skills on boot (or refresh), requiring a reload upon any change, correct? Other config params (eg: geolocation of user / device) would not introduce the same challenges, correct?

forslund commented 4 years ago

There's currently some thoughts going on for updating the resource loading to make it more flexible, our aim is to have it in place for the 20.08 update.

If you have ideas or suggestions they're very welcome.

Yeah basically currently dialog files and intent related files are loaded at skill startup and then are left alone. So if a skill is reloaded after the language config is updated it would gain the new language.

codyreinold commented 4 years ago

Let me think on it a bit. Do you happen to know the relevant parts of the codebase offhand? If not, no worries.

forslund commented 4 years ago

The part of the skill loading that sets the skill language, loads dialogs and registers intents are located here: https://github.com/MycroftAI/mycroft-core/blob/dev/mycroft/skills/skill_loader.py#L238

There you'll see most of the relevant MycroftSkill methods for loading intents and dialogs.

I think the tricky part is the handling registering the intents. Dialogs could probably be changed to do some lazy loading when needed without much trouble.

codyreinold commented 4 years ago

Spent a bit of time thinking on this and digging into the code to understand loading procedures, intent processing, etc. I have NOT reviewed the entire codebase in detail, so it is entirely possible I am missing elements / complexity.

Given the current architecture of the application, I saw a few different paths to solving this problem. I have a strong preference towards not just solving for this testing problem, but also architecting for growth.

Ability to Load Numerous Languages

Background / Assumptions

Mycroft currently only loads the vocab / intents for a single language upon boot. I imagine this decision was made to minimize both lookup time (storing all intent values in memory) and memory usage (only store required language). These principles are essential to maintain in future releases, given the number of users running on low-grade hardware, but also pose some significant problems re: application flexibility.

Solution

Under this solution, the skill / language paradigm would shift from a single language model to allowing multiple languages. Existing functionality is maintained, allowing users to use only one language; single language users will see no change / impact on performance or memory usage.

New functionality will be added to support [1 - ∞] languages. Languages would be passed via list in order of preference, with strings being allowed in the case of single languages or an all keyword.

Implementation / Changes

Accept an ordered list of language codes (ordered according to preference)
- Default to (list) ['en-us'] and accept (str) 'en-us' and (str) 'all', to ensure backwards compatibility / non-breaking and allow for loading all language intents to be loaded in a higher memory environments

Upon intent dictionary construction, use the following design (example given an input of ['en-us', 'es-es'])

(OrderedDict) {
'en-us': (dict) {
    'skill-1': {
        [EXISTING_SKILL_STRUCTURE]
    },
    'skill-2': {
        [EXISTING_SKILL_STRUCTURE]
    }
},
'es-es': (dict) {
    'skill-1': {
        [EXISTING_SKILL_STRUCTURE]
    },
    'skill-2': {
        [EXISTING_SKILL_STRUCTURE]
    }
}
}

OrderedDict would allow for looping in language preference order, if more than one language is provided (break upon match / success confidence threshold)
Within converse functionality (of active skill), store the identified / active language
When calling intent processor, allow for an optional lang arg, which would allow for skipping the OrderedDict loop and targeting a specific element (lang) group
Existing codebase would be used to loop skills / intents, as structure is identical; no changes needed once inside a given lang element

Rationale

I like this solution as it solves immediate problems while also moving the codebase towards greater flexibility and functionality.

Immediately, it solves the testing problem discussed in this thread (load all languages, then route tests to test against the appropriate / given lang element using the optional lang arg). Memory usage will be higher during tests given the increased dictionary size, but I don't foresee this being a major issue.

In general, this flexibility and robustness will encourage better testing by skill developers. Additionally, it moves mycroft-core significantly closer to non-en-us language support.

Longer term, it opens the door to a number of different growth opportunities. First, it allows for multi-lingual households / users to use the system more naturally.

Second, it begins to setup the system for user-specific profiles. I am a big advocate for user identification via voice-fingerprinting (allowing the system to do different tasks different ways for different users).

Eg: "turn off my bedside light" >> my = current_user >> current_user_voice = johnsmith >> "[johnsmith.lights.bedside.off]")

That being said, even language-specific wake words could be an option in the future, to trigger a language-specific experience.

Eg: User could configure "hey mycroft" >> en-us and "hola mycroft" >> es-es)

Finally, the complexity of this solution seems reasonable. Of course it presents some complexity given it is touching critical aspects of the codebase (ie: intent processor interface), but relative to other solutions, it does't seem overly complex. Additionally, it doesn't appear to have too many downstream impacts. With proper default cases in place, implementation changes can be isolated, well tested against existing functionality, and ensure full backwards compatibility.

Drawbacks

Only two potential drawbacks come to mind, but I wanted to mention them. First, as is true of all solutions to this problem, we'll pay a bit of a complexity penalty. The codebase will become a bit more complex, but I don't think this should be an inhibiting factor.

Second, I can envision a very rare case where an intent from one skill, in one language, mirrors that of a different skill, in a different language. This is a very rare case and would only impact if a user had both languages on and with priority order set so the incorrect command got called first. Given the rarity of this occurrence, I don't think this would be an inhibiting factor, either.

ChanceNCounter commented 4 years ago

I'm working on Lingua Franca's part of this problem here (other contributors' approval not yet obtained.) That PR adds OO interfaces to all the parsers and formatters, the idea being to load only those languages you want. I'm also looking at a clean way to have the module itself do that, but I'm personally a big fan of objects.

Also a list of supported languages.

Many of the formatters and parsers have been localized, and ofc the repo always wants more love.

Anyway, I'm tooting my horn about that refactor all over the place, but it hits on this problem. If it's accepted, I was gonna follow up with a PR here at Mycroft Core, which would do pretty much what @codyreinold just described:

New functionality will be added to support [1 - ∞] languages. Languages would be passed via list in order of preference, with strings being allowed in the case of single languages or an all keyword

right down to order of preference, that's exactly how I did it. Well, I only worried about the first language being the default/active language, but, yeah.

codyreinold commented 4 years ago

Took a quick look at the PR and related issue--definitely a lot of great work. I'm still new to the project, so great to get a quick into to lingua franca and see a bit re: how it fits in and what you've been up to.

I think the biggest difference is the overall paradigm shift from having an explicitly defined default language to accepting any number of ranked languages. In my mind, there is no concept of "default / active language". Within a given conversation, sure, there is an active language, but outside of that, its always parsed via ranked order. I suppose setting default = first lang is comparable to this in some respects, but I'd argue that architecting solely for looped ranked lists + breaking after success reduces complexity and provides for better future scalability.

I consider language more of a user preference (in the case of user-specific profiles). A single ranked list as a system-wide preference works right now, because the system is built around the assumption that one device = one user. As that paradigm shifts in the future, to one person = one user (multiple users per device), I expect this will be transitioned even further to an individual-specific ranked list, with a global aggregate of profiles defining while libraries to load.

forslund commented 4 years ago

It looks like a good solution and matches some of the work I'm doing / planning on refactoring the intent service (for clarity, and extending the converse support to multi client setups).

I'll try to hurry that part up and maybe we can have that as a base for developing this further after the weekend? (Basically breaking out the adapt part from the "IntentService" to a stand-alone, making precise not be a fallback skill, add ranges to the fallback execution, explicitly call the fallback-unknown on complete failure instead of having it a fallback).

The suggestion you make to have the stt service return a language code from the wake word system or it could be returned along with the STT data (I believe there are for example Kaldi models that detect language) gives a lot of options.

You're right that it will be more complex but as you say the added functionality out-weights the added complexity (Not sure it will be all that much added complexity in the end, if done properly a single lookup of the language order to be used and a loop.)

codyreinold commented 4 years ago

Glad you agree! I'm still new to the project--working to get up to speed, but its a fairly large codebase with quite a few dependancies (additionally, my primary focus nowadays is more on application architecture; just helping out here for fun / my own interests). Happy to hear I wasn't too far off base.

Let me know when your refactor is decent shape and I'll take a look. No need to hurry, though--plenty of other stuff to tinker with.

ChanceNCounter commented 4 years ago

To be clear, the "default" language in a Lingua Franca context just means "the language you get if you call a function without specifying a language." The ability to call a function in a specific language is already present. What that PR changes is that LF would no longer be loading all languages into active memory, unless that's what you want from it.

Having them in a particular order is just a side effect.

forslund commented 1 month ago

Closing Issue since we're archiving the repo

MycroftAI / mycroft-core