Allow Plugins to Specify Supported Languages

NeonDaniel commented 2 years ago

Is your feature request related to a problem? Please describe.
Currently, there is no way to know whether a particular STT engine will support a change of language.

Describe the solution you'd like
I propose adding a property to the base STT class (and TTS for consistency) that allows a plugin to specify the languages it supports as a set. Callers can check for a default empty set value to provide backwards compatibility.
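A minimal sketch of what I have in mind. The class names here are stand-ins rather than the actual mycroft-core classes, and `supported_languages` and `language_is_supported` are hypothetical names for illustration:

```python
class STT:
    """Stand-in for the base STT class (hypothetical, not the real one)."""

    @property
    def supported_languages(self) -> set:
        # Default for existing plugins: an empty set means
        # "this plugin has not declared its languages".
        return set()


class ExamplePluginSTT(STT):
    """Hypothetical plugin that declares its languages explicitly."""

    @property
    def supported_languages(self) -> set:
        return {"en-us", "de-de"}


def language_is_supported(engine: STT, lang: str) -> bool:
    """Check a language, treating an empty set as 'unknown'."""
    declared = engine.supported_languages
    if not declared:
        # Backwards compatibility: older plugins declare nothing,
        # so we keep the current behaviour and do not reject the request.
        return True
    return lang in declared
```

With this, a plugin that never overrides the property keeps working as it does today, and only plugins that opt in are checked.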

Describe alternatives you've considered
@forslund mentioned using config to handle supported languages, and I think this could be a good method for some plugins. My proposed solution, though, leaves it entirely to the plugin author to decide how supported languages are determined. For example, a Google or Amazon plugin might more easily parse an API return or web page, while a local engine might look at a local directory of models.
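As an illustration of the local-engine case, a plugin could derive its languages from the models on disk. This is only a sketch: it assumes a made-up layout where each model lives in a subdirectory named after its language code, and `languages_from_model_dir` is a hypothetical helper:

```python
from pathlib import Path


def languages_from_model_dir(model_dir) -> set:
    """Return language codes found as subdirectory names of model_dir.

    Assumes one subdirectory per language (e.g. "en-us/"); this layout is
    an assumption for the example, not something any real engine guarantees.
    """
    root = Path(model_dir)
    if not root.is_dir():
        # No model directory yet: nothing can be declared.
        return set()
    # Ignore plain files; normalise codes to lowercase.
    return {entry.name.lower() for entry in root.iterdir() if entry.is_dir()}
```

A cloud plugin would replace the body with whatever API call or page parse suits that service; the caller only ever sees a set.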

Additional context
In Neon, where multiple users are supported, TTS language is considered per-request, so the current validation at init isn't a complete solution.
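Roughly, the per-request check could look like the sketch below. `speak_request`, `DemoTTS` and its `synthesize` method are hypothetical names for this example, not existing Neon or mycroft-core APIs:

```python
def speak_request(tts, text: str, lang: str) -> str:
    """Validate the language for each request instead of once at init."""
    # Plugins without the property are treated as "undeclared".
    declared = getattr(tts, "supported_languages", set())
    if declared and lang not in declared:
        raise ValueError(f"{lang} is not supported by {type(tts).__name__}")
    return tts.synthesize(text, lang)


class DemoTTS:
    """Hypothetical TTS engine used only to exercise the check."""

    supported_languages = {"en-us", "es-es"}

    def synthesize(self, text: str, lang: str) -> str:
        # A real engine would return audio; a string keeps the demo simple.
        return f"[{lang}] {text}"
```

Two users on the same instance can then request different languages, and each request is accepted or rejected on its own.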

forslund commented 2 years ago

I think the way you suggested in the chat, with it being defined by the plugin author, is the way to go.

What I was referring to in chat was the case where a config sets a model reference: the lack of language information can be worked around by setting the supported language through config as well for that TTS. So that's not really an issue.

krisgesling commented 2 years ago

Yeah seems like a nice approach to me.

When a plugin reports a language is "supported" it seems the assumption is that the language model is also loaded and usable, not just that the engine could be configured for [x, y, z] languages. Is that right?

Or would supported_languages include any languages where a model is available, but not necessarily loaded?

Is it worth clarifying this term?

active_languages
loaded_languages
?

NeonDaniel commented 2 years ago

I think the parameter should only describe languages that could be presently requested (so only downloaded models or models that can be automatically downloaded). Maybe "supported" is the wrong word there. I think loaded_languages or valid_languages might make sense, though I'm not sure a model would necessarily be "loaded"; it would probably be loaded on request.

NeonDaniel commented 2 years ago

Refactored to available_languages, with an updated docstring to clarify that the set represents the languages the plugin can currently address in its current installed state.
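To illustrate the intended meaning (this is a sketch, not the code from the refactor; `ExampleSTT` and its constructor arguments are hypothetical):

```python
class ExampleSTT:
    """Hypothetical plugin tracking installed and auto-downloadable models."""

    def __init__(self, installed, downloadable=()):
        self._installed = set(installed)        # models already on disk
        self._downloadable = set(downloadable)  # models fetched on request

    @property
    def available_languages(self) -> set:
        """Languages this instance can address in its current installed state.

        Includes languages whose models can be downloaded automatically,
        but not every language the engine could in principle be set up for.
        """
        return self._installed | self._downloadable
```

So an instance with only an English model installed and automatic download for German would report both, while a language that needs manual setup would not appear.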

Perhaps later, a supported_languages field would make sense for documentation, but I think on the code side we should only advertise what is available to the specific instance.

MycroftAI / mycroft-core

Allow Plugins to Specify Supported Languages #3058