MycroftAI / mycroft-core

Mycroft Core, the Mycroft Artificial Intelligence platform.
https://mycroft.ai
Apache License 2.0
6.48k stars, 1.27k forks

Text pre-intent parsing #3112

Open NeonDaniel opened 2 years ago

NeonDaniel commented 2 years ago

Is your feature request related to a problem? Please describe.
Adding a plugin-based way to transform STT transcriptions before they are passed to the intent service would allow co-reference resolution, number normalization, contraction expansion, translation, and any other parsing that helps intent engines.
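A minimal sketch of what such a plugin chain could look like. The `TextTransformer` base class, the `priority` attribute, and `run_transformers` are hypothetical names chosen for illustration. They are not the neon-transformers API or anything in mycroft-core, and the word tables are toy data.

```python
from typing import Dict, List, Optional, Tuple


class TextTransformer:
    """Hypothetical plugin base class (an assumption, not the neon-transformers API)."""

    priority = 50  # lower values run earlier in the chain

    def transform(self, utterances: List[str], context: Dict) -> Tuple[List[str], Dict]:
        # Default behavior: pass everything through unchanged.
        return utterances, context


class ContractionExpander(TextTransformer):
    priority = 10
    CONTRACTIONS = {"what's": "what is", "don't": "do not", "i'm": "i am"}

    def transform(self, utterances, context):
        # Lowercase each utterance and expand known contractions word by word.
        expanded = [
            " ".join(self.CONTRACTIONS.get(word, word) for word in utt.lower().split())
            for utt in utterances
        ]
        return expanded, context


class NumberNormalizer(TextTransformer):
    priority = 20
    NUMBERS = {"one": "1", "two": "2", "three": "3", "ten": "10"}

    def transform(self, utterances, context):
        # Replace spelled-out numbers from the toy table with digits.
        normalized = [
            " ".join(self.NUMBERS.get(word, word) for word in utt.split())
            for utt in utterances
        ]
        return normalized, context


def run_transformers(
    plugins: List[TextTransformer],
    utterances: List[str],
    context: Optional[Dict] = None,
) -> Tuple[List[str], Dict]:
    """Run every plugin in priority order before intent parsing would happen."""
    context = dict(context or {})
    for plugin in sorted(plugins, key=lambda p: p.priority):
        utterances, context = plugin.transform(utterances, context)
    return utterances, context


if __name__ == "__main__":
    plugins = [NumberNormalizer(), ContractionExpander()]
    print(run_transformers(plugins, ["What's two plus ten"]))
```

Each plugin receives the list of candidate transcriptions plus a shared context dict, so a later plugin (or the intent service) could read metadata that an earlier one attached.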

Describe the solution you'd like
This is implemented in Neon, and the plugin base class is defined in neon-transformers. I think the simplest implementation is the one in Neon.

Describe alternatives you've considered
It might be more logical to have the parser service handle recognizer_loop:utterance and emit the result to the intent service (as mycroft.utterance or mycroft.parsed_utterance?). This would let Messages bypass text parsing when there is a reason to go straight to the intent service.
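The alternative flow could be sketched roughly as follows. `TinyBus` is an in-process stand-in written for this example, not the Mycroft message bus client, and `ParserService` and `IntentService` are hypothetical classes. The message type `mycroft.parsed_utterance` is only the tentative name floated above, and subscribing the intent service to it is an assumption of this sketch.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class TinyBus:
    """In-process stand-in for a message bus (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def on(self, msg_type: str, handler: Callable[[dict], None]) -> None:
        self._handlers[msg_type].append(handler)

    def emit(self, msg_type: str, data: dict) -> None:
        for handler in list(self._handlers[msg_type]):
            handler(data)


class ParserService:
    """Listens for raw STT output and re-emits it after text parsing."""

    def __init__(self, bus: TinyBus, parse: Callable[[str], str]) -> None:
        self.bus = bus
        self.parse = parse
        bus.on("recognizer_loop:utterance", self.handle_utterance)

    def handle_utterance(self, data: dict) -> None:
        parsed = [self.parse(utt) for utt in data.get("utterances", [])]
        # Assumed message name; the proposal leaves the exact name open.
        self.bus.emit("mycroft.parsed_utterance", {**data, "utterances": parsed})


class IntentService:
    """Only sees parsed utterances in this variant of the design."""

    def __init__(self, bus: TinyBus) -> None:
        self.received: List[List[str]] = []
        bus.on("mycroft.parsed_utterance", self.handle_parsed)

    def handle_parsed(self, data: dict) -> None:
        self.received.append(data["utterances"])


if __name__ == "__main__":
    bus = TinyBus()
    intents = IntentService(bus)
    ParserService(bus, str.lower)
    # Normal path: STT output goes through the parser first.
    bus.emit("recognizer_loop:utterance", {"utterances": ["Turn ON the lights"]})
    # Bypass path: a sender emits straight to the intent service.
    bus.emit("mycroft.parsed_utterance", {"utterances": ["RAW text"]})
    print(intents.received)
```

Because the intent service subscribes to a separate message type, a sender that wants to skip parsing can emit that type directly, which is the bypass behavior described above.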

Additional context
Potential partial solution to https://github.com/MycroftAI/mycroft-core/issues/1221. This was discussed briefly in the forum: https://community.mycroft.ai/t/proposal-for-organizing-functionality-in-mycroft-core/11519/7

JarbasAl commented 2 years ago

Unrelated to https://github.com/MycroftAI/mycroft-core/issues/1221; that's just Google STT blocking words because of the implementation on the Selene side. With the Chromium plugin it can be disabled, so that code could be ported to Selene if desired.

This suggestion could be used for the opposite: censoring curse words in clean text. But it's hard to go from **** back to the source text.

NeonDaniel commented 2 years ago

> Unrelated to #1221; that's just Google STT blocking words because of the implementation on the Selene side. With the Chromium plugin it can be disabled, so that code could be ported to Selene if desired.
>
> This suggestion could be used for the opposite: censoring curse words in clean text. But it's hard to go from **** back to the source text.

I should have elaborated: I meant that filtering could be disabled in Selene and censoring implemented as a plugin instead. I assumed the rationale for filtering in Selene is to prevent Mycroft from transcribing curse words.
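For illustration, censoring as a text transformation could be as small as the sketch below. The `censor` function and its blocklist argument are hypothetical. This is not how Selene or any existing plugin filters words, only an example of the kind of step a plugin could perform on clean text.

```python
import re
from typing import Iterable


def censor(utterance: str, blocked_words: Iterable[str]) -> str:
    """Replace each blocked whole word with asterisks of the same length."""
    words = [re.escape(word) for word in blocked_words]
    if not words:
        return utterance
    # \b keeps the match to whole words; matching ignores case.
    pattern = re.compile(r"\b(" + "|".join(words) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda match: "*" * len(match.group(0)), utterance)


if __name__ == "__main__":
    print(censor("that darn cat", ["darn"]))
```

Running the filter locally on uncensored text means the original transcription is still available to any plugin that runs before it, which avoids the problem of recovering the source text from ****.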