MycroftAI / mycroft-core

Mycroft Core, the Mycroft Artificial Intelligence platform.
https://mycroft.ai
Apache License 2.0

Audio pre-transcription parsing #3113

Open NeonDaniel opened 2 years ago

NeonDaniel commented 2 years ago

Is your feature request related to a problem? Please describe.
It can be useful to modify audio before it is passed to STT plugins, for example to remove silence and normalize audio levels for better accuracy. There are also use cases for tagging audio with information that could be used in skills (speaker identification, mood detection, etc.).
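To make the kind of preprocessing described above concrete, here is a minimal sketch in plain Python. It assumes audio is already decoded to a list of float samples in the range -1.0 to 1.0; the function names `trim_silence` and `normalize_peak` and the threshold values are illustrative assumptions, not part of Mycroft or Neon.

```python
from typing import List


def trim_silence(samples: List[float], threshold: float = 0.01) -> List[float]:
    """Drop leading and trailing samples whose magnitude is below threshold.

    Hypothetical helper: a real implementation would likely work on frames
    and use an energy or VAD-based measure rather than single samples.
    """
    start = 0
    end = len(samples)
    while start < end and abs(samples[start]) < threshold:
        start += 1
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]


def normalize_peak(samples: List[float], target_peak: float = 0.9) -> List[float]:
    """Scale samples so the loudest one has magnitude target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # Pure silence (or empty input): nothing to scale.
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]


def preprocess_for_stt(samples: List[float]) -> List[float]:
    """Trim silence first, then normalize what remains."""
    return normalize_peak(trim_silence(samples))
```

Trimming before normalizing means the gain is computed only from the part of the signal that is kept.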

Describe the solution you'd like
This is implemented in Neon, and the plugin base class is defined in neon-transformers.

Describe alternatives you've considered
N/A

Additional context
This was discussed on the forum: https://community.mycroft.ai/t/proposal-for-organizing-functionality-in-mycroft-core/11519/6

krisgesling commented 2 years ago

Hey, I've definitely talked to different people about some similar things, and I like how the concept of an "audio transformer" abstracts it away from where it comes in the pipeline.

I think one of the things we want to explore is how to enable projects to use elements like this in ways that solve their particular needs, without necessarily needing to modify core itself. A transformer could run pre-STT, post-TTS, or be used for any other purpose. It does a specific task, rather than necessarily being baked into one of these services. As an example, if you had a noise reduction audio transformer:

The ideal architecture would allow them all without needing to fork core, or the STT/TTS/other service they have selected.
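As a rough sketch of that idea, the following shows transformers as small, stage-agnostic objects that can be chained before STT or after TTS without either service knowing about them. Everything here is hypothetical: the `AudioTransformer` class, its `transform` signature, and `run_chain` are assumptions made for illustration and are not the neon-transformers base class or any existing Mycroft API.

```python
from typing import Dict, List, Tuple

Audio = List[float]  # assumed representation: float samples in [-1.0, 1.0]


class AudioTransformer:
    """Hypothetical base class: takes audio, returns audio plus context tags."""

    def transform(self, audio: Audio) -> Tuple[Audio, Dict[str, object]]:
        return audio, {}


class PeakNormalizer(AudioTransformer):
    """Modifies the audio: scales it to a target peak level."""

    def __init__(self, target_peak: float = 0.9) -> None:
        self.target_peak = target_peak

    def transform(self, audio: Audio) -> Tuple[Audio, Dict[str, object]]:
        peak = max((abs(s) for s in audio), default=0.0)
        if peak == 0.0:
            return list(audio), {"input_peak": 0.0}
        gain = self.target_peak / peak
        return [s * gain for s in audio], {"input_peak": peak}


class DurationTagger(AudioTransformer):
    """Leaves the audio untouched and only adds a tag for later consumers."""

    def __init__(self, sample_rate: int = 16000) -> None:
        self.sample_rate = sample_rate

    def transform(self, audio: Audio) -> Tuple[Audio, Dict[str, object]]:
        return audio, {"duration_s": len(audio) / self.sample_rate}


def run_chain(
    transformers: List[AudioTransformer], audio: Audio
) -> Tuple[Audio, Dict[str, object]]:
    """Apply each transformer in order, merging the tags they emit."""
    context: Dict[str, object] = {}
    for transformer in transformers:
        audio, tags = transformer.transform(audio)
        context.update(tags)
    return audio, context
```

Because `run_chain` only sees audio in and audio out, the same list of transformers could be called on microphone audio before it reaches an STT plugin or on synthesized audio after a TTS plugin, and the merged tags could be forwarded to skills.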

NeonDaniel commented 2 years ago

This could be a pre-STT, post-TTS, or used for any other purpose

I hadn't thought of the post-TTS use case, but that would be very useful for cleaning up poor quality outputs (I'm thinking of the old MozillaTTS that would append sounds to text without punctuation), or to handle a user wanting their responses read back faster/slower. If the audio backend doesn't have to deal with those transformations, it also means they should work with any backend.