MycroftAI / mimic2

Text to Speech engine based on the Tacotron architecture, initially implemented by Keith Ito.
Apache License 2.0

some different suggestions regarding the engine #56

Status: Open. king-dahmanus opened this issue 2 years ago.

king-dahmanus commented 2 years ago

Hello developers. I'm not a developer, but I'd like to suggest some improvements and ideas for this engine.

First, the shorter one: to improve voice quality, you need to change the vocoder. From what I've heard in the samples, this engine uses the Griffin-Lim algorithm to generate the waveform, and that sounds robotic. It should be replaced with a neural vocoder such as HiFi-GAN or another higher-quality option. HiFi-GAN sounds promising.

Second, I suggest making this engine available to Windows assistive technologies by shipping a SAPI 5 (Speech Application Programming Interface) distribution of it. That way screen readers like NVDA or JAWS, text readers like Balabolka or TextAloud, and many other programs could use it. For this use case the voice has to be optimized for responsiveness: output faster than real time, with no lag or delay before or in the middle of speech.

I hope you consider my suggestions. Thanks, and I hope we can discuss this.
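"Faster than real time" is usually expressed as a real-time factor (RTF): the time spent synthesizing divided by the duration of the audio produced, so RTF < 1.0 means the engine keeps ahead of playback. The following is a minimal sketch of how one might measure it, assuming a synthesizer exposed as a plain Python callable that returns a list of samples. `fake_synthesize` and the 22050 Hz rate are hypothetical stand-ins, not mimic2's actual API or configuration.

```python
import time
from typing import Callable, Sequence


def real_time_factor(synth_seconds: float, num_samples: int, sample_rate: int) -> float:
    """RTF = synthesis time / audio duration. RTF < 1.0 means faster than real time."""
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("need a positive number of samples and a positive sample rate")
    audio_seconds = num_samples / sample_rate
    return synth_seconds / audio_seconds


def measure_rtf(synthesize: Callable[[str], Sequence[float]], text: str, sample_rate: int) -> float:
    """Time one call to `synthesize` (an assumed text -> samples callable) and return its RTF."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples), sample_rate)


# Hypothetical stand-in for a real TTS engine: 0.1 s of silence per character at 22050 Hz.
def fake_synthesize(text: str) -> list:
    return [0.0] * (len(text) * 2205)


if __name__ == "__main__":
    # 0.5 s of work for 2 s of audio gives an RTF of 0.25
    print(real_time_factor(0.5, 22050 * 2, 22050))  # 0.25
    rtf = measure_rtf(fake_synthesize, "Hello world", 22050)
    print(f"RTF: {rtf:.4f}")
```

Note that RTF alone does not capture the "no delay before speech" requirement. A screen reader also cares about latency to the first audio chunk, which a whole-utterance RTF measurement like this one does not show.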