A GLaDOS TTS, using Forward Tacotron and HiFiGAN. Inference is fast and stable, even on the CPU. A low quality vocoder model is included for mobile use. Rudimentary TTS script included. Works perfectly on Linux, partially on Maybe someone smarter than me can make a GUI.
Problem
Currently, the TTS engine mispronounces and sometimes completely omits certain letters, particularly at the beginning and end of sentences.
Solution
I implemented a simple workaround that pads the start and end of each sentence with extra characters. Placing commas (",,,") at the beginning and end of each sentence the user inputs can improve the engine's pronunciation accuracy.
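The padding step can be sketched roughly as below. This is only an illustration, not the code from the fork: the function name `pad_sentences`, the `PAD` constant, and the naive regex-based sentence split are all assumptions made for the example.

```python
import re

# Padding string described above; three commas on each side of a sentence.
PAD = ",,,"

def pad_sentences(text: str, pad: str = PAD) -> str:
    """Wrap every sentence in `text` with `pad` (hypothetical helper)."""
    # Naive split: break after '.', '!' or '?' when followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text)
    sentences = [s.strip() for s in parts if s.strip()]
    # Re-join the padded sentences with single spaces.
    return " ".join(f"{pad}{s}{pad}" for s in sentences)
```

The padded string would then be handed to the TTS script in place of the raw user input, for example `pad_sentences("crisp")` for the single-word test described in this write-up.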
Testing
I created a test case to evaluate the workaround: the TTS engine says the words "crisp" and "crispy", with and without padding, so the outputs can be compared.
The files below are stored in my own fork of the repo:
old_crisp.wav: the TTS engine's output for "crisp" before the padding workaround.
old_crispy.wav: the TTS engine's output for "crispy" before the padding workaround.
new_crisp.wav: the TTS engine's output for "crisp" after adding padding.
new_crispy.wav: the TTS engine's output for "crispy" after adding padding.
I'm currently working on a new version of the model which fixes this issue without a workaround. It uses DeepPhonemizer as the phonemizer and has much more training data.