R2D2FISH / glados-tts

A GLaDOS TTS, using Forward Tacotron and HiFiGAN. Inference is fast and stable, even on the CPU. A low quality vocoder model is included for mobile use. Rudimentary TTS script included. Works perfectly on Linux, partially on Maybe someone smarter than me can make a GUI.
MIT License

Fix Pronunciation Issue #10

Closed. Hexanol777 closed this 1 year ago

Hexanol777 commented 1 year ago

Problem

Currently, the TTS engine mispronounces and sometimes completely omits certain letters, particularly at the beginning and end of sentences.

Solution

I implemented a simple workaround: add padding characters at the start and end of each sentence. Placing commas (",,,") at the beginning and end of every sentence the user inputs improves the engine's pronunciation accuracy, because the sounds that were being clipped now fall on the padding rather than on the actual text.
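For anyone who wants to try the padding idea before feeding text to the engine, here is a minimal sketch. It is not the code from my fork: the names `pad_sentence` and `pad_text` are hypothetical, and the sentence splitting is a rough regex heuristic I am assuming for illustration. Only the ",,," padding itself comes from the description above.

```python
import re
from typing import List

PAD = ",,,"  # padding string described in this issue


def pad_sentence(sentence: str, pad: str = PAD) -> str:
    """Wrap a single sentence in padding (hypothetical helper)."""
    stripped = sentence.strip()
    if not stripped:
        return ""
    return f"{pad}{stripped}{pad}"


def pad_text(text: str, pad: str = PAD) -> List[str]:
    """Split text into sentences and pad each one (hypothetical helper).

    Splitting happens after '.', '!' or '?' followed by whitespace.
    This is an assumed heuristic and will mis-split abbreviations.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [pad_sentence(s, pad) for s in sentences if s.strip()]


if __name__ == "__main__":
    for padded in pad_text("Hello there. The cake is a lie!"):
        print(padded)
```

Each padded string would then be passed to the TTS script in place of the raw sentence. Padding per sentence rather than once per input follows the description above, where every sentence gets its own leading and trailing commas.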

Testing

I created a test case to evaluate the solution by having the TTS engine say the words "crisp" and "crispy", then comparing the outputs with and without padding. The files below are stored in my own fork of the repo:

R2D2FISH commented 1 year ago

I'm currently working on a new version of the model which fixes this issue without a workaround. It uses DeepPhonemizer as the phonemizer and has much more training data.