mirfan899 opened this issue 5 years ago
Maybe you can try increasing the number of acoustic-model training epochs. Which vocoder are you using? You could also extract acoustic features from your audio and resynthesize them with that vocoder (copy synthesis) to check the vocoder's quality on your audio files.
It seems my question file is not good enough. I looked for information on question files but couldn't find any on the internet. If you have any links on the question file format and what the individual questions mean, please share them.
Does the frontend you've used not provide a question file? If not, you should figure out which features it writes into the label files and write a matching question file. Feel free to contribute it back to this repository (https://github.com/CSTR-Edinburgh/merlin/tree/master/misc/questions). Perhaps the Mandarin question file can provide a starting point for Cantonese?
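As a rough illustration of what a question file does: each `QS` line names a yes/no question and lists wildcard patterns, and a full-context label answers "yes" if it matches any of them. The sketch below assumes the common HTS-style `QS "Name" {pattern,...}` layout. The phone names and the example label are hypothetical and do not reflect the actual MTTS label format. It also ignores `CQS` (numeric) lines, which Merlin question files may contain as well.

```python
import fnmatch
import re

# Matches lines such as: QS "C-Nasal" {*-m+*,*-n+*,*-ng+*}
QS_LINE = re.compile(r'\s*QS\s+"([^"]+)"\s+\{(.*)\}\s*$')


def parse_question_line(line):
    """Split one HTS-style QS line into (question name, list of patterns)."""
    match = QS_LINE.match(line)
    if match is None:
        raise ValueError(f"not a QS line: {line!r}")
    name, body = match.groups()
    patterns = [p.strip() for p in body.split(",") if p.strip()]
    return name, patterns


def answer_question(patterns, label):
    """Return True if the label matches any pattern.

    HTS patterns use '*' and '?' as wildcards, which fnmatchcase also
    understands. Caveat: fnmatch treats '[' and ']' as character classes,
    so labels or patterns containing brackets would need escaping.
    """
    return any(fnmatch.fnmatchcase(label, p) for p in patterns)


# Hypothetical label: previous phones "sil" and "s", current phone "i",
# next phones "n" and "ei", followed by made-up positional fields.
label = "sil^s-i+n=ei@1_2"
name, patterns = parse_question_line('QS "C-Nasal" {*-m+*,*-n+*,*-ng+*}')
print(name, answer_question(patterns, label))  # current phone is not nasal
```

Writing the Cantonese question file then mostly amounts to listing the phone classes (and any other features the frontend puts into the labels) as patterns that match the positions where the frontend writes them.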
I've used https://github.com/Jackiexiao/MTTS for this purpose and generated the question file for Cantonese based on the Mandarin structure. After adding more data, some words in the generated voice are intelligible, but others are still unclear.
How much data are you using?
Currently 220 audio files.
That's probably the issue then. I'd recommend at least 1000 sentences. Preferably a lot more for high quality synthesis.
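To check a corpus against that rule of thumb, here is a minimal sketch that counts the utterances in a directory and sums their duration. It assumes one sentence per `.wav` file and uncompressed PCM WAV (the only kind the standard `wave` module reads); the directory layout and function names are hypothetical.

```python
import wave
from pathlib import Path

# Rule-of-thumb minimum suggested in this thread; more is better.
MIN_SENTENCES = 1000


def corpus_stats(wav_dir):
    """Return (number of .wav files, total duration in seconds)."""
    count = 0
    total_seconds = 0.0
    # Note: glob is case-sensitive on most filesystems, so *.WAV is skipped.
    for path in sorted(Path(wav_dir).glob("*.wav")):
        with wave.open(str(path), "rb") as w:
            total_seconds += w.getnframes() / w.getframerate()
        count += 1
    return count, total_seconds


def report(wav_dir):
    """Summarise corpus size and compare it with MIN_SENTENCES."""
    count, seconds = corpus_stats(wav_dir)
    status = "OK" if count >= MIN_SENTENCES else "probably too small"
    return f"{count} utterances, {seconds / 3600:.2f} h: {status}"
```

Counting hours as well as files helps when utterances vary a lot in length, since 1000 very short prompts may still be too little data.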
I've trained the model for Cantonese using the MTTS frontend (https://github.com/Jackiexiao/MTTS), modified for Cantonese (https://github.com/mirfan899/MTTS). The model is trained and wav files are generated, but the audio is noisy and unclear. I've attached the logs (output.log) and a generated audio sample (ASR1.wav.zip) for reference.
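When reading a training log like this, one objective number worth watching is the mel-cepstral distortion (MCD) on held-out data, which Merlin's logs typically report alongside F0 and aperiodicity errors. Below is a minimal sketch of the usual definition, MCD = (10 / ln 10) · sqrt(2 · Σ_d (c_d − ĉ_d)²) averaged over frames. It assumes the two frame sequences are already time-aligned and loaded as lists of floats, and it excludes the energy coefficient c0 by the common convention. Merlin's own implementation may differ in such details.

```python
import math


def mel_cepstral_distortion(ref_frames, gen_frames, skip_c0=True):
    """Average mel-cepstral distortion in dB over aligned frames.

    Each frame is a sequence of mel-cepstral coefficients. Both inputs
    must contain the same number of frames, in the same time order.
    """
    if not ref_frames or len(ref_frames) != len(gen_frames):
        raise ValueError("need the same, non-zero number of frames")
    start = 1 if skip_c0 else 0  # c0 (energy) is usually excluded
    const = 10.0 / math.log(10.0)
    total = 0.0
    for ref, gen in zip(ref_frames, gen_frames):
        sq = sum((r - g) ** 2 for r, g in zip(ref[start:], gen[start:]))
        total += const * math.sqrt(2.0 * sq)
    return total / len(ref_frames)
```

If the MCD stays high while copy synthesis of the natural features sounds clean, the acoustic model (or too little training data) is the more likely cause of the noisy output than the vocoder.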