CSTR-Edinburgh / merlin

This is now the official location of the Merlin project.
http://www.cstr.ed.ac.uk/projects/merlin/
Apache License 2.0

Generated voice is unclear and noisy #475

Open mirfan899 opened 5 years ago

mirfan899 commented 5 years ago

I've trained a model for Cantonese using the MTTS frontend (https://github.com/Jackiexiao/MTTS), modified for Cantonese (https://github.com/mirfan899/MTTS). Training completed and wav files are generated, but the audio is noisy and unclear. I've attached the log (output.log) and a generated audio sample (ASR1.wav.zip) for reference.

HiiamCong commented 5 years ago

Maybe try increasing the number of acoustic model training epochs. Which vocoder are you using? You can extract acoustic features from one of your recordings and resynthesize it with that vocoder to check the vocoder quality on your audio.

mirfan899 commented 5 years ago

It seems my question file is not good enough. I looked for information about question files but could not find anything online. If you have a link that describes the question file format and the individual questions, please share it.

RasmusD commented 5 years ago

Does the frontend you've used not provide a question file? If not, you should work out which features it generates in the label files and write a matching question file. Feel free to share it back into this repository under https://github.com/CSTR-Edinburgh/merlin/tree/master/misc/questions Perhaps the Mandarin question file can provide a starting point for Cantonese?
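One rough way to check whether a question file matches the label files is to count which questions never fire on any label. The sketch below assumes HTS-style binary questions of the form `QS "name" {pattern,pattern}` with `*` and `?` wildcards, and ignores `CQS` (continuous) questions. The function names and the example labels are hypothetical and not part of Merlin or MTTS.

```python
import re

# Matches binary question lines such as: QS "C-Vowel_a" {*-a+*}
QS_LINE = re.compile(r'^\s*QS\s+"([^"]+)"\s+\{(.*)\}\s*$')


def parse_questions(lines):
    """Return {question name: [wildcard patterns]} for binary QS lines."""
    questions = {}
    for line in lines:
        match = QS_LINE.match(line)
        if match is None:
            continue  # blank lines, comments, and CQS lines are skipped
        name, body = match.groups()
        questions[name] = [p.strip() for p in body.split(",") if p.strip()]
    return questions


def pattern_to_regex(pattern):
    """Translate an HTS wildcard pattern ('*' = any run, '?' = one char)."""
    escaped = re.escape(pattern)
    return re.compile(escaped.replace(r"\*", ".*").replace(r"\?", "."))


def unused_questions(questions, labels):
    """Names of questions whose patterns match none of the given labels."""
    unused = []
    for name, patterns in questions.items():
        regexes = [pattern_to_regex(p) for p in patterns]
        if not any(r.fullmatch(label) for label in labels for r in regexes):
            unused.append(name)
    return unused


if __name__ == "__main__":
    # Hypothetical question lines and labels, for illustration only.
    example_questions = parse_questions([
        'QS "C-Vowel_a" {*-a+*}',
        'QS "C-Nasal" {*-n+*,*-m+*}',
        'QS "C-NotInData" {*-zz+*}',
    ])
    example_labels = ["x^n-a+b=c", "a^b-n+d=e"]
    print(unused_questions(example_questions, example_labels))
```

If most questions come back unused on your real labels, the question file probably does not describe the features your frontend writes.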

mirfan899 commented 5 years ago

I used https://github.com/Jackiexiao/MTTS for this. I generated the question file for Cantonese based on the Mandarin structure. After adding more data, some words in the generated voice are intelligible, but others are still unclear.

RasmusD commented 5 years ago

How much data are you using?

mirfan899 commented 5 years ago

Currently 220 audio files.

RasmusD commented 5 years ago

That's probably the issue then. I'd recommend at least 1000 sentences. Preferably a lot more for high quality synthesis.
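To see where a corpus stands against that figure, a small script can count the recordings and total their duration. This is a minimal sketch using only the Python standard library. It assumes one utterance per uncompressed PCM `.wav` file in a single directory, and `corpus_stats` and the `wav` directory name are hypothetical.

```python
import wave
from pathlib import Path

# Rough lower bound on corpus size suggested in this thread.
MIN_SENTENCES = 1000


def corpus_stats(wav_dir):
    """Return (number of .wav files, total duration in seconds)."""
    count, seconds = 0, 0.0
    for path in sorted(Path(wav_dir).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            seconds += wav.getnframes() / wav.getframerate()
        count += 1
    return count, seconds


if __name__ == "__main__":
    n_files, total_seconds = corpus_stats("wav")  # hypothetical directory
    print(f"{n_files} utterances, {total_seconds / 60:.1f} minutes")
    if n_files < MIN_SENTENCES:
        print(f"Fewer than {MIN_SENTENCES} sentences; expect unclear output.")
```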