CSTR-Edinburgh / merlin

This is now the official location of the Merlin project.
http://www.cstr.ed.ac.uk/projects/merlin/
Apache License 2.0

Merlin vs RNN #415

Closed malradhi closed 5 years ago

malradhi commented 5 years ago

Is Merlin with RNN not working? I mean that the quality of the synthesized speech is not good. Have you tried the new version of Merlin? Any advice? Thanks

shartoo commented 5 years ago

Actually, Merlin trains with DeepRecurrentNetwork by default; see the code in ./src/models/deep_rnn.py.

malradhi commented 5 years ago

Thanks for your answer. But if you run RNN-based VC (LSTM, BLSTM, or RNN) with the default settings and the WORLD vocoder, the quality of the synthesized samples is not good at all.

However, if you use RNN-based Merlin without VC, the samples are pretty good with WORLD. Is the problem with VC? I mean, is the WORLD vocoder not well suited to RNN-based VC, or is there an error somewhere inside Merlin's VC code?

shartoo commented 5 years ago

I cannot offer you more information; I'm just reading the Merlin source code. You are right: some end-to-end architectures, such as Tacotron, use attention-based RNNs as components and work well. By the way, what does VC stand for?

malradhi commented 5 years ago

VC stands for Voice Conversion. It is another application inside Merlin; it works well with a feedforward deep neural network and WORLD, but not with an RNN. Anyway, thanks for commenting here :)

ZackHodari commented 5 years ago

Voice conversion is a different task from speech synthesis. Converting to a target speaker (i.e. VC w/RNN) involves different challenges from training on the target speaker (i.e. SPSS w/RNN). It is unsurprising if the output of these systems does not sound the same.
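
One way to picture the difference: in SPSS the input (linguistic features) and output (acoustic features) are for the same utterance and speaker, whereas in parallel VC the source and target recordings have different lengths and must first be aligned frame by frame. The sketch below is only an illustration under stated assumptions: it uses random toy features, a plain dynamic time warping (DTW) alignment, and hypothetical names (`dtw_path`, `toy_frames`). Nothing in this thread says this is how Merlin's VC recipe builds its training pairs.

```python
import math
import random


def dtw_path(src, tgt):
    """Return a minimum-cost monotonic alignment between two frame sequences."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    # cost[i][j] = best accumulated distance aligning src[:i] with tgt[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(src[i - 1], tgt[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the last frame pair to the first one.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(candidates, key=lambda s: cost[s[0]][s[1]])
        path.append((i - 1, j - 1))
    path.reverse()
    return path


def toy_frames(n_frames, dim):
    """Hypothetical stand-in for real vocoder features (random values)."""
    return [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_frames)]


random.seed(0)
source = toy_frames(40, 4)  # source-speaker frames (assumed sizes)
target = toy_frames(55, 4)  # target-speaker frames, different length

path = dtw_path(source, target)
# After alignment, every training input frame has a matching output frame.
vc_inputs = [source[i] for i, _ in path]
vc_targets = [target[j] for _, j in path]
```

Any error in this alignment step ends up in the VC training targets, which is one kind of challenge the SPSS setup does not have.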