Closed malradhi closed 5 years ago
Actually, Merlin trains with DeepRecurrentNetwork by default; see the code in ./src/models/deep_rnn.py.
Thanks for your answer. But if you run the RNN-based VC (LSTM, BLSTM, or RNN) by default with the WORLD vocoder, the quality of the synthesized samples is not good at all.
However, if you use the RNN-based Merlin without VC, the samples are pretty good with WORLD. Is the problem with VC? I mean, is the WORLD vocoder not good in RNN-based VC, or is there an error somewhere inside Merlin/VC?
I cannot offer you more information; I'm just reading the Merlin source code. Your opinion is right: some end-to-end architectures like Tacotron use an attention-based RNN as a component and work nicely. By the way, what is VC short for?
VC stands for Voice Conversion. It is another application inside Merlin. It works well with a feedforward deep neural network and WORLD, but not with an RNN. Anyway, thanks for commenting here :)
Voice conversion is a different task to speech synthesis. Converting to a target speaker (i.e. VC w/RNN) involves different challenges to training on the target speaker (i.e. SPSS w/RNN). It is unsurprising if the output of these systems does not sound the same.
Is Merlin with RNN not working? I mean, the quality of the synthesized speech is not good. Have you tried the new version of Merlin? Any advice? Thanks