Open GunpowderGuy opened 7 years ago
A contrarian view:
I humbly think that just improving the audio quality, as Baidu report in their Deep Voice 2 article, would be more useful than adding multiple speakers. The voice quality they report for their multi-speaker architecture is very low (around 2.5 out of 5, aggregated from Amazon Mechanical Turk raters), whereas the improvement they report over the original Tacotron voice quality is substantial!
You can see this starting from Table 1 in https://arxiv.org/pdf/1705.08947.pdf
In their article, they describe the modification to the original Tacotron, which reportedly makes the difference.
https://voice.mozilla.org/