Closed attitudechunfeng closed 5 years ago
Hello,
Sorry for the late response. In the context of speech synthesis libllsm is designed primarily for use in a concatenative (or hybrid) system rather than 100% statistical parametric synthesis. For a quick comparison, libllsm without temporal noise shape parameters (llsm_nmframe.eenv), harmonic phase (LLSM_FRAME_VSPHSE) and source-filter decomposition is essentially the same as WORLD and in that regard I don't expect a significant quality difference.
However, with each of these features enabled you gain a new degree of control over the voice, although this arguably also makes the voice harder to model. This does not cause a problem for concatenative synthesis, and you'll know what to do by learning from the test cases. For statistical parametric synthesis, how to incorporate phase and source parameters is still an area under study. I have not personally attempted this, but some work by Shinnosuke Takamichi and Gilles Degottex over the past decade may offer some clues.
Can you explain in more detail how to do speech synthesis? And how does the quality compare with mainstream vocoders such as WORLD? @Sleepwalking