Sleepwalking / libllsm2

Low Level Speech Model (version 2.1) for high quality speech analysis-synthesis
GNU General Public License v3.0

How to do speech synthesis? #5

Closed: attitudechunfeng closed this issue 5 years ago

attitudechunfeng commented 5 years ago

Can you explain in more detail how to do speech synthesis? And how does the quality compare with mainstream vocoders such as WORLD? @Sleepwalking

Sleepwalking commented 5 years ago

Hello,

Sorry for the late response. In the context of speech synthesis, libllsm is designed primarily for use in a concatenative (or hybrid) system rather than in 100% statistical parametric synthesis.

For a quick comparison: libllsm without temporal noise shape parameters (llsm_nmframe.eenv), harmonic phase (LLSM_FRAME_VSPHSE) and source-filter decomposition is essentially the same as WORLD, and in that regard I don't expect a significant quality difference. However, with each of these features enabled you gain a new degree of control over the voice, although this arguably also makes the voice harder to model.

This does not cause a problem for concatenative synthesis, and you'll know what to do by learning from the test cases. For statistical parametric synthesis, how to incorporate phase and source parameters is still an area under study. I have not personally attempted this, but some work by Shinnosuke Takamichi and Gilles Degottex over the past decade may offer some clues.
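
To make the "new degree of control" point concrete, here is a toy sketch in plain Python. It does not use libllsm or WORLD, and every name in it (synth_period, harmonic_magnitudes, the amplitude and phase values) is hypothetical and chosen only for illustration. It builds one pitch period from three harmonics twice, once with all phases at zero and once with non-zero phases, and shows that the harmonic magnitudes are identical while the waveform shape differs. A parameter set that stores magnitudes only cannot tell the two apart, which is roughly the kind of information an explicit harmonic phase feature would add.

```python
import cmath
import math


def synth_period(amps, phases, n):
    """Sum of cosines: harmonic k has amplitude amps[k-1] and phase phases[k-1].

    n is the number of samples in exactly one pitch period.
    """
    return [
        sum(a * math.cos(2.0 * math.pi * (k + 1) * t / n + p)
            for k, (a, p) in enumerate(zip(amps, phases)))
        for t in range(n)
    ]


def harmonic_magnitudes(x, num_harmonics):
    """Magnitude of each harmonic, by projecting one period onto complex sinusoids."""
    n = len(x)
    mags = []
    for k in range(1, num_harmonics + 1):
        c = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(c) * 2.0 / n)
    return mags


# Illustrative values only (not taken from libllsm or WORLD).
amps = [1.0, 0.5, 0.25]
n = 80  # e.g. one period of a 200 Hz tone sampled at 16 kHz

zero_phase = synth_period(amps, [0.0, 0.0, 0.0], n)
with_phase = synth_period(amps, [0.0, math.pi / 2, math.pi], n)

# Same magnitudes per harmonic, but a different waveform peak.
print(harmonic_magnitudes(zero_phase, 3), max(zero_phase))
print(harmonic_magnitudes(with_phase, 3), max(with_phase))
```

Both calls report the same three magnitudes (up to rounding), yet the peak of the second waveform is lower because the harmonics no longer line up at the same instant. The sketch covers only the harmonic phase aspect; the temporal noise envelope and source-filter decomposition mentioned above are not modeled here.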