CSTR-Edinburgh / magphase

MagPhase Vocoder: Speech analysis/synthesis system for TTS and related applications.
Apache License 2.0

Constant-rate features vs variable-rate labels #13

Open cveaux opened 6 years ago

cveaux commented 6 years ago

Hi Felipe,

I've been comparing the quality of MagPhase copy-synthesis (low-dim) on a female voice for different values of b_const_rate (I used mag_dim=60 and phase_dim=45). It seemed to me that the constant-rate version has a little more buzziness, but I still found it preferable to the pitch-synchronous version, which is sometimes rather noisy. (It looks as if the interpolation were filtering out the noise, albeit a bit too much.) Does this match your observations?

Also, I have a question about training with Merlin. Is it preferable to use the constant-rate version (b_const_rate = 1), or to use pitch-synchronous features and warp the labels (b_conv_labs_rate = 1)?

Thanks!

ZhaoZeqing commented 5 years ago

Hi Veaux,

I have a question about the frame rate. What does constant or variable frame rate mean? I'm not sure whether my understanding below is right.

This is a phone and its duration:

2000000 2500000 phone

With a constant frame rate, each state has the same duration, like this:

2000000 2100000 state
2100000 2200000 state
2200000 2300000 state
2300000 2400000 state
2400000 2500000 state

With a variable frame rate, the state durations may not be equal, like this:

2000000 2150000 state
2150000 2200000 state
2200000 2360000 state
2360000 2435000 state
2435000 2500000 state

Thanks!
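
For reference, here is a minimal sketch of how the two frame rates discussed in this thread could be compared on the phone above. It assumes that "frame rate" refers to the spacing of the acoustic analysis frames (a fixed shift for constant rate, roughly one frame per pitch period for pitch-synchronous analysis) rather than to state durations being equal. The function names, the 5 ms frame shift, and the F0 values are hypothetical choices for illustration, not taken from MagPhase or Merlin. Label times are treated as HTK-style integers in units of 100 ns.

```python
# HTK-style label times are integers in units of 100 ns.
HTK_UNITS_PER_SECOND = 10_000_000


def constant_rate_frames(start, end, frame_shift_ms=5):
    """Number of whole frames in [start, end) at a fixed frame shift.

    The 5 ms default is an assumption, not a value taken from this thread.
    """
    shift = frame_shift_ms * HTK_UNITS_PER_SECOND // 1000
    return (end - start) // shift


def pitch_sync_frames(start, end, f0_hz):
    """Rough frame count with one frame per pitch period.

    Assumes a constant, integer F0 over the segment (a simplification).
    """
    return (end - start) * f0_hz // HTK_UNITS_PER_SECOND


# The 50 ms phone from the comment above.
start, end = 2_000_000, 2_500_000
print(constant_rate_frames(start, end))    # fixed shift: count depends only on duration
print(pitch_sync_frames(start, end, 120))  # lower-pitched voice: fewer frames
print(pitch_sync_frames(start, end, 220))  # higher-pitched voice: more frames
```

Under these assumptions, the same label segment maps to a fixed number of frames at constant rate, while the pitch-synchronous count changes with F0, which is presumably why the labels need warping when pitch-synchronous features are used.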