World vocoder frame rate

CSTR-Edinburgh / merlin

This is now the official location of the Merlin project.

http://www.cstr.ed.ac.uk/projects/merlin/

Apache License 2.0


World vocoder frame rate #139

Open theabc123 opened 7 years ago

theabc123 commented 7 years ago

I changed the World vocoder frame rate to use a 4 ms frame interval, and the resynthesized audio files play at the correct speed. I copied the acoustic features into the model data directories and retrained the models, but the result is not correct: the synthesized speech is much too fast. I tried setting frameshift : 4 in the config file, but it doesn't work. Any help would be appreciated. Thanks!

r9y9 commented 7 years ago

See https://github.com/CSTR-Edinburgh/merlin/blob/a5c0cd9baef50447188b59ffeda9f374678144e6/tools/WORLD/test/analysis.cpp#L311. It seems that the frame period is hard-coded (though it should be easy to make it configurable).

theabc123 commented 7 years ago

@r9y9 Thank you for your answer! I just updated my issue, please take a look.

r9y9 commented 7 years ago

I think there are a few places that need to be changed, e.g. https://github.com/CSTR-Edinburgh/merlin/blob/b74abe4b54a1c34f6c8cdf4464b159b867affa50/tools/WORLD/test/synth.cpp#L293. I haven't looked into config files yet, though.

theabc123 commented 7 years ago

I changed the frame rate in the two files analysis.cpp and synth.cpp and recompiled. After that I used copy_synthesis.sh to generate acoustic features and regenerate my wav files successfully (the number of generated frames is also correct). I think the problem is not the vocoder but a parameter given to the models. I am looking at src/configuration/configuration.py line 363. I changed the frameshift parameter in my acoustic config file, with no success.
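A quick way to confirm that the extracted features really use the new frame interval is to compare the frame count of a feature file with the count expected from the audio duration. The sketch below is only an illustration, not Merlin code: it assumes the feature files are headerless float32 binaries, that the feature dimension (60 here) matches your setup, and that the vocoder produces roughly duration / frame period + 1 frames. The helper names are hypothetical.

```python
import os
import tempfile


def expected_frames(duration_s, frameshift_ms):
    """Approximate frame count for an utterance (assumed: floor(duration / shift) + 1)."""
    return int(duration_s * 1000.0 / frameshift_ms) + 1


def frames_in_file(path, dim):
    """Count frames in a headerless float32 feature file with `dim` values per frame."""
    n_values, remainder = divmod(os.path.getsize(path), 4)  # 4 bytes per float32
    if remainder or n_values % dim:
        raise ValueError("file size does not match dimension %d" % dim)
    return n_values // dim


# Demo with a dummy file: a 2-second utterance analysed at 4 ms, 60 values per frame.
dim = 60  # assumed feature dimension
n = expected_frames(2.0, 4)
with tempfile.NamedTemporaryFile(suffix=".mgc", delete=False) as f:
    f.write(bytes(4 * dim * n))  # zero-filled placeholder data
    demo_path = f.name

demo_count = frames_in_file(demo_path, dim)
os.remove(demo_path)
print(demo_count, expected_frames(2.0, 5))
```

If the count in a real file is closer to the 5 ms figure than the 4 ms one, the features were extracted with the old frame period.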

ronanki commented 7 years ago

If you change the frameshift for the acoustic features, you need to change it for the linguistic features as well. Check the label normalisation script.
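A small sketch of why a mismatch would make the speech too fast. HTK-style label times are in units of 100 ns, so a 5 ms frame corresponds to 50000 label units and a 4 ms frame to 40000. The function names below are hypothetical and this is not Merlin's actual code; it only shows the arithmetic under the assumption that label durations are converted to frame counts by integer division.

```python
HTK_UNITS_PER_MS = 10000  # HTK label times are expressed in units of 100 ns


def label_frames(start, end, frameshift_ms):
    """Number of frames a label segment covers at the given frame shift."""
    units_per_frame = int(frameshift_ms * HTK_UNITS_PER_MS)
    return end // units_per_frame - start // units_per_frame


def synthesized_ms(n_frames, frameshift_ms):
    """Duration of audio produced when n_frames are synthesized at frameshift_ms."""
    return n_frames * frameshift_ms


# A 100 ms phone in label time units.
start, end = 0, 1000000

mismatched = label_frames(start, end, 5)  # labels still converted with 5 ms
matched = label_frames(start, end, 4)     # labels converted with 4 ms

# The vocoder now synthesizes every frame as 4 ms of audio.
print(mismatched, synthesized_ms(mismatched, 4))
print(matched, synthesized_ms(matched, 4))
```

With the 5 ms constant left in place, the 100 ms phone gets 20 frames, which come out as 80 ms of audio at a 4 ms frame period, i.e. speech that is 1.25 times too fast.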

theabc123 commented 7 years ago

Hi ronanki, I changed the frame rate in:

label_normalisation (lines 209, 217, 351, 474, 510, 996 and 1000)
parameter_generation (lines 188 and 189)
silence_remover (lines 182 and 187)
configuration/configuration.py (line 363)

The result is better now, but the speech is still too fast and some phonemes are not pronounced. I am using phone alignment; is there a relation between the number of states and the 5 ms frame rate? Can you please tell me whether the 0.5 factor in frontend/feature_normalisation_base.py line 163 has any relation to the 5 ms frame rate? Should I change other parameters? Thanks