pasindud opened this issue 7 years ago
The MCD numbers are quite high, so there must be some problem with the acoustic model training. Why is `subphone_feats` set to `None`? It should be set to `coarse_coding` or `minimal_phoneme`, as shown here. Setting it to `None` makes all input frames within a phone constant, which doesn't work with a simple feed-forward model. Using RNN/LSTM layers as the top one or two layers may work to some extent, however. If you train RNNs, I suggest updating the code to enable batch training, since the previous version (native Theano) runs only with a batch size of 1.
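To make the `None` problem concrete, here is a minimal sketch, not Merlin's code: the helper `expand_to_frames`, the toy phone features, and the three Gaussian-shaped bumps (centres 0, 0.5 and 1.0, width 0.4) are illustrative assumptions loosely standing in for coarse coding. Without sub-phone features every frame of a phone receives the same input vector, so a network that only sees the current frame must produce the same output for all of those frames.

```python
import math


def expand_to_frames(phone_feats, durations, subphone_feats="none"):
    """Hypothetical helper: upsample phone-level input vectors to frame level.

    "none":   every frame of a phone gets an identical copy of its vector.
    "coarse": append three Gaussian-shaped bumps evaluated at the frame's
              relative position inside the phone (a rough coarse-coding sketch;
              centres and width are assumed values, not Merlin's).
    """
    centres, sigma = (0.0, 0.5, 1.0), 0.4
    frames = []
    for feats, dur in zip(phone_feats, durations):
        for i in range(dur):
            row = list(feats)
            if subphone_feats == "coarse":
                pos = (i + 0.5) / dur  # relative position of this frame in (0, 1)
                row += [math.exp(-((pos - c) ** 2) / (2 * sigma ** 2)) for c in centres]
            elif subphone_feats != "none":
                raise ValueError(f"unknown subphone_feats: {subphone_feats}")
            frames.append(row)
    return frames


def feed_forward(frame, weights, bias):
    """A single tanh unit standing in for a feed-forward network:
    its output depends only on the current frame's input vector."""
    return math.tanh(sum(w * x for w, x in zip(weights, frame)) + bias)


phone_feats = [[1.0, 0.0], [0.0, 1.0]]  # two toy phones (one-hot identities)
durations = [4, 3]                       # frames per phone

for mode in ("none", "coarse"):
    frames = expand_to_frames(phone_feats, durations, mode)
    weights = [0.5, -0.3, 0.8, -0.6, 0.4][: len(frames[0])]
    outputs = [round(feed_forward(f, weights, 0.1), 3) for f in frames]
    print(mode, outputs)
```

In the `"none"` run the first four outputs are identical (and so are the last three), giving a step-shaped trajectory; with the positional features each frame's output differs. A recurrent layer can partly recover position from context, which is why RNN/LSTM layers on top may help.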
I am trying to train a Sinhala Merlin voice using the data we have open-sourced [0]. It consists of 2064 recorded prompts (from multiple speakers with similar acoustic characteristics), along with a phonology, a transcribed lexicon, a G2P grammar and text normalization (textnorm) rules. When training the acoustic model, I am getting suspiciously large validation errors.
Here is how I set up Merlin [1].
Configuration
For the acoustic model I am getting very bad error rates, but the training errors for the duration model are normal. Any suggestions as to where the problem lies?
0 - https://github.com/googlei18n/language-resources/tree/master/si/
1 - https://github.com/googlei18n/language-resources/blob/master/si/merlin/README.md
2 - https://github.com/googlei18n/language-resources/blob/master/utils/generate_hts_questions.py
3 - https://github.com/googlei18n/language-resources/blob/master/si/festvox/ipa_phonology.json
4 - https://github.com/googlei18n/language-resources/blob/master/utils/setup_merlin.sh
5 - https://github.com/googlei18n/language-resources/tree/master/si/festvox
Excerpt of the acoustic model training log
MCD
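For reference, mel-cepstral distortion is commonly computed per frame as (10 / ln 10) * sqrt(2 * Σ_d (c_d − ĉ_d)²) and averaged over frames, often leaving out the energy coefficient c0. The sketch below assumes the reference and generated mel-cepstra are already time-aligned; the function name `mel_cepstral_distortion` is hypothetical, and whether Merlin's own evaluation skips c0 is an assumption here, not something stated in this thread.

```python
import math


def mel_cepstral_distortion(ref, gen, skip_c0=True):
    """Frame-averaged MCD in dB between two aligned mel-cepstrum sequences.

    ref, gen: lists of frames, each frame a list of cepstral coefficients.
    Assumes both sequences are already time-aligned (same number of frames).
    """
    if len(ref) != len(gen) or not ref:
        raise ValueError("sequences must be non-empty and equally long")
    start = 1 if skip_c0 else 0  # optionally ignore the energy term c0
    const = 10.0 / math.log(10.0) * math.sqrt(2.0)  # (10 / ln 10) * sqrt(2)
    total = 0.0
    for r, g in zip(ref, gen):
        sq = sum((a - b) ** 2 for a, b in zip(r[start:], g[start:]))
        total += math.sqrt(sq)  # const * sqrt(sq) == (10/ln10) * sqrt(2 * sq)
    return const * total / len(ref)


ref = [[0.0, 1.0, 0.5]]
gen = [[5.0, 1.1, 0.5]]  # c0 differs a lot but is skipped by default
print(round(mel_cepstral_distortion(ref, gen), 3))  # 0.614
```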