CSTR-Edinburgh / merlin

This is now the official location of the Merlin project.
http://www.cstr.ed.ac.uk/projects/merlin/

Suspiciously Large Acoustic Training Errors for Sinhala #209

Open pasindud opened 7 years ago

pasindud commented 7 years ago

I am trying to train a Sinhala Merlin voice with the data that we have open sourced [0]. It consists of 2064 recorded prompts (from multiple speakers with similar acoustics), a phonology, a transcribed lexicon, a G2P grammar, and text normalization. When training the acoustic model I am getting suspiciously large validation errors.

The following is how I set up Merlin [1].

Configuration

label_type : phone_align
subphone_feats        : none

hidden_layer_size  : [1024, 1024, 1024, 1024, 1024, 1024]
hidden_layer_type  : ['TANH', 'TANH', 'TANH', 'TANH', 'TANH', 'TANH']

learning_rate    : 0.002
batch_size       : 256
output_activation: linear
warmup_epoch     : 10
warmup_momentum  : 0.3
training_epochs  : 15

train_file_number: 800
valid_file_number: 200
test_file_number : 200
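
(For reference, the settings above describe a plain feed-forward network: six 1024-unit TANH hidden layers with a linear output, trained with SGD at learning rate 0.002 and warm-up momentum 0.3. The sketch below is not Merlin's own Theano code, just an illustrative Keras equivalent of that shape; lab_dim and cmp_dim are made-up placeholder dimensionalities.)

# Illustrative only: a Keras sketch of the network implied by the configuration
# above. Merlin builds this internally with Theano; the feature dimensions here
# (lab_dim, cmp_dim) are hypothetical placeholders, not values from this setup.
from tensorflow import keras
from tensorflow.keras import layers

lab_dim = 416   # assumed size of the linguistic input vector
cmp_dim = 187   # assumed size of the acoustic output vector

model = keras.Sequential(
    [keras.Input(shape=(lab_dim,))]
    + [layers.Dense(1024, activation="tanh") for _ in range(6)]   # hidden_layer_size / hidden_layer_type
    + [layers.Dense(cmp_dim, activation="linear")]                # output_activation: linear
)
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.002, momentum=0.3),
              loss="mse")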

For the acoustic model I am getting very bad error rates, but the training errors in the duration model are normal. Any suggestions on where the problem lies?

0 - https://github.com/googlei18n/language-resources/tree/master/si/
1 - https://github.com/googlei18n/language-resources/blob/master/si/merlin/README.md
2 - https://github.com/googlei18n/language-resources/blob/master/utils/generate_hts_questions.py
3 - https://github.com/googlei18n/language-resources/blob/master/si/festvox/ipa_phonology.json
4 - https://github.com/googlei18n/language-resources/blob/master/utils/setup_merlin.sh
5 - https://github.com/googlei18n/language-resources/tree/master/si/festvox

Excerpt of the acoustic model training log

2017-08-09 09:17:39,245    DEBUG main.train_DNN: Creating validation data provider
2017-08-09 09:28:29,121    DEBUG main.train_DNN: calculating validation loss
2017-08-09 09:29:53,625     INFO main.train_DNN: epoch 1, validation error 207.433853, train error 182.397903  time spent 730.60
2017-08-09 09:40:26,554    DEBUG main.train_DNN: calculating validation loss
2017-08-09 09:41:51,531     INFO main.train_DNN: epoch 2, validation error 207.285492, train error 181.712753  time spent 712.63
2017-08-09 09:52:26,372    DEBUG main.train_DNN: calculating validation loss
2017-08-09 09:53:51,079     INFO main.train_DNN: epoch 3, validation error 207.228226, train error 181.536484  time spent 712.73
2017-08-09 10:04:28,413    DEBUG main.train_DNN: calculating validation loss
2017-08-09 10:05:52,681     INFO main.train_DNN: epoch 4, validation error 207.178268, train error 181.415222  time spent 714.76
2017-08-09 10:16:25,568    DEBUG main.train_DNN: calculating validation loss
2017-08-09 10:17:49,708     INFO main.train_DNN: epoch 5, validation error 207.116272, train error 181.304825  time spent 710.11
2017-08-09 10:28:25,842    DEBUG main.train_DNN: calculating validation loss
2017-08-09 10:29:51,180     INFO main.train_DNN: epoch 6, validation error 207.051727, train error 181.188034  time spent 714.49
2017-08-09 10:40:27,047    DEBUG main.train_DNN: calculating validation loss
2017-08-09 10:41:51,855     INFO main.train_DNN: epoch 7, validation error 206.997467, train error 181.047287  time spent 713.79
2017-08-09 10:52:32,153    DEBUG main.train_DNN: calculating validation loss
2017-08-09 10:53:56,585     INFO main.train_DNN: epoch 8, validation error 206.980042, train error 180.894028  time spent 717.77
2017-08-09 11:04:35,866    DEBUG main.train_DNN: calculating validation loss
2017-08-09 11:06:00,367     INFO main.train_DNN: epoch 9, validation error 206.986008, train error 180.737808  time spent 716.95
2017-08-09 11:06:00,368    DEBUG main.train_DNN: validation loss increased
2017-08-09 11:16:33,463    DEBUG main.train_DNN: calculating validation loss
2017-08-09 11:17:58,356     INFO main.train_DNN: epoch 10, validation error 207.002869, train error 180.577957  time spent 717.99
2017-08-09 11:17:58,356    DEBUG main.train_DNN: validation loss increased
2017-08-09 11:28:33,151    DEBUG main.train_DNN: calculating validation loss
2017-08-09 11:29:57,336     INFO main.train_DNN: epoch 11, validation error 207.332260, train error 180.748123  time spent 718.98
2017-08-09 11:29:57,336    DEBUG main.train_DNN: validation loss increased
2017-08-09 11:40:31,371    DEBUG main.train_DNN: calculating validation loss
2017-08-09 11:41:55,598     INFO main.train_DNN: epoch 12, validation error 207.157928, train error 180.093155  time spent 718.26
2017-08-09 11:52:34,267    DEBUG main.train_DNN: calculating validation loss
2017-08-09 11:53:59,110     INFO main.train_DNN: epoch 13, validation error 206.974045, train error 179.742050  time spent 723.51
2017-08-09 12:04:42,002    DEBUG main.train_DNN: calculating validation loss
2017-08-09 12:06:06,764     INFO main.train_DNN: epoch 14, validation error 207.092087, train error 179.546387  time spent 720.96
2017-08-09 12:06:06,764    DEBUG main.train_DNN: validation loss increased
2017-08-09 12:16:45,385    DEBUG main.train_DNN: calculating validation loss
2017-08-09 12:18:10,629     INFO main.train_DNN: epoch 15, validation error 207.083282, train error 179.445572  time spent 723.86
2017-08-09 12:18:10,629     INFO main.train_DNN: overall  training time: 180.46m validation error 206.974045
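
(The log shows the validation error stuck around 207 while the training error barely moves. One quick way to eyeball this is to pull the per-epoch numbers out of the log, e.g. with a throwaway script like the one below; the regex assumes the exact log format shown above, and the log file name is a placeholder.)

# Throwaway helper (assumes the log format shown above): extract the
# per-epoch train/validation errors so the plateau is easy to see or plot.
import re

epoch_re = re.compile(
    r"epoch (\d+), validation error ([\d.]+), train error ([\d.]+)")

def parse_errors(log_path):
    errors = []
    with open(log_path) as f:
        for line in f:
            m = epoch_re.search(line)
            if m:
                errors.append((int(m.group(1)),
                               float(m.group(2)),
                               float(m.group(3))))
    return errors

# Example (hypothetical file name):
for epoch, val_err, train_err in parse_errors("acoustic_training.log"):
    print(epoch, val_err, train_err)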

MCD

2017-08-10 05:13:09,249 INFO           main    : calculating MCD
2017-08-10 05:13:10,557 INFO           main    : Develop: DNN -- MCD: 7.424 dB; BAP: 0.290 dB; F0:- RMSE: 59.600 Hz; CORR: 0.348; VUV: 15.920%
2017-08-10 05:13:10,557 INFO           main    : Test   : DNN -- MCD: 7.495 dB; BAP: 0.344 dB; F0:- RMSE: 32.555 Hz; CORR: 0.335; VUV: 14.520%
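
(For context on the figures above: mel-cepstral distortion is normally computed per frame from aligned mel-cepstra with the 0th (energy) coefficient excluded, roughly as in the sketch below; the constant 10*sqrt(2)/ln(10) ≈ 6.142 converts the Euclidean distance to dB. The array names are placeholders and this is not necessarily Merlin's exact implementation.)

# Minimal sketch of the usual MCD formula (placeholder arrays, 0th
# coefficient excluded), just to show what the dB figures above measure.
import numpy as np

def mcd_db(mgc_ref, mgc_syn):
    """mgc_ref, mgc_syn: (num_frames, order) time-aligned mel-cepstra."""
    diff = mgc_ref[:, 1:] - mgc_syn[:, 1:]          # drop c0 (energy)
    const = 10.0 * np.sqrt(2.0) / np.log(10.0)      # ~6.142
    return const * np.mean(np.sqrt(np.sum(diff ** 2, axis=1)))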
ronanki commented 7 years ago

The MCD numbers are quite high, so there must be some problem with the acoustic model training.

  1. Is the configuration shown above from the acoustic model? If yes, why is subphone_feats set to none? It should be set to coarse_coding or minimal_phoneme as shown here. Setting it to none makes all the input frames within a phone constant, which doesn't work with a simple feed-forward model. However, using an RNN/LSTM in the top one or two layers may work to some extent. If training RNNs, I suggest updating the code to enable batch training, since the previous version (native Theano) runs only with a batch size of 1.
  2. If this is not the case, you can try reducing the learning rate to 0.001 or 0.0005 and increasing the number of epochs to 25. You can also move 100 files each out of the validation and test sets in order to increase the amount of training data.
  3. Also, since you are testing on a new language, please check that the force-aligned labels are reasonably accurate, even if not 100% (a sanity check of the labels; see the duration-check sketch below).
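
(A quick, non-authoritative way to sanity check phone_align labels is to parse the HTK-style .lab files, where each line is "start end label" with times in 100 ns units, and flag phones with implausible durations. The directory path and thresholds below are placeholders.)

# Rough sanity check of force-aligned phone labels (assumes HTK-style
# "start end label" lines with times in 100 ns units, as used with
# label_type: phone_align). Flags phones with implausible durations.
import glob

def check_labels(lab_dir, min_ms=10, max_ms=1000):
    for path in sorted(glob.glob(lab_dir + "/*.lab")):
        with open(path) as f:
            for line in f:
                parts = line.split()
                if len(parts) < 3:
                    continue
                start, end, label = int(parts[0]), int(parts[1]), parts[2]
                dur_ms = (end - start) / 10000.0   # 100 ns units -> ms
                if dur_ms < min_ms or dur_ms > max_ms:
                    print(f"{path}: {label} lasts {dur_ms:.1f} ms")

# Example (hypothetical path):
# check_labels("experiments/sinhala/acoustic_model/data/label_phone_align")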