CSTR-Edinburgh / merlin

This is now the official location of the Merlin project.
http://www.cstr.ed.ac.uk/projects/merlin/
Apache License 2.0

Results with Merlin and AhoCoder (system is not training well on the acoustic model) #160

Open ajinkyakulkarni14 opened 7 years ago

ajinkyakulkarni14 commented 7 years ago

I used the AhoCoder vocoder instead of WORLD for generating wav files. To adapt AhoCoder to Merlin, I set BAP to 1, MGC to 40, and LF0 to 1. However, the acoustic model is converging to under-trained parameters.
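
For reference, a sketch of how those stream dimensions might look in a Merlin acoustic-model config. The section and key layout below follows the pattern of the WORLD demo recipes; the exact key names may differ in your setup, and the d* keys are the dynamic (static + delta + delta-delta) dimensions, i.e. 3x the static ones:

```ini
[Outputs]
# static stream dimensions for AhoCoder features, as described above
mgc  : 40
bap  : 1
lf0  : 1
# dynamic (static + delta + delta-delta) dimensions, 3x static
dmgc : 120
dbap : 3
dlf0 : 3
```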

For epoch 1 the validation error was 124.755600, and at the end of training it was 124.873734. I am training the system with 7500 samples (train:test:validation 70:20:10). I trained the system on the same data and configuration with the WORLD vocoder and obtained good results.

Any help with tuning the parameters would be appreciated!

I am sharing the log file contents:

2017-06-08 15:35:11,770 INFO main: training DNN
2017-06-08 15:35:11,770 DEBUG main.train_DNN: Starting train_DNN
2017-06-08 15:35:11,770 DEBUG main.train_DNN: Creating training data provider
2017-06-08 15:35:14,111 DEBUG main.train_DNN: Creating validation data provider
2017-06-08 15:35:44,810 INFO main.train_DNN: building the model
2017-06-08 15:35:47,192 INFO main.train_DNN: fine-tuning the DNN model
2017-06-08 16:34:41,895 DEBUG main.train_DNN: calculating validation loss
2017-06-08 16:36:19,725 INFO main.train_DNN: epoch 1, validation error 124.755600, train error 118.544495 time spent 3632.52
2017-06-08 17:35:52,687 DEBUG main.train_DNN: calculating validation loss
2017-06-08 17:37:31,993 INFO main.train_DNN: epoch 2, validation error 124.233429, train error 117.376945 time spent 3672.26
2017-06-08 18:37:23,636 DEBUG main.train_DNN: calculating validation loss
2017-06-08 18:39:04,533 INFO main.train_DNN: epoch 3, validation error 124.167480, train error 116.833282 time spent 3692.53
2017-06-08 19:39:00,001 DEBUG main.train_DNN: calculating validation loss
2017-06-08 19:40:41,272 INFO main.train_DNN: epoch 4, validation error 124.307854, train error 116.384827 time spent 3696.73
2017-06-08 19:40:47,345 DEBUG main.train_DNN: validation loss increased
2017-06-08 20:41:12,616 DEBUG main.train_DNN: calculating validation loss
2017-06-08 20:42:56,075 INFO main.train_DNN: epoch 5, validation error 124.414200, train error 115.943253 time spent 3728.72
2017-06-08 20:42:56,075 DEBUG main.train_DNN: validation loss increased
2017-06-08 21:43:36,342 DEBUG main.train_DNN: calculating validation loss
2017-06-08 21:45:19,690 INFO main.train_DNN: epoch 6, validation error 124.683731, train error 115.496056 time spent 3743.60
2017-06-08 21:45:19,715 DEBUG main.train_DNN: validation loss increased
2017-06-08 22:45:57,052 DEBUG main.train_DNN: calculating validation loss
2017-06-08 22:47:41,887 INFO main.train_DNN: epoch 7, validation error 124.897171, train error 115.040901 time spent 3742.17
2017-06-08 22:47:41,887 DEBUG main.train_DNN: validation loss increased
2017-06-08 23:48:31,414 DEBUG main.train_DNN: calculating validation loss
2017-06-08 23:50:17,122 INFO main.train_DNN: epoch 8, validation error 125.163406, train error 114.582947 time spent 3755.23
2017-06-08 23:50:17,122 DEBUG main.train_DNN: validation loss increased
2017-06-09 00:51:20,888 DEBUG main.train_DNN: calculating validation loss
2017-06-09 00:53:06,306 INFO main.train_DNN: epoch 9, validation error 125.482910, train error 114.136086 time spent 3769.18
2017-06-09 00:53:06,307 DEBUG main.train_DNN: validation loss increased
2017-06-09 01:54:14,362 DEBUG main.train_DNN: calculating validation loss
2017-06-09 01:56:00,493 INFO main.train_DNN: epoch 10, validation error 125.751518, train error 113.707466 time spent 3774.18
2017-06-09 01:56:00,494 DEBUG main.train_DNN: validation loss increased
2017-06-09 02:57:10,822 DEBUG main.train_DNN: calculating validation loss
2017-06-09 02:58:56,298 INFO main.train_DNN: epoch 11, validation error 126.665062, train error 115.106544 time spent 3775.80
2017-06-09 02:58:56,299 DEBUG main.train_DNN: validation loss increased
2017-06-09 04:00:06,978 DEBUG main.train_DNN: calculating validation loss
2017-06-09 04:01:52,987 INFO main.train_DNN: epoch 12, validation error 126.231293, train error 114.063744 time spent 3776.68
2017-06-09 05:03:08,415 DEBUG main.train_DNN: calculating validation loss
2017-06-09 05:04:54,512 INFO main.train_DNN: epoch 13, validation error 126.109924, train error 113.193024 time spent 3781.52
2017-06-09 06:06:12,841 DEBUG main.train_DNN: calculating validation loss
2017-06-09 06:07:58,990 INFO main.train_DNN: epoch 14, validation error 125.804138, train error 112.745514 time spent 3784.47
2017-06-09 07:09:14,384 DEBUG main.train_DNN: calculating validation loss
2017-06-09 07:11:00,796 INFO main.train_DNN: epoch 15, validation error 125.375961, train error 112.674545 time spent 3781.80
2017-06-09 08:12:32,620 DEBUG main.train_DNN: calculating validation loss
2017-06-09 08:14:19,672 INFO main.train_DNN: epoch 16, validation error 124.873734, train error 112.769409 time spent 3798.86
2017-06-09 08:14:19,709 DEBUG main.train_DNN: stopping early
2017-06-09 08:14:19,709 INFO main.train_DNN: overall training time: 998.54m validation error 124.167480
2017-06-08 12:31:00,852 INFO main: calculating MCD
2017-06-08 12:31:03,568 INFO main: Develop: DNN -- MCD: 7.401 dB; BAP: 120.750 dB; F0:- RMSE: 14.645 Hz; CORR: 0.461; VUV: 15.763%
2017-06-08 12:31:03,568 INFO main: Test : DNN -- MCD: 7.222 dB; BAP: 118.436 dB; F0:- RMSE: 16.234 Hz; CORR: 0.470; VUV: 14.266%
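
For anyone comparing against these numbers: the MCD figure in the log is the mel-cepstral distortion between reference and synthesised MGCs. A minimal sketch of the standard formula follows; skipping the 0th (energy) coefficient and averaging over frames are common conventions, and Merlin's own implementation may differ in detail:

```python
import numpy as np

def mcd_db(mgc_ref, mgc_syn, start_dim=1):
    """Mel-cepstral distortion in dB between two (frames x dims) MGC arrays.

    The 0th (energy) coefficient is conventionally excluded (start_dim=1).
    """
    diff = mgc_ref[:, start_dim:] - mgc_syn[:, start_dim:]
    per_frame = np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return (10.0 / np.log(10.0)) * float(np.mean(per_frame))
```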

bajibabu commented 7 years ago

I think with a few modifications you can extend Merlin to the aho-vocoder; for reference, please check how the GlottHMM vocoder was integrated with its extracted parameters. What are the BAP features in the case of the aho-vocoder? And did you check the copy-synthesis quality with the extracted parameters? Some vocoders are highly sensitive to their analysis parameters; if you modify/compress them, you lose quality significantly.

dreamk73 commented 7 years ago

I was able to train a voice with AhoCoder recently. Mine ran to 24 epochs, but the results look similar to yours, with a BAP around 100 dB. I didn't investigate further, but I hope you can figure it out, because it would be nice to be able to try different vocoders.

ajinkyakulkarni14 commented 7 years ago

@bajibabu The BAP dimension in the case of AhoCoder is 1. There are no issues regarding the integration of AhoCoder with Merlin; the main problem is that the parameters are not training well.
Yes, I checked the copy-synthesis quality with the extracted parameters, and it was fine.

@dreamk73 Do you know of any other freely available vocoder that performs better than WORLD? Do you have any idea what might be the issue here?

Thanks for all your responses.

ronanki commented 7 years ago
  1. You can try min-max normalization for the acoustic features instead of MVN.
  2. Fine-tune the learning rate -- the default learning rate may be too high for your data; try decreasing it.
  3. Try the Keras version to train the acoustic models.
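
As an illustration of suggestion 1, min-max normalization scales each feature dimension into a fixed range instead of standardizing to zero mean / unit variance. Merlin ships its own normaliser; the sketch below is just the idea, with the 0.01-0.99 target range and the function name chosen for illustration:

```python
import numpy as np

def minmax_normalise(feats, target_lo=0.01, target_hi=0.99):
    """Scale each column of a (frames x dims) array into
    [target_lo, target_hi] instead of zero-mean/unit-variance (MVN)."""
    lo = feats.min(axis=0)
    hi = feats.max(axis=0)
    scale = (target_hi - target_lo) / np.maximum(hi - lo, 1e-8)
    return target_lo + (feats - lo) * scale
```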
bajibabu commented 7 years ago

I just downloaded AhoCoder and ran it through copy-synthesis. What I found is that the values in BAP represent the maximum voiced frequency, and they behave like F0 values, i.e. they contain many zeros. The range of the BAP values is very high; perhaps that's the reason for the high error rates. Can you interpolate the BAP values the same way the F0 values are interpolated, and use the 'vuv' stream to restore the zero values at synthesis time?
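
The interpolation described here can be done the way lf0 is usually preprocessed: interpolate linearly across the zero (unvoiced) frames for training, and keep a binary vuv mask so the zeros can be restored at synthesis time. A minimal sketch, with the function name chosen for illustration:

```python
import numpy as np

def interpolate_unvoiced(track):
    """Linearly interpolate across zero-valued (unvoiced) frames.

    Returns the interpolated track plus the voiced/unvoiced mask, so the
    zeros can be restored at synthesis time via track * vuv.
    """
    track = np.asarray(track, dtype=float)
    vuv = track > 0
    if not vuv.any():
        return track.copy(), vuv
    idx = np.arange(len(track))
    # np.interp clamps to the nearest voiced value at the edges
    interp = np.interp(idx, idx[vuv], track[vuv])
    return interp, vuv
```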

dreamk73 commented 7 years ago

@ajinkyakulkarni14 I have tried GlottHMM as well, but all three only for one of our (female) TTS voices. I don't think any of them is fantastic or better than WORLD. I think our main problem is improving the parameter extraction and finding the optimal values for a particular voice. I am planning to spend more time this summer investigating this problem.

bajibabu commented 7 years ago

@dreamk73 Good to know. We are also doing a large-scale vocoder comparison (with four SOTA vocoders and four voices). We will share our conclusions soon.

ajinkyakulkarni14 commented 7 years ago

@bajibabu Thank you for the important feedback. In the case of AhoCoder, BAP is optional, and speech can be synthesised from the LF0 and MGC values alone.

Thus, I will try to train Merlin with only the AhoCoder LF0 and MGC parameters (excluding BAP). I will post the new results from this experiment.

@ronanki Thank you for the advice; I will fine-tune the network and post the results of those experiments.

@dreamk73 I tried AhoCoder with HTS, and it works better than HTS + WORLD, at least for my speech corpus.

dreamk73 commented 7 years ago

@bajibabu, I'll be very interested in your results.

ajinkyakulkarni14 commented 7 years ago

I tried tuning the network with RNNs and got improved results, though they are still not as good as with the WORLD vocoder. I am attaching the results. I will keep tuning the network.

2017-06-09 10:45:49,116 INFO configuration: hidden_layer_size : [256, 256, 516, 256, 256]
2017-06-09 10:45:49,116 INFO configuration: projection_learning_rate_scaling : 1.0
2017-06-09 10:45:49,116 INFO configuration: pretraining_epochs : 10
2017-06-09 10:45:49,116 INFO configuration: initial_projection_distrib : gaussian
2017-06-09 10:45:49,116 INFO configuration: index_to_project : 0
2017-06-09 10:45:49,116 INFO configuration: l2_reg : 1e-05
2017-06-09 10:45:49,116 INFO configuration: warmup_momentum : 0.5
2017-06-09 10:45:49,116 INFO configuration: training_epochs : 200
2017-06-09 10:45:49,116 INFO configuration: hidden_activation : tanh
2017-06-09 10:45:49,116 INFO configuration: hidden_layer_type : ['LSTM', 'LSTM', 'TANH', 'LSTM', 'LSTM']
2017-06-09 10:45:49,116 INFO configuration: sequential_training : True
2017-06-09 10:45:49,116 INFO configuration: do_pretraining : False
2017-06-09 10:45:49,116 INFO configuration: projection_insize : 10000
2017-06-09 10:45:49,116 INFO configuration: l1_reg : 0.0
2017-06-09 10:45:49,116 INFO configuration: output_activation : linear
2017-06-09 10:45:49,116 INFO configuration: pretraining_lr : 0.0001
2017-06-09 10:45:49,116 INFO configuration: momentum : 0.9

2017-06-10 12:19:43,551 INFO main: calculating MCD
2017-06-10 12:21:12,915 INFO main: Develop: DNN -- MCD: 6.009 dB; BAP: 91.403 dB; F0:- RMSE: 12.993 Hz; CORR: 0.607; VUV: 8.593%
2017-06-10 12:21:12,915 INFO main: Test : DNN -- MCD: 6.203 dB; BAP: 91.750 dB; F0:- RMSE: 12.385 Hz; CORR: 0.599; VUV: 9.636%

Merlin_exp.zip

gaganbahga commented 6 years ago

@bajibabu I wonder if you have published any results for the vocoders comparison study that you mentioned.

bajibabu commented 6 years ago

The journal article is still at the submission stage. Once it's submitted, we will share the results.