Acoustic Model not converging.

I am using the LJSpeech dataset. Its WAV files are sampled at 22050 Hz, so I downsampled them to 16000 Hz. The duration model converges, but with both the DNN and BLSTM recipes the acoustic model stops training after only 16 epochs. I am using the default question file and all default settings.
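For reference, the rate conversion described above works out to a rational factor of 320/441. The snippet below is only a sketch of that arithmetic. The function names are made up for illustration and are not part of Merlin. A polyphase resampler would typically take this up/down pair as its parameters.

```python
from fractions import Fraction

SOURCE_RATE = 22050  # LJSpeech native sampling rate in Hz
TARGET_RATE = 16000  # rate used after downsampling, in Hz


def resample_ratio(source_rate, target_rate):
    """Return (up, down) such that target_rate == source_rate * up / down."""
    ratio = Fraction(target_rate, source_rate)
    return ratio.numerator, ratio.denominator


def resampled_length(num_samples, source_rate, target_rate):
    """Number of output samples after resampling, rounded up."""
    up, down = resample_ratio(source_rate, target_rate)
    return -(-num_samples * up // down)  # ceiling division on integers


up, down = resample_ratio(SOURCE_RATE, TARGET_RATE)
print(up, down)  # 320 441
print(resampled_length(SOURCE_RATE, SOURCE_RATE, TARGET_RATE))  # 16000
```

One second of audio (22050 samples) maps to exactly 16000 samples, which is a quick sanity check to run on a converted file.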

I am also giving a part of the training log.

2019-01-17 16:13:11,771 INFO main.train_DNN: epoch 1, validation error 170.618973, train error 171.866440 time spent 1443.60
2019-01-17 16:38:16,083 INFO main.train_DNN: epoch 2, validation error 170.190796, train error 168.640305 time spent 1504.20
2019-01-17 17:02:13,551 INFO main.train_DNN: epoch 3, validation error 170.218948, train error 167.475052 time spent 1437.34
2019-01-17 17:26:36,610 INFO main.train_DNN: epoch 4, validation error 172.745056, train error 167.336365 time spent 1463.05
2019-01-17 17:51:16,778 INFO main.train_DNN: epoch 5, validation error 171.508743, train error 166.040604 time spent 1480.16
2019-01-17 18:16:09,871 INFO main.train_DNN: epoch 6, validation error 169.632706, train error 165.925156 time spent 1493.09
2019-01-17 18:40:07,466 INFO main.train_DNN: epoch 7, validation error 169.659439, train error 165.281158 time spent 1437.02
2019-01-17 19:04:19,697 INFO main.train_DNN: epoch 8, validation error 170.915634, train error 164.519470 time spent 1452.23
2019-01-17 19:28:09,071 INFO main.train_DNN: epoch 9, validation error 170.550934, train error 164.192902 time spent 1429.37
2019-01-17 19:52:07,551 INFO main.train_DNN: epoch 10, validation error 169.482544, train error 164.866928 time spent 1438.47
2019-01-17 20:16:00,785 INFO main.train_DNN: epoch 11, validation error 169.446396, train error 164.532928 time spent 1433.11
2019-01-17 20:39:48,590 INFO main.train_DNN: epoch 12, validation error 169.466202, train error 164.385376 time spent 1427.68
2019-01-17 21:03:46,210 INFO main.train_DNN: epoch 13, validation error 170.151535, train error 163.765106 time spent 1437.61
2019-01-17 21:27:54,507 INFO main.train_DNN: epoch 14, validation error 169.887222, train error 163.892776 time spent 1448.29
2019-01-17 21:51:59,369 INFO main.train_DNN: epoch 15, validation error 169.361557, train error 164.489304 time spent 1444.86
2019-01-17 22:15:51,948 INFO main.train_DNN: epoch 16, validation error 169.309998, train error 164.324036 time spent 1432.45
2019-01-17 22:15:52,069 INFO main.train_DNN: overall training time: 386.73m validation error 169.309998
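To make the trend in this log easier to read, here is a rough sketch that pulls the per-epoch errors out of log text and reports how far each error moved. The regular expression is only assumed to match the line format shown above, and `parse_epochs` and `summarize` are hypothetical helper names, not Merlin functions. The sample contains three lines copied from the log.

```python
import re

# Assumed to match lines like:
# "... epoch 1, validation error 170.618973, train error 171.866440 ..."
EPOCH_RE = re.compile(
    r"epoch (\d+), validation error ([\d.]+), train error ([\d.]+)"
)


def parse_epochs(log_text):
    """Return a list of (epoch, validation_error, train_error) tuples."""
    return [
        (int(epoch), float(valid), float(train))
        for epoch, valid, train in EPOCH_RE.findall(log_text)
    ]


def summarize(epochs):
    """Report the best validation epoch and the first-to-last error drops."""
    best = min(epochs, key=lambda row: row[1])
    first, last = epochs[0], epochs[-1]
    return {
        "best_epoch": best[0],
        "best_validation_error": best[1],
        "validation_drop": first[1] - last[1],
        "train_drop": first[2] - last[2],
    }


SAMPLE = """\
2019-01-17 16:13:11,771 INFO main.train_DNN: epoch 1, validation error 170.618973, train error 171.866440 time spent 1443.60
2019-01-17 18:16:09,871 INFO main.train_DNN: epoch 6, validation error 169.632706, train error 165.925156 time spent 1493.09
2019-01-17 22:15:51,948 INFO main.train_DNN: epoch 16, validation error 169.309998, train error 164.324036 time spent 1432.45
"""

summary = summarize(parse_epochs(SAMPLE))
print(summary)
```

On these lines the training error falls by about 7.5 while the validation error falls by only about 1.3, which matches the plateau visible in the full log.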

This leaves the synthesized voice a little muffled. Please suggest what I can do to reduce the error.

Were you able to train on the entire LJSpeech dataset?

I was able to finish ./04_prepare_conf_files.sh, but got the error below when running ./05_train_duration_model.sh:

...
2019-03-04 18:32:47,333 INFO main : D egs/slt_arctic/s2/run_demo.py
2019-03-04 18:32:47,333 INFO main : D egs/slt_arctic/s2/scripts/__init__.py
2019-03-04 18:32:47,333 INFO main : D egs/slt_arctic/s2/scripts/gpu_lock.py
2019-03-04 18:32:47,333 INFO main : D egs/slt_arctic/s2/scripts/label_st_align_to_var_rate.py
2019-03-04 18:32:47,333 INFO main : D egs/slt_arctic/s2/scripts/setup_env.sh
2019-03-04 18:32:47,333 INFO main : D egs/slt_arctic/s2/scripts/submit.sh
2019-03-04 18:32:47,333 INFO main : M misc/scripts/alignment/state_align/forced_alignment.py
2019-03-04 18:32:47,333 INFO main : M tools/compile_other_speech_tools.sh
2019-03-04 18:32:47,334 INFO main : (all diffs logged in feed_forward_6_tanh_06_32PM_March_04_2019.log.gitdiff)
2019-03-04 18:32:47,355 INFO main : Execution information:
2019-03-04 18:32:47,372 INFO main : ...
2019-03-04 18:32:47,372 INFO main : ...
2019-03-04 18:32:47,372 INFO main : ...
Traceback (most recent call last):
  File "/home/.../merlin/src/run_merlin.py", line 1320, in <module>
    main_function(cfg)
  File "/home/.../merlin/src/run_merlin.py", line 551, in main_function
    assert cfg.train_file_number+cfg.valid_file_number+cfg.test_file_number == total_file_number, 'check train, valid, test file number'
AssertionError: check train, valid, test file number

I downsampled the 13100 files from the dataset to 16 kHz and set the number of files in global_settings.cfg based on the suggestion in #203, i.e. Train=11790, Valid=655, Test=655.

I then checked the number of files in duration_model/data/file_id_list.scp, but there are only 2098 entries there. Is additional configuration required to train on a larger dataset such as LJSpeech? I am also not sure why there are still slt_arctic-related entries in the log.
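The failing assertion in the traceback compares the sum of the three configured counts with the number of files actually listed. A small sketch of that same check, run outside Merlin, is below. `count_file_ids` and `check_split` are hypothetical helpers written for illustration, and the file list is assumed to have one file ID per line.

```python
def count_file_ids(scp_path):
    """Count non-empty lines in a file ID list (assumed one ID per line)."""
    with open(scp_path) as handle:
        return sum(1 for line in handle if line.strip())


def check_split(train, valid, test, total):
    """Mirror the condition asserted in the traceback above."""
    return train + valid + test == total


# Counts from global_settings.cfg as described in this post.
TRAIN, VALID, TEST = 11790, 655, 655

print(TRAIN + VALID + TEST)                    # 13100
print(check_split(TRAIN, VALID, TEST, 13100))  # True
print(check_split(TRAIN, VALID, TEST, 2098))   # False
```

The configured counts add up to the 13100 files in the dataset, so the mismatch comes from the list itself holding only 2098 entries. Calling `count_file_ids` on the path to file_id_list.scp in your own experiment directory gives the total to compare against.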

Thanks.

CSTR-Edinburgh / merlin

Acoustic Model not converging. #424