CSTR-Edinburgh / merlin

This is now the official location of the Merlin project.
http://www.cstr.ed.ac.uk/projects/merlin/
Apache License 2.0

Merlin AHOCODER #339

Open akfalc opened 6 years ago

akfalc commented 6 years ago

I have run into another error, and I don't know whether it is caused by the config file or by the Python script.

CRITICAL main : train_DNN threw an exception
Traceback (most recent call last):
  File "/home/akfalc/merlin/src/run_merlin.py", line 1378, in <module>
    main_function(cfg)
  File "/home/akfalc/merlin/src/run_merlin.py", line 865, in main_function
    cmp_mean_vector = cmp_mean_vector, cmp_std_vector = cmp_std_vector,init_dnn_model_file=cfg.start_from_trained_model)
  File "/home/akfalc/merlin/src/run_merlin.py", line 222, in train_DNN
    shared_train_set_xy, temp_train_set_x, temp_train_set_y = train_data_reader.load_one_partition()
  File "/home/akfalc/merlin/src/utils/providers.py", line 296, in load_one_partition
    shared_set_xy, temp_set_x, temp_set_y = self.load_next_partition()
  File "/home/akfalc/merlin/src/utils/providers.py", line 751, in load_next_partition
    in_features, lab_frame_number = io_fun.load_binary_file_frame(self.x_files_list[self.file_index], self.n_ins)
  File "/home/akfalc/merlin/src/io_funcs/binary_io.py", line 64, in load_binary_file_frame
    fid_lab = open(file_name, 'rb')
IOError: [Errno 2] No such file or directory: '/home/akfalc/merlin/egs/voice_conversion/s1/experiments/jmk2ksp/acoustic_model/inter_module/nn_no_silence_lab_norm_127/arctic_a0043.cmp'
Lock freed

Thanks to everyone who contributes.

felipeespic commented 6 years ago

Hi,

You can start debugging by running Merlin up to the NORMCMP step and checking whether the .cmp files are correctly generated and placed in '/home/akfalc/merlin/egs/voice_conversion/s1/experiments/jmk2ksp/acoustic_model/inter_module/nn_no_silence_lab_norm_127/'

akfalc commented 6 years ago

Hi,

Actually, it does create the .cmp files, but it places them in /home/akfalc/merlin/egs/voice_conversion/s1/experiments/jmk2ksp/acoustic_model/inter_module/nn_cc_fv_fo_127

So I don't understand why these files are being requested from the other directory.

felipeespic commented 6 years ago

OK, I see the problem.

Check the script ./scripts/create_symbolic_link.sh (6th step in the vc recipe). It should create the nn_no_silence_lab_norm_127/ folder as a symbolic link pointing to nn_cc_fv_fo_127/. That way, requests for files in the first folder are automatically redirected to the second.
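A quick way to confirm that the link is in place is sketched below. This is only an illustration using the Python standard library, not part of Merlin; the function name `check_cmp_link` and its status strings are made up for this example, and the directory names are the ones quoted in this thread.

```python
import os


def check_cmp_link(inter_module_dir, link_name, target_name):
    """Report whether link_name is a symbolic link to target_name.

    Both names are looked up inside inter_module_dir. The function and
    its status strings are hypothetical helpers, not Merlin code.
    """
    link_path = os.path.join(inter_module_dir, link_name)
    target_path = os.path.join(inter_module_dir, target_name)

    if not os.path.isdir(target_path):
        return "target directory missing"
    if not os.path.lexists(link_path):
        return "link missing"
    if not os.path.islink(link_path):
        return "not a symbolic link"
    # Resolve both sides so relative and absolute links compare equal.
    if os.path.realpath(link_path) != os.path.realpath(target_path):
        return "points elsewhere"
    return "ok"


if __name__ == "__main__":
    base = ("/home/akfalc/merlin/egs/voice_conversion/s1/experiments/"
            "jmk2ksp/acoustic_model/inter_module")
    print(check_cmp_link(base, "nn_no_silence_lab_norm_127", "nn_cc_fv_fo_127"))
```

Any status other than "ok" suggests that the symbolic-link step did not run or ran against a different experiment directory.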

akfalc commented 6 years ago

Hi, Thank you for the help.

It seems the problem above has been solved, but now there is another error, this time about the dimensions of the matrices. Here is the output:

2018-04-21 09:22:27,154 INFO main : label dimension is 127
2018-04-21 09:22:27,155 INFO main : training DNN
2018-04-21 09:22:27,931 INFO main.train_DNN: building the model
2018-04-21 09:23:24,013 INFO main.train_DNN: fine-tuning the DNN model
2018-04-21 09:24:21,418 INFO main.train_DNN: epoch 1, validation error nan, train error nan time spent 57.40
2018-04-21 09:24:21,418 INFO main.train_DNN: overall training time: 0.96m validation error 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.000000
2018-04-21 09:24:21,426 INFO main : generating from DNN
2018-04-21 09:24:21,733 INFO dnn_generation: generating 1 of 132: /home/akfalc/merlin/egs/voice_conversion/s1/experiments/jmk2ksp/acoustic_model/inter_module/nn_no_silence_lab_norm_127/arctic_b0408.cmp
Traceback (most recent call last):
  File "/home/akfalc/merlin/src/run_merlin.py", line 1377, in <module>
    main_function(cfg)
  File "/home/akfalc/merlin/src/run_merlin.py", line 939, in main_function
    dnn_generation(test_x_file_list, nnets_file_name, lab_dim, cfg.cmp_dim, gen_file_list, reshape_io)
  File "/home/akfalc/merlin/src/run_merlin.py", line 436, in dnn_generation
    predicted_parameter = dnn_model.parameter_prediction(test_set_x)
  File "/home/akfalc/merlin/src/models/deep_rnn.py", line 279, in parameter_prediction
    predict_parameter = test_out()
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 898, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "/usr/local/lib/python2.7/dist-packages/theano/gof/link.py", line 325, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 884, in __call__
    self.fn() if output_subset is None else\
ValueError: dimension mismatch in args to gemm (392,127)x(187,1024)->(392,1024)
Apply node that caused the error: GpuDot22(<CudaNdarrayType(float32, matrix)>, GpuFromHost.0)
Toposort index: 9
Inputs types: [CudaNdarrayType(float32, matrix), CudaNdarrayType(float32, matrix)]
Inputs shapes: [(392, 127), (187, 1024)]
Inputs strides: [(127, 1), (1024, 1)]
Inputs values: ['not shown', 'not shown']
Outputs clients: [[GpuElemwise{add,no_inplace}(GpuDot22.0, GpuDimShuffle{x,0}.0)]]

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

I have changed all the related dimension settings to match the dimensions of my data, but it still gives an error.
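For readers decoding the gemm message: the shapes (392,127)x(187,1024) say that the input has 127 values per frame (matching the logged label dimension of 127), while the first weight matrix of the loaded model has 187 rows. One possible reading is that the model was built for a 187-dimensional input. The sketch below just spells out that rule; `check_matmul_shapes` is a hypothetical helper, not a Merlin or Theano function.

```python
def check_matmul_shapes(input_shape, weight_shape):
    """Return None if input x weight is well defined, else an explanation.

    input_shape  is (n_frames, input_dim)
    weight_shape is (expected_dim, n_hidden)
    A matrix product needs input_dim == expected_dim.
    """
    _n_frames, input_dim = input_shape
    expected_dim, _n_hidden = weight_shape
    if input_dim == expected_dim:
        return None
    return ("input has %d features per frame but the first layer expects %d"
            % (input_dim, expected_dim))


if __name__ == "__main__":
    # Shapes copied from the ValueError in the log above.
    print(check_matmul_shapes((392, 127), (187, 1024)))
```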

mdekorte commented 6 years ago

From your log, it looks like the model was never actually trained because the learning rate was too high, so it cannot generate features. Try lowering the learning rate in your configuration (say, half the current value, or even smaller to be safe) and see if that works. For future reference, if you ever get a validation error this large again, it is probably caused by a learning rate that is too high. In most cases the error will be between 100 and 1000.
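As a toy illustration of why a learning rate that is too high can end in nan, the sketch below runs plain gradient descent on f(w) = w^2. It has nothing to do with Merlin's actual network or optimizer; the function name and the two learning rates are chosen only for the demonstration.

```python
import math


def run_gradient_descent(learning_rate, steps=2000, w0=1.0):
    """Minimise f(w) = w * w with plain gradient descent.

    Returns the final loss. Each update is w <- w - learning_rate * 2w,
    so |w| shrinks when learning_rate < 1 and grows when it is > 1.
    """
    w = w0
    for _ in range(steps):
        grad = 2.0 * w
        w = w - learning_rate * grad
    return w * w


if __name__ == "__main__":
    # Small step: the loss goes to zero.
    print(run_gradient_descent(0.1))
    # Large step: w doubles in magnitude each update, overflows to inf,
    # and the next update computes inf - inf, which is nan.
    print(math.isnan(run_gradient_descent(1.5)))
```

Once a weight becomes nan, every later loss computed from it is nan as well, which matches the "validation error nan, train error nan" lines in the log.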

felipeespic commented 6 years ago

Another suggestion: check that there are no NaN or inf values in your features.
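A minimal sketch of such a check follows. It assumes the feature files are headerless binary files of 32-bit floats in native byte order, with a fixed number of values per frame; that layout is an assumption here, so confirm it against your own data. `count_non_finite` is a hypothetical helper written with the standard library only.

```python
import array
import math
import os


def count_non_finite(file_name, dimension):
    """Count NaN and inf values in a raw float32 feature file.

    Assumes (not confirmed in this thread) that the file is a flat
    sequence of native-endian 32-bit floats, `dimension` per frame.
    Returns (n_frames, n_nan, n_inf).
    """
    n_values = os.path.getsize(file_name) // 4
    if n_values % dimension != 0:
        raise ValueError("file size is not a multiple of the frame dimension")

    values = array.array("f")
    with open(file_name, "rb") as fid:
        values.fromfile(fid, n_values)

    n_nan = sum(1 for v in values if math.isnan(v))
    n_inf = sum(1 for v in values if math.isinf(v))
    return n_values // dimension, n_nan, n_inf
```

Running it over every input and output file before training, for example `count_non_finite(path, 127)` for the 127-dimensional inputs in this thread, should report zero NaN and zero inf values.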

akfalc commented 6 years ago

Hi, another error appears:

2018-04-24 13:34:37,556 INFO main : label dimension is 127
2018-04-24 13:34:37,556 INFO main : training DNN
2018-04-24 13:34:37,709 INFO main.train_DNN: building the model
2018-04-24 13:34:48,043 INFO main.train_DNN: fine-tuning the DNN model
2018-04-24 13:47:43,274 INFO main.train_DNN: epoch 1, validation error nan, train error nan time spent 775.23
2018-04-24 13:47:43,274 INFO main.train_DNN: overall training time: 12.92m validation error 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.000000
2018-04-24 13:47:43,279 INFO main : generating from DNN
2018-04-24 13:47:45,697 INFO dnn_generation: generating 1 of 132: /home/akfalc/merlin/egs/voice_conversion/s1/experiments/jmk2ksp/acoustic_model/inter_module/nn_no_silence_lab_norm_127/arctic_b0408.cmp
Traceback (most recent call last):
  File "/home/akfalc/merlin/src/run_merlin.py", line 1378, in <module>
    main_function(cfg)
  File "/home/akfalc/merlin/src/run_merlin.py", line 939, in main_function
    dnn_generation(test_x_file_list, nnets_file_name, lab_dim, cfg.cmp_dim, gen_file_list, reshape_io)
  File "/home/akfalc/merlin/src/run_merlin.py", line 436, in dnn_generation
    predicted_parameter = dnn_model.parameter_prediction(test_set_x)
  File "/home/akfalc/merlin/src/models/deep_rnn.py", line 277, in parameter_prediction
    givens={self.x: test_set_x, self.is_train: np.cast'int32'}, on_unused_input='ignore')
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function.py", line 326, in function
    output_keys=output_keys)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/pfunc.py", line 449, in pfunc
    no_default_updates=no_default_updates)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/pfunc.py", line 219, in rebuild_collect_shared
    cloned_v = clone_v_get_shared_updates(v, copy_inputs_over)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/pfunc.py", line 93, in clone_v_get_shared_updates
    clone_v_get_shared_updates(i, copy_inputs_over)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/pfunc.py", line 93, in clone_v_get_shared_updates
    clone_v_get_shared_updates(i, copy_inputs_over)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/pfunc.py", line 93, in clone_v_get_shared_updates
    clone_v_get_shared_updates(i, copy_inputs_over)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/pfunc.py", line 93, in clone_v_get_shared_updates
    clone_v_get_shared_updates(i, copy_inputs_over)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/pfunc.py", line 93, in clone_v_get_shared_updates
    clone_v_get_shared_updates(i, copy_inputs_over)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/pfunc.py", line 93, in clone_v_get_shared_updates
    clone_v_get_shared_updates(i, copy_inputs_over)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/pfunc.py", line 93, in clone_v_get_shared_updates
    clone_v_get_shared_updates(i, copy_inputs_over)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/pfunc.py", line 96, in clone_v_get_shared_updates
    [clone_d[i] for i in owner.inputs], strict=rebuild_strict)
  File "/usr/local/lib/python2.7/dist-packages/theano/gof/graph.py", line 238, in clone_with_new_inputs
    new_inputs[i] = curr.type.filter_variable(new)
  File "/usr/local/lib/python2.7/dist-packages/theano/tensor/type.py", line 235, in filter_variable
    self=self))
TypeError: Cannot convert Type TensorType(float32, 3D) (of Variable <TensorType(float32, 3D)>) into Type TensorType(float32, matrix). You can try to manually convert <TensorType(float32, 3D)> into a TensorType(float32, matrix).

I don't know what the problem is.

akfalc commented 6 years ago

The problem above was caused by a model left over from an earlier setup. Now that I am using another dataset, the error has changed. Here it is:

2018-04-24 15:09:25,732 INFO main.train_DNN: epoch 1, validation error nan, train error nan time spent 66.42
2018-04-24 15:09:25,732 INFO main.train_DNN: overall training time: 1.11m validation error 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.000000
2018-04-24 15:09:25,739 INFO main : generating from DNN
Traceback (most recent call last):
  File "/home/akfalc/merlin/src/run_merlin.py", line 1378, in <module>
    main_function(cfg)
  File "/home/akfalc/merlin/src/run_merlin.py", line 939, in main_function
    dnn_generation(test_x_file_list, nnets_file_name, lab_dim, cfg.cmp_dim, gen_file_list, reshape_io)
  File "/home/akfalc/merlin/src/run_merlin.py", line 419, in dnn_generation
    dnn_model = pickle.load(open(nnets_file_name, 'rb'))
IOError: [Errno 2] No such file or directory: '/home/akfalc/merlin/egs/voice_conversion/s1/experiments/rms_1502clb_150/acoustic_model/nnets_model/feed_forward_6_tanh.model'

At some point it should create the model, but for some unknown reason it fails to do so. What could be the reason for this? Thanks in advance.
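One detail in the logs is worth noting: the very long validation error is exactly the largest finite double, `sys.float_info.max`, printed with `%f`. That would be consistent with a training loop that starts from this value as its "best loss so far" and only writes the model file when the validation loss improves. This is an assumption about how the loop behaves, not something confirmed in this thread. Under that assumption, a nan loss never counts as an improvement, so no model file is ever written. The sketch below shows the mechanism; `train_with_checkpoint` is a hypothetical stand-in, not Merlin code.

```python
import os
import pickle
import sys


def train_with_checkpoint(validation_losses, model, model_path):
    """Save `model` whenever the validation loss improves.

    Hypothetical loop: the best loss starts at the largest finite float
    and the model is pickled only on improvement. Any comparison with
    nan is False, so a nan loss never triggers a save.
    """
    best_loss = sys.float_info.max
    for loss in validation_losses:
        if loss < best_loss:
            with open(model_path, "wb") as fid:
                pickle.dump(model, fid)
            best_loss = loss
    return best_loss


if __name__ == "__main__":
    path = "demo.model"  # hypothetical file name, written to the working directory
    best = train_with_checkpoint([float("nan")], {"weights": []}, path)
    print("%f" % best)           # the same huge number as in the log
    print(os.path.exists(path))  # False unless the file already existed
```

If this is what is happening, the missing .model file is a symptom of the nan training error rather than a separate problem, so fixing the nan (lower learning rate, clean features) should come first.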