dr-pato / audio_visual_speech_enhancement

Face Landmark-based Speaker-Independent Audio-Visual Speech Enhancement in Multi-Talker Environments
https://dr-pato.github.io/audio_visual_speech_enhancement/
Apache License 2.0

Training Model #12

Closed pradnyapagar05 closed 4 years ago

pradnyapagar05 commented 4 years ago

Hello sir, I am having a problem running the program. I am currently working with 10 speakers: 6 for training, 2 for validation and 2 for testing. The directory structure is:

Data
  -s1
    -audio (containing .wav and .npy files)
    -video (containing .mpg and .txt files)
    -TBM
  -s2
  ...
  -s10
  -mix
    -Training_set (containing .wav and .npy files; 6 samples for each of the 6 speakers, 36 .wav files in total)
    -Test_Set (4 .wav)
    -Validation_Set (4 .wav)
  -tfrecords
    -Training_set (212 files)
    -Test_set (8 files)
    -Validation_Set (8 files)

I am working with the training.py file and using the av_concat_mask_ref model. While running this file I get some errors. I am passing arguments based on these function calls:

1. config = Configuration(args.learning_rate, args.updating_step, args.learning_decay, args.dropout, args.batch_size,args.opt, args.video_dim, args.audio_dim, args.num_audio_samples, args.epochs, args.hidden_units,args.layers, args.regularization, args.mask_threshold)
2. train(args.model, args.data_dir, args.train_set, args.val_set, config, args.exp, args.mode)

The actual parameters I am passing are:

config = Configuration(10^-3, 1000, 1.0, 1,366,'adam', 136,257, 366, 5 , 250,3, 0, -1)
train('av_concat_mask_ref', '/content/drive/My Drive/project1', 'tfrecords/TRAINING_SET','tfrecords/VALIDATION_SET', config, '0', 'fixed')

I am getting this error:

InvalidArgumentError: {{function_node __inference_Dataset_map_DataManager.read_data_format_fixed_32951}} Name: , Key: base_audio_wav, Index: 0. Number of float values != expected. values size: 31040 but output shape: [216] [[{{node ParseSingleSequenceExample/ParseSingleSequenceExample}}]] [[validation_batch/IteratorGetNext]]

During handling of the above exception, another exception occurred:

InvalidArgumentError Traceback (most recent call last)

/tensorflow-1.15.2/python3.6/tensorflow_core/python/client/session.py in _do_call(self, fn, *args)
   1382                     '\nsession_config.graph_options.rewrite_options.'
   1383                     'disable_meta_optimizer = True')
-> 1384       raise type(e)(node_def, op, message)
   1385
   1386   def _extend_graph(self):

InvalidArgumentError: Name: , Key: base_audio_wav, Index: 0. Number of float values != expected. values size: 31040 but output shape: [216] [[{{node ParseSingleSequenceExample/ParseSingleSequenceExample}}]] [[validation_batch/IteratorGetNext]]

Could you please help me solve this issue, sir? Thank you.

nanometer34688 commented 4 years ago

I am also getting this issue.

dr-pato commented 4 years ago

Hi @pradnyapagar05 ,

config = Configuration(10^-3, 1000, 1.0, 1,366,'adam', 136,257, 366, 5 , 250,3, 0, -1)

I don't understand why you used 366 as both the batch size and num_audio_samples. Perhaps that was a typing mistake.
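A side note on the first argument of the quoted call, assuming it was typed literally in Python rather than passed on the command line: `^` is the bitwise XOR operator, not exponentiation, so `10^-3` does not evaluate to a learning rate of 0.001. A minimal sketch of the difference (the variable names are only illustrative):

```python
# In Python, ^ is bitwise XOR on integers; ** is exponentiation.
xor_value = 10 ^ -3       # XOR of two integers, evaluates to -9
power_value = 10 ** -3    # ten to the power of minus three
literal_value = 1e-3      # float literal, the usual way to write a small rate

print("10 ^ -3  =", xor_value)
print("10 ** -3 =", power_value)
print("1e-3     =", literal_value)
```

Writing the value as `1e-3` or `0.001`, as in the later call in this thread, avoids the ambiguity.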

InvalidArgumentError: {{function_node __inference_Dataset_map_DataManager.read_data_format_fixed_32951}} Name: , Key: base_audio_wav, Index: 0. Number of float values != expected. values size: 31040 but output shape: [216] [[{{node ParseSingleSequenceExample/ParseSingleSequenceExample}}]] [[validation_batch/IteratorGetNext]]

This is a bit strange, because 216 should be equal to num_audio_samples (see above). 31040 is the actual length of the wav in your tfrecord, which must be equal to num_audio_samples when you use 'fixed' mode. I guess something went wrong during mixed-audio or tfrecord generation.
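One way to narrow this down is to check the length of the mixed .wav files before generating the tfrecords. The sketch below is not part of this repository. It uses only Python's standard `wave` module, and the helper names (`wav_num_samples`, `length_mismatch`, `write_silent_wav`) are hypothetical. It assumes uncompressed PCM .wav files.

```python
import os
import tempfile
import wave


def wav_num_samples(path):
    """Return the number of frames (samples per channel) in a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes()


def length_mismatch(path, num_audio_samples):
    """Return (actual, expected) if the lengths differ, otherwise None."""
    actual = wav_num_samples(path)
    if actual != num_audio_samples:
        return (actual, num_audio_samples)
    return None


def write_silent_wav(path, num_samples, sample_rate=16000):
    """Write a mono 16-bit WAV of silence (used only for the demo below)."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * num_samples)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        demo_path = os.path.join(tmp_dir, "demo.wav")
        # Same length as the wav reported in the error message.
        write_silent_wav(demo_path, 31040)
        print(length_mismatch(demo_path, 48000))
```

Running `length_mismatch` over every file in the mix folders would show which files do not have the length passed as num_audio_samples.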

pradnyapagar05 commented 4 years ago

Hello sir, thank you for replying to my issue. I guess I did something wrong. First, for mixed-speech generation, I passed the following parameters:

create_mixed_tracks_data('/content/drive/My Drive/project/data1', [1,2,3,4,5,6],[1,2,3,4,5,6],'audio','mix/trainingt', 6, 6, 1)

save_spectrograms('/content/drive/My Drive/project/data1', [1,2,3,4,5,6], 'audio', ' audio',16e3, 48000)

config = Configuration(0.001, 50, 0.9, 1,16,'adam', 136,257, 50653, 20 , 250,3,1.0000e-04, -1)
train('vl2m', '/content/drive/My Drive/project_data_final', 'tfrecords/TRAINING_SET','tfrecords/VALIDATION_SET', config, '0', 'fixed')

I am confused about which parameters to pass to these functions:

create_mixed_tracks_data(args.data_dir, args.base_speaker_ids, args.noisy_speaker_ids, args.audio_dir,args.dest_dir, args.num_samples, args.num_mix, args.num_mix_speakers)

save_spectrograms(args.data_dir, args.speaker_ids, args.audio_dir, args.dest_dir,args.sample_rate, args.max_wav_length)

config = Configuration(args.learning_rate, args.updating_step, args.learning_decay, args.dropout, args.batch_size, args.opt, args.video_dim, args.audio_dim, args.num_audio_samples, args.epochs, args.hidden_units,args.layers, args.regularization, args.mask_threshold)

train(args.model, args.data_dir, args.train_set, args.val_set, config, args.exp, args.mode)

Please help me solve this issue. Thank You sir.

dr-pato commented 4 years ago

config = Configuration(0.001, 50, 0.9, 1,16,'adam', 136,257, 50653, 20 , 250,3,1.0000e-04, -1)

Why are you using 50653 as the number of audio samples? If you use the GRID dataset, num_audio_samples has to be 48000 (3 seconds at 16 kHz). You can check the meaning of the command parameters using the --help option.
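The arithmetic behind that number can be sketched as follows. The helpers `expected_num_samples` and `fit_to_length` are hypothetical and only illustrate the idea of a fixed-length waveform. They are not functions from this repository and do not necessarily match how it pads or truncates audio.

```python
def expected_num_samples(duration_s, sample_rate_hz):
    """Number of samples in a clip of the given duration and sample rate."""
    return int(round(duration_s * sample_rate_hz))


def fit_to_length(samples, num_audio_samples):
    """Truncate or zero-pad a sequence of samples to exactly num_audio_samples."""
    trimmed = list(samples)[:num_audio_samples]
    padding = [0.0] * (num_audio_samples - len(trimmed))
    return trimmed + padding


if __name__ == "__main__":
    # 3 seconds at 16000 Hz, as for the GRID clips mentioned above.
    target = expected_num_samples(3, 16000)
    print("num_audio_samples:", target)

    # A clip that is too long (50653 samples) and one that is too short (31040).
    too_long = [0.1] * 50653
    too_short = [0.1] * 31040
    print(len(fit_to_length(too_long, target)), len(fit_to_length(too_short, target)))
```

Whatever value is used, it has to be the same when generating the mixed audio and tfrecords and when calling Configuration, since 'fixed' mode expects every wav to have exactly num_audio_samples samples.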