dr-pato / audio_visual_speech_enhancement

Face Landmark-based Speaker-Independent Audio-Visual Speech Enhancement in Multi-Talker Environments
https://dr-pato.github.io/audio_visual_speech_enhancement/
Apache License 2.0

Training.py #17

Closed pradnyapagar05 closed 4 years ago

pradnyapagar05 commented 4 years ago

Hello sir, while running the training.py file with this command:

av_speech_enhancement.py training --data_dir dataset1 --train_set tfrecords/TRAINING_SET --val_set tfrecords/VALIDATION_SET --exp '0' --mode var --video_dim 136 --audio_dim 257 --num_audio_samples 75000 --model av_concat_mask_ref --opt adam --learning_rate 0.001 --updating_step 50 --learning_decay 0.9 --batch_size 16 --epochs 20 --hidden_units 250 --layers 3 --dropout 1 --regularization 0000e-04

I am getting this error:

Traceback (most recent call last):
  File "/home/pradnya/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/home/pradnya/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/home/pradnya/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __inference_Dataset_map_DataManager.read_data_format_var_143}} Name: , Key: tbm, Index: 0. Number of float values != expected. values size: 513 but output shape: [257]
     [[{{node ParseSingleSequenceExample/ParseSingleSequenceExample}}]]
     [[validation_batch/IteratorGetNext]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "av_speech_enhancement.py", line 226, in <module>
    main()
  File "av_speech_enhancement.py", line 216, in main
    train(args.model, args.data_dir, args.train_set, args.val_set, config, args.exp, args.mode)
  File "/home/pradnya/audio_visual_speech_enhancement-master(1)/audio_visual_speech_enhancement-master/training.py", line 133, in train
    val_mixed_audio, val_base_paths, val_other_paths, val_mixed_paths = sess.run(next_val_batch)
  File "/home/pradnya/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/home/pradnya/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/pradnya/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/home/pradnya/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Name: , Key: tbm, Index: 0. Number of float values != expected. values size: 513 but output shape: [257]
     [[{{node ParseSingleSequenceExample/ParseSingleSequenceExample}}]]
     [[validation_batch/IteratorGetNext]]

Sir, am I doing anything wrong? Thank you, sir.

dr-pato commented 4 years ago

Hi @pradnyapagar05, the TBMs in your tfrecords have 513 frequency bins per time step, but you set the audio_dim parameter to 257. audio_dim has to be equal to the frequency dimension of your data (513 in your case). To get a different frequency dimension, modify the FFT size used to compute the spectrogram (fft-size=512 -> audio_dim=257; fft-size=1024 -> audio_dim=513).
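
The mapping between FFT size and audio_dim quoted in the reply above follows the usual one-sided spectrum rule, fft_size / 2 + 1 bins. Here is a minimal sketch for checking which value matches your data. It assumes the spectrograms keep only the non-negative frequency bins of a real-valued FFT with an even FFT size; the helper names are hypothetical and are not part of this repository.

```python
import numpy as np


def audio_dim_for_fft_size(fft_size: int) -> int:
    """Frequency bins in the one-sided spectrum of a real-valued frame.

    Hypothetical helper: assumes only non-negative frequencies are kept.
    """
    return fft_size // 2 + 1


def fft_size_for_audio_dim(audio_dim: int) -> int:
    """Inverse mapping; assumes an even FFT size (hypothetical helper)."""
    return 2 * (audio_dim - 1)


# Cross-check the formula against NumPy's real FFT on dummy frames.
for fft_size in (512, 1024):
    frame = np.zeros(fft_size)
    bins = np.fft.rfft(frame).shape[0]
    assert bins == audio_dim_for_fft_size(fft_size)
    print(f"fft-size={fft_size} -> audio_dim={bins}")
```

Under these assumptions, the 513 values reported in the error message correspond to an FFT size of 1024, so either pass --audio_dim 513 or regenerate the tfrecords with an FFT size of 512.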