dr-pato / audio_visual_speech_enhancement

Face Landmark-based Speaker-Independent Audio-Visual Speech Enhancement in Multi-Talker Environments
https://dr-pato.github.io/audio_visual_speech_enhancement/
Apache License 2.0

Error from training script #22

Closed jungin-jin-choi closed 4 years ago

jungin-jin-choi commented 4 years ago

Hi, thanks for your nice & clean repo, dr-pato. I ran into a problem while running the training script, which says:

Traceback (most recent call last):
  File "av_speech_enhancement.py", line 227, in <module>
    main()
  File "av_speech_enhancement.py", line 217, in main
    train(args.model, args.data_dir, args.train_set, args.val_set, config, args.exp, args.mode)
  File "/tf/audio_visual_speech_enhancement/training.py", line 130, in train
    val_mixed_audio, val_base_paths, val_other_paths, val_mixed_paths = sess.run(next_val_batch)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Name: , Feature list 'base_audio_path' is required but could not be found.  Did you mean to include it in feature_list_dense_missing_assumed_empty or feature_list_dense_defaults?
         [[{{node ParseSingleSequenceExample/ParseSingleSequenceExample}}]]
         [[validation_batch/IteratorGetNext]]

While searching for this error, I found some discussion implying that the problem was with the tfrecord itself. However, when I ran the tfrecord inspection code, `base_audio_path` was present among the tfrecord keys (printed keys: dict_keys(['base_audio_path', 'mix_audio_path', 'mix_audio_wav', 'base_audio_wav', 'sequence_length', 'other_audio_wav', 'other_audio_path'])). Do you have any idea?
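One thing worth checking: the parser raises this error on the *first* record that is missing the feature list, so inspecting a single (or the first) record can look fine while a later record in the validation set is malformed. Below is a minimal stdlib-only sketch (no TensorFlow required) that walks every record in a TFRecord file and flags the ones whose serialized bytes do not contain the key. Note this is a crude byte-level presence check, not a full protobuf parse, and it skips CRC verification; the framing constants follow the standard TFRecord layout.

```python
import struct

def iter_tfrecords(path):
    """Yield the raw payload bytes of each record in a TFRecord file.

    TFRecord framing: uint64 little-endian length, uint32 length-CRC,
    payload, uint32 payload-CRC. CRCs are skipped (inspection only).
    """
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                break
            length, = struct.unpack("<Q", header)
            f.seek(4, 1)               # skip length CRC
            payload = f.read(length)
            f.seek(4, 1)               # skip payload CRC
            yield payload

def records_missing_key(path, key=b"base_audio_path"):
    """Return indices of records whose serialized bytes lack `key`."""
    return [i for i, rec in enumerate(iter_tfrecords(path))
            if key not in rec]
```

If this returns a non-empty list for any file in `tfrecords/VALIDATION_SET`, those records were written without the feature list and would trigger exactly this `InvalidArgumentError`.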

FYI, the following is my training script:

python av_speech_enhancement.py training --data_dir /tf/data/GRID --train_set tfrecords/TRAINING_SET --val_set tfrecords/VALIDATION_SET --exp 1 --mode var --num_audio_samples 48000 --model vl2m --opt adam --learning_rate 0.005 --batch_size 4 --epochs 10 -nl 1 -nh 1

The following is my dataset directory structure:

|-- MIXED
|   |-- TEST_SET
|   |-- TRAINING_SET
|   `-- VALIDATION_SET
|-- check_tfrecords.py
|-- logs
|   |-- checkpoints
|   |-- tensorboard
|   `-- training_logs
|-- s1
|   |-- audio
|   |-- face_landmark
|   |-- tbm
|   `-- video
|-- s2
|   |-- audio
|   |-- face_landmark
|   |-- tbm
|   `-- video
|-- s3
|   |-- audio
|   |-- face_landmark
|   |-- tbm
|   `-- video
|-- shape_predictor_68_face_landmarks.dat
`-- tfrecords
    |-- TEST_SET
    |-- TRAINING_SET
    |-- VALIDATION_SET
    `-- logs
jungin-jin-choi commented 4 years ago

I haven't pinned down the exact cause, but I figured out that my wav files had variable length: my mistake was downloading the endpointed audio of the GRID dataset. After redoing preprocessing & tfrecords generation with the fixed-length audio dataset, training was successful!
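For anyone hitting the same issue, a quick stdlib-only sanity check before tfrecords generation can catch variable-length clips. This sketch assumes the expected clip length matches the `--num_audio_samples 48000` flag from the training command above (3 s at 16 kHz); the directory layout and helper name are just for illustration.

```python
import wave
from pathlib import Path

def check_fixed_length(wav_dir, expected_samples=48000):
    """Return {path: n_frames} for wav files whose frame count
    differs from expected_samples (e.g. 48000 = 3 s at 16 kHz)."""
    bad = {}
    for path in sorted(Path(wav_dir).rglob("*.wav")):
        with wave.open(str(path), "rb") as w:
            n = w.getnframes()
        if n != expected_samples:
            bad[str(path)] = n
    return bad

# Example: check_fixed_length("s1/audio") — an empty dict means
# every clip has the expected fixed length.
```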