dr-pato / audio_visual_speech_enhancement

Face Landmark-based Speaker-Independent Audio-Visual Speech Enhancement in Multi-Talker Environments
https://dr-pato.github.io/audio_visual_speech_enhancement/
Apache License 2.0

Value error when trying to cut audio #10

Closed: nanometer34688 closed this issue 4 years ago

nanometer34688 commented 4 years ago

I have found an issue while using the GRID dataset. The audio/video files vary in length. In the audio_preprocessing subcommand, the 'max_audio_length' option seems to fail when I try to set the desired length of the audio.

For example, if I run it with 42000 as the maximum audio length, I get this error:

audio_samples[i, n_fft//2: len(samples) + n_fft//2] = samples
ValueError: could not broadcast input array from shape (42240) into shape (42000)

I'm assuming it should cut the audio to 42000 samples. Am I correct in thinking this?

dr-pato commented 4 years ago

Hi, actually no. If you read the help of the 'audio_preprocessing' subcommand, you can find the description of the '--max_audio_length' option:

Set this value to the maximum length (in samples with desired sample rate) of single wav

You can easily modify the code to take shorter segments for each sample.
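In other words, the option sizes a preallocated buffer rather than truncating audio, so any wav longer than the value cannot fit. The sketch below reproduces the reported error under an assumed buffer layout (one row per utterance, `max_audio_length + n_fft // 2` columns, chosen because it matches the 42000 in the error message); `fill_buffer` is a hypothetical helper, not code from audio_features.py.

```python
import numpy as np

def fill_buffer(utterances, max_audio_length, n_fft):
    """Hypothetical reconstruction of the assignment on audio_features.py line 39.

    Assumed layout: one row per utterance, with n_fft // 2 leading zeros
    followed by room for max_audio_length samples. The real shape may differ.
    """
    audio_samples = np.zeros((len(utterances), max_audio_length + n_fft // 2))
    for i, samples in enumerate(utterances):
        # NumPy clips the slice end to the row width, so the target region holds
        # at most max_audio_length elements; a longer `samples` cannot be
        # broadcast into it and NumPy raises ValueError.
        audio_samples[i, n_fft//2: len(samples) + n_fft//2] = samples
    return audio_samples

fill_buffer([np.ones(41000)], 42000, 512)    # fits
# fill_buffer([np.ones(42240)], 42000, 512)  # raises the ValueError from the report
```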

Giovanni

nanometer34688 commented 4 years ago

That's the option I am having trouble with. If an audio file is longer than 42000 samples, I get that error.

This is the command I run:

python3 av_speech_enhancement.py audio_preprocessing --data_dir zipped_data/TEST_SET --speaker_ids 2 8 --dest_dir . --audio_dir . -ml 42000

And this is the error I get:

audio_samples[i, n_fft//2: len(samples) + n_fft//2] = samples
ValueError: could not broadcast input array from shape (42240) into shape (42000)

Is this something you have seen before?

nanometer34688 commented 4 years ago

On the GRID Corpus dataset, did you use the default value of 48000 for your max_audio_length option?
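Whatever the default, a safe value must be at least the length of the longest wav after resampling. A minimal sketch for measuring that with Python's standard `wave` module, assuming the wavs sit somewhere under one directory and that resampling keeps length proportional to duration (an actual resampler may differ by a sample, so a small margin is prudent); `min_max_audio_length` and `target_sr` are hypothetical names, not options of av_speech_enhancement.py.

```python
import wave
from pathlib import Path

def required_length(num_frames, native_sr, target_sr):
    """Samples needed after resampling num_frames from native_sr to target_sr.

    Integer ceiling division avoids floating-point rounding errors.
    """
    return (num_frames * target_sr + native_sr - 1) // native_sr

def min_max_audio_length(wav_dir, target_sr):
    """Smallest max_audio_length that fits every .wav under wav_dir (assumed layout)."""
    longest = 0
    for path in Path(wav_dir).rglob("*.wav"):
        with wave.open(str(path), "rb") as w:
            n = required_length(w.getnframes(), w.getframerate(), target_sr)
        longest = max(longest, n)
    return longest
```

The result (plus any margin) would then be passed as -ml.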

nanometer34688 commented 4 years ago

@dr-pato I seem to have found the issue.

audio_features.py line 39 is:

audio_samples[i, n_fft//2: len(samples) + n_fft//2] = samples

But this raises an error because samples can be longer than the region of audio_samples it is written into.

The fix I made was as follows:

audio_samples[i, n_fft//2: len(samples) + n_fft//2] = samples[:max_audio_length]

Now my audio files are cut to the maximum length and the error is gone.

Thank you
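The posted fix keeps len(samples) on the left-hand side, so it relies on NumPy clipping that slice to the row width; it works when the clipped width equals max_audio_length. Trimming samples before the assignment makes both sides agree by construction. A minimal sketch, again assuming a buffer of `max_audio_length + n_fft // 2` columns; `fill_buffer_truncated` is a hypothetical helper, not the repository's code. Note that trimming drops the end of long utterances, so any features aligned with the audio (such as video landmarks) may need matching treatment.

```python
import numpy as np

def fill_buffer_truncated(utterances, max_audio_length, n_fft):
    """Copy utterances into a zero-padded buffer, trimming each to max_audio_length.

    The buffer layout is an assumption, not taken from audio_features.py.
    """
    pad = n_fft // 2
    audio_samples = np.zeros((len(utterances), max_audio_length + pad))
    for i, samples in enumerate(utterances):
        # Drop anything past the limit so the slice below always fits.
        samples = np.asarray(samples)[:max_audio_length]
        audio_samples[i, pad:pad + len(samples)] = samples
    return audio_samples
```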