espnet / espnet

End-to-End Speech Processing Toolkit
https://espnet.github.io/espnet/
Apache License 2.0

Input file length limit for voice enhancement task #4163

Closed takashin3391 closed 2 years ago

takashin3391 commented 2 years ago

When 120 seconds of wav data was input to the voice enhancement task, the following error occurred in Stage 4. The error did not occur with 1 second of wav data.

2022-03-14T16:05:07 (enh.sh:347:main) Stage 4: Remove short data: dump/raw/org -> dump/raw
utils/copy_data_dir.sh: copied data from dump/raw/org/train to dump/raw/train
utils/validate_data_dir.sh: WARNING: you have only one speaker. This probably a bad idea.
Search for the word 'bold' in http://kaldi-asr.org/doc/data_prep.html
for more information.
utils/validate_data_dir.sh: Successfully validated data-directory dump/raw/train
fix_data_dir.sh: no utterances remained: not proceeding further.

The max_wav_duration in enh.sh was set to 300, but the same error occurred.

Is there a limit to the input length of a wav file? Where can I set the maximum length limit for input wav files?

Emrys365 commented 2 years ago

Could you check the file dump/raw/org/train/utt2num_samples manually to see whether all of the values fall outside the range [min_wav_duration * fs, max_wav_duration * fs]?
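For anyone who wants to script this check rather than read the file by eye, here is a minimal sketch. It is not ESPnet's own filtering code: the function name `find_out_of_range`, the sample utterance IDs, the 16 kHz rate, and the inclusive bounds are all assumptions made for illustration.

```python
def find_out_of_range(lines, fs, min_wav_duration, max_wav_duration):
    """Return (utt_id, num_samples) pairs whose length is outside the allowed range.

    Hypothetical helper: bounds are treated as inclusive, following the
    bracket notation [min_wav_duration * fs, max_wav_duration * fs].
    """
    lo = min_wav_duration * fs
    hi = max_wav_duration * fs
    rejected = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        utt_id, num_samples = line.split()
        n = int(num_samples)
        if not (lo <= n <= hi):
            rejected.append((utt_id, n))
    return rejected


# Made-up entries in the "utt_id num_samples" format of utt2num_samples:
# 120 s and 1 s of audio at an assumed 16 kHz sampling rate.
sample = ["utt_long 1920000", "utt_short 16000"]

# With a 300 s limit nothing is rejected; with a 20 s limit the long file is.
print(find_out_of_range(sample, fs=16000, min_wav_duration=0.1, max_wav_duration=300))
print(find_out_of_range(sample, fs=16000, min_wav_duration=0.1, max_wav_duration=20))
```

To run it on a real file, pass `open("dump/raw/org/train/utt2num_samples")` as `lines`. If every utterance is reported, the limits actually in effect are probably not the ones you expect.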

takashin3391 commented 2 years ago

@Emrys365 Thank you for your response.

The settings of enh.sh are as follows.

fs=16k # Sampling rate.
min_wav_duration=0.1 # Minimum duration in second
max_wav_duration=300 # Maximum duration in second

The values in dump/raw/org/train/utt2num_samples are within the range [min_wav_duration * fs, max_wav_duration * fs]. They are as follows.

utt01 1766240
utt02 1766240
utt03 1766240
utt04 1766240
utt05 1766240
utt06 1766240
utt07 1766240
utt08 1766240
utt09 1766240
utt10 1766240
utt11 1766240
utt12 1766240
utt13 1766240
utt14 1766240
utt15 1766240
utt16 1766240

Does the fs notation need to be numeric? Or is 16k fine?
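As a sanity check on the arithmetic, the sketch below converts a rate written as 16k to Hz and works out the duration of one of the utterances listed above. It assumes that 16k means 16000 Hz. `parse_rate` is a hypothetical helper written for this illustration and says nothing about how enh.sh itself parses fs.

```python
def parse_rate(fs):
    """Convert a sampling-rate string such as '16k' or '16000' to an integer in Hz.

    Hypothetical helper: a trailing 'k' is read as a factor of 1000.
    """
    text = str(fs).strip().lower()
    if text.endswith("k"):
        return round(float(text[:-1]) * 1000)
    return int(text)


fs = parse_rate("16k")             # assumed to mean 16000 Hz
num_samples = 1766240              # value reported in utt2num_samples
duration = num_samples / fs        # duration in seconds

min_wav_duration = 0.1
max_wav_duration = 300
in_range = min_wav_duration * fs <= num_samples <= max_wav_duration * fs

print(fs, round(duration, 2), in_range)
```

Under these assumptions each file is about 110 seconds long. That is inside the 0.1 to 300 second window, so the files would only be dropped if a smaller max_wav_duration were actually in effect.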

takashin3391 commented 2 years ago

After changing max_wav_duration in run.sh as well as in enh.sh, all stages worked. Having to set the same option in multiple locations is confusing. If possible, please improve the configuration so that it can be set in a single config file.