Is speech/voice activity detection a part of this implementation?

iiscleap commented 2 years ago

Hi @dvisockas and @joonson,

In most speaker verification implementations, such as Kaldi, SAD/VAD is applied to remove silence/non-speech regions. Is this done in this implementation? If so, could you let me know where it is done? I seem to have missed this part.
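For context on the step being asked about, here is a minimal sketch of an energy-based frame VAD, loosely in the spirit of Kaldi's energy VAD: a frame counts as speech when enough frames around it have a log-energy above a threshold that scales with the utterance's mean log-energy. It is not Kaldi's implementation. The function name `energy_vad`, the frame sizes, and the `energy_threshold`, `mean_scale`, `context` and `proportion` values are illustrative assumptions, and samples are assumed to be on the int16 scale (roughly -32768 to 32767).

```python
import math
from typing import List, Sequence


def energy_vad(
    samples: Sequence[float],
    sample_rate: int = 16000,
    frame_ms: float = 25.0,
    shift_ms: float = 10.0,
    energy_threshold: float = 5.0,  # assumed value, int16-scale audio
    mean_scale: float = 0.5,        # assumed weight on the mean log-energy
    context: int = 2,               # frames on each side used for smoothing
    proportion: float = 0.6,        # fraction of context frames that must be loud
) -> List[bool]:
    """Return one speech (True) / non-speech (False) flag per frame (hypothetical helper)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    shift = int(sample_rate * shift_ms / 1000)
    if len(samples) < frame_len:
        return []

    # Log-energy per frame; the floor avoids log(0) on digital silence.
    log_e = []
    for start in range(0, len(samples) - frame_len + 1, shift):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame)
        log_e.append(math.log(max(energy, 1e-10)))

    # The threshold adapts to the utterance: louder recordings get a higher bar.
    mean_log_e = sum(log_e) / len(log_e)
    threshold = energy_threshold + mean_scale * mean_log_e

    # Smooth the decision: require enough loud frames in a window around each frame.
    flags = []
    for i in range(len(log_e)):
        lo, hi = max(0, i - context), min(len(log_e), i + context + 1)
        loud = sum(1 for e in log_e[lo:hi] if e > threshold)
        flags.append(loud >= proportion * (hi - lo))
    return flags
```

Frames flagged False would then be dropped before feature extraction or cropping.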

I am not sure how many silence/non-speech regions are present in the VoxCeleb dataset, but Kaldi's VAD removes a fair amount of audio from the VoxCeleb corpus. If no SAD/VAD is used here, it may be interesting to look at how many non-speech regions of ≥ 2 seconds are present in the dataset, how likely those regions are to be sampled, and how this would affect training.
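The sampling question can be made concrete. Assuming each training example is a fixed-length crop of `w` seconds whose start is drawn uniformly from `[0, L - w]` in an utterance of length `L` (an assumption about the data loader, not something stated in this thread), the crop lies entirely inside a non-speech interval `[s, e]` only when its start falls in `[s, e - w]`. Summing those lengths over non-overlapping intervals and dividing by `L - w` gives the probability. The helper name `prob_crop_all_nonspeech` and the interval format are hypothetical; the intervals themselves would have to come from some external VAD.

```python
from typing import Iterable, Tuple


def prob_crop_all_nonspeech(
    utt_len: float,
    nonspeech: Iterable[Tuple[float, float]],
    crop_len: float = 2.0,
) -> float:
    """P(a uniformly placed crop of crop_len seconds is entirely non-speech).

    `nonspeech` holds non-overlapping (start, end) intervals in seconds.
    """
    if utt_len <= crop_len:
        # Only one possible crop (the whole, possibly padded, file); not modelled here.
        raise ValueError("utterance must be longer than the crop")
    valid_starts = utt_len - crop_len  # measure of all possible crop start times
    bad = 0.0
    for s, e in nonspeech:
        s, e = max(s, 0.0), min(e, utt_len)  # clip the interval to the utterance
        # Starts in [s, e - crop_len] keep the whole crop inside [s, e].
        bad += max(0.0, (e - s) - crop_len)
    return bad / valid_starts
```

For example, a 10 s file with 4 s of leading silence gives (4 - 2) / (10 - 2) = 0.25. Averaging this per file, weighted by how often each file is drawn, would give a rough dataset-level rate.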

Shreyas Ramoji, [PhD Scholar, LEAP Lab, Indian Institute of Science]

joonson commented 2 years ago

There is no speech activity detection in this repo. It is not necessary for training, since VoxCeleb contains files that have already been processed by VAD. For inference on new data, you will need to run VAD using an external package.
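As a rough illustration of the inference-side step, the sketch below assumes an external VAD has already produced speech segments as `(start_sec, end_sec)` pairs and keeps only those samples before embedding extraction. The function `keep_speech` and the segment format are hypothetical and not part of voxceleb_trainer.

```python
from typing import List, Sequence, Tuple


def keep_speech(
    samples: Sequence[float],
    segments: Sequence[Tuple[float, float]],
    sample_rate: int = 16000,
) -> List[float]:
    """Concatenate only the samples inside the given speech segments (hypothetical helper)."""
    # Convert seconds to sample indices, then merge overlapping segments
    # so that no sample is copied twice.
    merged: List[List[int]] = []
    for start_s, end_s in sorted(segments):
        a = max(0, int(round(start_s * sample_rate)))
        b = min(len(samples), int(round(end_s * sample_rate)))
        if b <= a:
            continue  # empty or out-of-range segment
        if merged and a <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], b)
        else:
            merged.append([a, b])

    out: List[float] = []
    for a, b in merged:
        out.extend(samples[a:b])
    return out
```

Merging first keeps the output free of duplicated audio when the VAD reports overlapping segments.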

JJun-Guo commented 1 year ago

VoxCeleb contains files already processed by VAD

Hi man, are you sure that VoxCeleb contains files already processed by VAD? I read the VoxCeleb paper, and it says there is no VAD step.

clovaai / voxceleb_trainer

Is speech/voice activity detection a part of this implementation #128