clovaai / voxceleb_trainer

In defence of metric learning for speaker recognition
MIT License

Is speech/voice activity detection part of this implementation? #128

Closed: iiscleap closed this issue 2 years ago

iiscleap commented 2 years ago

Hi @dvisockas and @joonson,

In most speaker verification implementations, such as Kaldi, SAD/VAD is applied to remove silence/non-speech regions. Is this done in this implementation? If so, could you please let me know where? I seem to have missed this part.

I am not sure how many silence/non-speech regions are present in the VoxCeleb dataset, but the Kaldi VAD removes a fair amount of audio from the VoxCeleb corpus. If no SAD/VAD is used here, it may be interesting to look at how many non-speech regions of ≥ 2 seconds are present in the dataset, how likely those regions are to be sampled, and how that would affect training.
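A minimal sketch of the measurement suggested above, assuming per-frame speech/non-speech labels at a 10 ms frame shift (from whatever VAD is available) and a 2 s training crop whose start frame is drawn uniformly from all valid positions. The function names are hypothetical, and the uniform-start assumption should be checked against how the data loader actually picks crops.

```python
from typing import List, Tuple


def nonspeech_runs(flags: List[bool]) -> List[Tuple[int, int]]:
    """Return (start_frame, length) for each maximal run of non-speech frames."""
    runs = []
    start = None
    for i, is_speech in enumerate(flags):
        if not is_speech and start is None:
            start = i                         # a non-speech run begins
        elif is_speech and start is not None:
            runs.append((start, i - start))   # the run ends just before frame i
            start = None
    if start is not None:                     # the run extends to the end of the file
        runs.append((start, len(flags) - start))
    return runs


def crop_stats(flags: List[bool], crop_sec: float = 2.0,
               shift_sec: float = 0.01) -> Tuple[int, float]:
    """Count non-speech runs of at least crop_sec seconds, and the probability
    that a uniformly placed crop of crop_sec seconds contains no speech frame.

    Assumption (hypothetical): the crop start frame is uniform over 0..n-crop.
    """
    crop = int(round(crop_sec / shift_sec))   # crop length in frames
    n = len(flags)
    long_runs = [r for r in nonspeech_runs(flags) if r[1] >= crop]
    if n < crop:
        return len(long_runs), 0.0
    # A crop starting at frame s is all non-speech iff frames s..s+crop-1 fall
    # inside one run; a run of length r admits r - crop + 1 such start frames.
    bad_starts = sum(length - crop + 1 for _, length in long_runs)
    return len(long_runs), bad_starts / (n - crop + 1)
```

For example, an 8 s file (800 frames) that opens with 3 s of silence has one qualifying region, and 101 of the 601 possible 2 s crops fall entirely inside it. Averaging the per-file probability over the dataset, weighted by how often each file is drawn, would give a corpus-level estimate; crops that are only mostly non-speech would need a looser criterion.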

Shreyas Ramoji, [PhD Scholar, LEAP Lab, Indian Institute of Science]

joonson commented 2 years ago

There is no speech activity detection in this repo. For training, it is not necessary, since VoxCeleb contains files already processed by VAD. For inference on new data, you will need to run VAD using an external package.
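For a rough idea of what such an external VAD step does, here is a minimal energy-based sketch. It is not Kaldi's VAD and not any particular package's method: the 25 ms frames, the 10 ms shift, and the rule "a frame is speech if it is within 30 dB of the loudest frame" are hypothetical choices, and `energy_vad`/`keep_speech` are illustrative names. A dedicated VAD package will be far more robust to noise and music.

```python
import math
from typing import List


def energy_vad(samples: List[float], sample_rate: int = 16000,
               frame_ms: float = 25.0, shift_ms: float = 10.0,
               threshold_db: float = 30.0) -> List[bool]:
    """Flag each frame as speech (True) or non-speech (False).

    Hypothetical rule: a frame is speech if its log energy is within
    `threshold_db` dB of the loudest frame in the same signal.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    shift = int(sample_rate * shift_ms / 1000)
    energies_db = []
    for start in range(0, len(samples) - frame_len + 1, shift):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        energies_db.append(10.0 * math.log10(power + 1e-12))  # epsilon avoids log10(0)
    if not energies_db:
        return []
    peak = max(energies_db)
    return [e >= peak - threshold_db for e in energies_db]


def keep_speech(samples: List[float], flags: List[bool], sample_rate: int = 16000,
                frame_ms: float = 25.0, shift_ms: float = 10.0) -> List[float]:
    """Return the samples covered by at least one speech frame, in order."""
    frame_len = int(sample_rate * frame_ms / 1000)
    shift = int(sample_rate * shift_ms / 1000)
    keep = [False] * len(samples)
    for i, is_speech in enumerate(flags):
        if is_speech:
            # Overlapping speech frames simply re-mark the same samples.
            keep[i * shift:i * shift + frame_len] = [True] * frame_len
    return [x for x, k in zip(samples, keep) if k]


if __name__ == "__main__":
    sr = 16000
    # Synthetic check: 1 s of silence, 1 s of a 440 Hz tone, 1 s of silence.
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
    signal = [0.0] * sr + tone + [0.0] * sr
    flags = energy_vad(signal, sr)
    speech = keep_speech(signal, flags, sr)
    print(f"kept {len(speech) / sr:.2f} s of {len(signal) / sr:.2f} s")
```

A threshold relative to the loudest frame keeps the sketch independent of recording gain, at the cost of misbehaving on files that contain no speech at all.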

JJun-Guo commented 1 year ago

VoxCeleb contains files already processed by VAD

Hi, are you sure that VoxCeleb contains files already processed by VAD? I read in the VoxCeleb paper that there is no VAD step.