clovaai / voxceleb_trainer

In defence of metric learning for speaker recognition
MIT License

Question about reading wav #85

Closed: whitegon closed this issue 3 years ago

whitegon commented 3 years ago

I am adding VAD to the preprocessing of the wav files, and I found that wavfile.read and torchaudio.load return different values. Why do you use scipy.io rather than torchaudio?
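
A likely source of the difference (an assumption, since the post does not say what differed) is sample scaling: scipy.io.wavfile.read returns 16-bit PCM as raw int16 values, while torchaudio.load by default returns float32 values normalized to roughly [-1, 1]. The sketch below reproduces that relationship with only the standard library; the helper names are hypothetical and the sine wave is illustrative test data.

```python
import io
import math
import struct
import wave


def make_wav_bytes(sample_rate=16000, n_samples=160):
    """Build a short 16-bit mono sine-wave WAV in memory (illustrative data)."""
    samples = [
        int(10000 * math.sin(2 * math.pi * 440 * i / sample_rate))
        for i in range(n_samples)
    ]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        w.setframerate(sample_rate)
        w.writeframes(struct.pack("<%dh" % n_samples, *samples))
    return buf.getvalue()


def read_int16(wav_bytes):
    """Return raw integer samples, the scale an int16 reader would give."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        n = w.getnframes() * w.getnchannels()
        raw = w.readframes(w.getnframes())
    return list(struct.unpack("<%dh" % n, raw))


def to_normalized_float(int16_samples):
    """Scale int16 samples by 1/32768, the scale a normalizing loader would give."""
    return [s / 32768.0 for s in int16_samples]


ints = read_int16(make_wav_bytes())
floats = to_normalized_float(ints)
print(max(ints), max(floats))
```

If this is the cause, the two readers hold the same signal and differ only by a constant factor of 32768, so any VAD or feature code must be told which scale it is receiving.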

whitegon commented 3 years ago

I found no difference in the training results. However, the VAD implemented in torchaudio is rather slow: each wav file takes about 500 to 800 ms to process. Has anyone used another VAD implementation?

Naminwang commented 3 years ago

I'm facing the same problem. Have you found a replacement VAD method?

whitegon commented 3 years ago

> I'm facing the same problem. Have you found a replacement VAD method?

I use webrtcvad to preprocess the files before I start training.
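
A minimal sketch of what such offline preprocessing might look like, not necessarily the script used here. It assumes 16-bit mono PCM at a sample rate webrtcvad accepts (8, 16, 32 or 48 kHz) and frames of 10, 20 or 30 ms. The helper names `split_frames`, `keep_speech` and `webrtc_is_speech` are hypothetical; only `webrtcvad.Vad(mode)` and `Vad.is_speech(frame, sample_rate)` come from the library.

```python
def split_frames(pcm, sample_rate, frame_ms=30):
    """Split 16-bit mono PCM bytes into fixed-length frames, dropping a partial tail."""
    frame_bytes = int(sample_rate * frame_ms / 1000) * 2  # 2 bytes per sample
    return [
        pcm[i:i + frame_bytes]
        for i in range(0, len(pcm) - frame_bytes + 1, frame_bytes)
    ]


def keep_speech(pcm, sample_rate, is_speech, frame_ms=30):
    """Concatenate the frames for which is_speech(frame, sample_rate) is true."""
    return b"".join(
        frame
        for frame in split_frames(pcm, sample_rate, frame_ms)
        if is_speech(frame, sample_rate)
    )


def webrtc_is_speech(aggressiveness=2):
    """Return webrtcvad's frame classifier (requires `pip install webrtcvad`)."""
    import webrtcvad  # imported lazily so the helpers above work without it

    # aggressiveness ranges from 0 (least) to 3 (most aggressive filtering)
    return webrtcvad.Vad(aggressiveness).is_speech


# Usage, assuming `pcm` holds 16-bit mono PCM bytes sampled at 16 kHz:
# speech_only = keep_speech(pcm, 16000, webrtc_is_speech(2))
```

Running this once over the dataset and saving the trimmed files keeps the VAD cost out of the training loop, which is the point of preprocessing before training.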

JJun-Guo commented 1 year ago

> I found no difference in the training results. However, the VAD implemented in torchaudio is rather slow: each wav file takes about 500 to 800 ms to process. Has anyone used another VAD implementation?

Hi, how did the model perform after you applied VAD?