clovaai / voxceleb_trainer

In defence of metric learning for speaker recognition
MIT License

About changing the training data sample rate #97

Closed: whitegon closed this issue 3 years ago

whitegon commented 3 years ago

If I want to use 8 kHz NIST SRE data to train my system, do I need to change the n_fft, win_length, and hop_length settings in torchaudio.transforms.MelSpectrogram(sample_rate=8000, n_fft=512, win_length=400, hop_length=160, window_fn=torch.hamming_window, n_mels=n_mels)?
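For reference, win_length and hop_length are counted in samples, so the time span they cover depends on the sample rate. The sketch below shows only that arithmetic. It assumes the quoted values were chosen for 16 kHz audio, where 400 and 160 samples correspond to a 25 ms window and a 10 ms hop. The helper name frame_params and the choice of the next power of two for n_fft are illustrative assumptions, not something taken from the repository.

```python
def frame_params(sample_rate, win_ms=25.0, hop_ms=10.0):
    """Convert window and hop durations (ms) into sample counts.

    Hypothetical helper: the 25 ms / 10 ms defaults are inferred from
    win_length=400 and hop_length=160 at an assumed 16 kHz sample rate.
    """
    win_length = int(round(sample_rate * win_ms / 1000))
    hop_length = int(round(sample_rate * hop_ms / 1000))
    # Assumption: use the smallest power of two >= win_length as the FFT size.
    n_fft = 1 << (win_length - 1).bit_length()
    return {"n_fft": n_fft, "win_length": win_length, "hop_length": hop_length}


params_16k = frame_params(16000)  # n_fft=512, win_length=400, hop_length=160
params_8k = frame_params(8000)    # n_fft=256, win_length=200, hop_length=80
```

Read this way, passing sample_rate=8000 with win_length=400 and hop_length=160 would give a 50 ms window and a 20 ms hop. Whether the rest of the training setup works well with such features is a separate question that this sketch does not address.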

joonson commented 3 years ago

It would be best to up-sample the input to 16kHz using ffmpeg prior to training.
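A minimal sketch of that up-sampling step, assuming ffmpeg is installed and on the PATH. It relies on ffmpeg's -ar option to set the output sample rate. The function names and file paths are hypothetical, and a real NIST SRE corpus may first need decoding from its own file format, which is not covered here.

```python
import subprocess


def build_upsample_cmd(src, dst, target_rate=16000):
    """Build an ffmpeg command that resamples src to target_rate Hz.

    Hypothetical helper; -y overwrites dst if it already exists.
    """
    return ["ffmpeg", "-y", "-i", src, "-ar", str(target_rate), dst]


def upsample(src, dst, target_rate=16000):
    """Run ffmpeg and raise CalledProcessError if the conversion fails."""
    subprocess.run(build_upsample_cmd(src, dst, target_rate), check=True)


# Example with placeholder paths (not executed here):
# upsample("sre_utt_8k.wav", "sre_utt_16k.wav")
```

Up-sampling does not restore any content above 4 kHz. It only makes the files match the sample rate that the existing feature extraction settings expect.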

ukemamaster commented 3 years ago

@joonson Why is it recommended to up-sample to 16kHz? Why can't we down-sample to 8kHz? Will it degrade the performance?