Closed skirdey closed 2 years ago
The model is pretrained on 16 kHz data only (both AudioSet and LibriSpeech, which we use to train the model, are resampled to 16 kHz), so my guess is that the fine-tuning data should use the same sampling rate. Otherwise, you can pretrain the model at a different sampling rate; pretraining is not that expensive (a few days on 4X1080 GPUs).
-Yuan
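
For anyone whose fine-tuning audio is not already at 16 kHz, a minimal sketch of resampling it first might look like the following. This is not taken from the repository: the helper name `to_target_rate`, the use of SciPy's `resample_poly`, and the 44.1 kHz test tone are all assumptions made for illustration.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

TARGET_SR = 16_000  # rate the pretraining data was resampled to, per the reply above


def to_target_rate(waveform: np.ndarray, orig_sr: int, target_sr: int = TARGET_SR) -> np.ndarray:
    """Resample a 1-D waveform from orig_sr to target_sr (hypothetical helper)."""
    if orig_sr == target_sr:
        # Already at the target rate; return the input unchanged.
        return waveform
    # Reduce the rate ratio to the smallest integer up/down factors.
    g = gcd(orig_sr, target_sr)
    up, down = target_sr // g, orig_sr // g
    # Polyphase resampling applies a low-pass (anti-aliasing) filter internally.
    return resample_poly(waveform, up, down)


# Example (assumed data): one second of a 440 Hz tone sampled at 44.1 kHz.
orig_sr = 44_100
t = np.arange(orig_sr) / orig_sr
clip = np.sin(2 * np.pi * 440.0 * t)

clip_16k = to_target_rate(clip, orig_sr)
print(clip_16k.shape)  # (16000,)
```

Running this once over the whole fine-tuning set and saving the result avoids resampling on every epoch; any other resampler with anti-aliasing should work equally well.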
When using the pre-trained models for fine-tuning, should the fine-tuning training set have a specific sample rate, such as 16 kHz?