Jozdien closed this issue 3 years ago
Hi there,
The normalization stats are just the mean and standard deviation of the spectrogram values over all samples in the dataset. You can check this issue. Correct normalization is crucial. If your task is AudioSet, you can just use our norm stats (mean -4.25, std 4.57).
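As a rough sketch of what "mean / std over all samples" can look like in practice, the snippet below accumulates running sums over an iterable of 2-D spectrograms, so clips of different lengths are fine. The function name `dataset_norm_stats` and the toy data are assumptions for illustration only; in a real setup the arrays would come from the same feature extraction that the training dataloader uses.

```python
import numpy as np


def dataset_norm_stats(spectrograms):
    """Return (mean, std) over every time-frequency bin of every spectrogram.

    `spectrograms` is any iterable of 2-D arrays (frames x frequency bins).
    Only running sums are kept, so clips may differ in length and the whole
    dataset never has to sit in memory at once.
    """
    count = 0
    total = 0.0
    total_sq = 0.0
    for spec in spectrograms:
        spec = np.asarray(spec, dtype=np.float64)
        count += spec.size
        total += spec.sum()
        total_sq += np.square(spec).sum()
    if count == 0:
        raise ValueError("no spectrograms given")
    mean = total / count
    # Guard against tiny negative values caused by floating-point rounding.
    var = max(total_sq / count - mean ** 2, 0.0)
    return float(mean), float(var ** 0.5)


# Toy data standing in for real log-mel features (hypothetical shapes/values).
rng = np.random.default_rng(0)
toy_specs = [rng.normal(-4.0, 4.5, size=(n, 128)) for n in (100, 250, 80)]
mean, std = dataset_norm_stats(toy_specs)
print(f"dataset mean = {mean:.4f}, dataset std = {std:.4f}")
```

The two numbers printed at the end are what you would plug in as the dataset mean and std for your own data.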
The SpecAug and mixup rates are hyperparameters that you need to tune for your task. You can check Sections IV.B and IV.C of our PSLA paper for details. You can also set both to 0 for your first model; they won't dramatically change the model performance.
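To make the two knobs concrete, here is a minimal NumPy sketch, not the repository's actual implementation. It assumes SpecAug is controlled by a maximum frequency-mask width (`freqm`) and a maximum time-mask width (`timem`), and that the mixup rate is the probability that a sample gets blended with another one. All function names and the `alpha` value of the Beta distribution are hypothetical. Setting everything to 0 leaves the data untouched, which matches the "first model" suggestion.

```python
import numpy as np


def spec_augment(spec, freqm, timem, rng):
    """Zero out one random frequency band and one random time band.

    `freqm` / `timem` are the maximum band widths; 0 disables that mask.
    """
    spec = np.array(spec, dtype=np.float64, copy=True)
    n_frames, n_bins = spec.shape
    if freqm > 0:
        width = int(rng.integers(0, min(freqm, n_bins) + 1))
        start = int(rng.integers(0, n_bins - width + 1))
        spec[:, start:start + width] = 0.0
    if timem > 0:
        width = int(rng.integers(0, min(timem, n_frames) + 1))
        start = int(rng.integers(0, n_frames - width + 1))
        spec[start:start + width, :] = 0.0
    return spec


def maybe_mixup(x1, y1, x2, y2, mixup_rate, rng, alpha=10.0):
    """With probability `mixup_rate`, blend two equally shaped samples.

    Inputs and label vectors are mixed with the same weight `lam`, drawn
    from Beta(alpha, alpha); `alpha` is an assumed value for illustration.
    """
    if rng.random() < mixup_rate:
        lam = rng.beta(alpha, alpha)
        return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2
    return x1, y1


rng = np.random.default_rng(0)
spec_a = rng.normal(size=(100, 128))
spec_b = rng.normal(size=(100, 128))
label_a = np.array([1.0, 0.0])
label_b = np.array([0.0, 1.0])

mixed, mixed_label = maybe_mixup(spec_a, label_a, spec_b, label_b, 0.5, rng)
augmented = spec_augment(mixed, freqm=24, timem=20, rng=rng)
```

Larger mask widths and a higher mixup rate mean stronger regularization, which is why the right values depend on how much data the task has.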
-Yuan
I see, thank you for the explanation!
Hey,
I'm pretty new to working with audio data in classification, so could you give some insight into the parameters / stats mentioned in steps 2-4 of the "Use Pretrained Model For Downstream Tasks" section? Specifically, I'd like a bit more clarification on how to get the normalization stats, and on how the parameters in step 2 (SpecAug and mixup rate) and step 4 need to be changed for different kinds of input, or how they affect the model.