YuanGongND / ast

Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
BSD 3-Clause "New" or "Revised" License

computing the normalization stats #35

LqNoob closed this issue 2 years ago

LqNoob commented 2 years ago

Hi, thank you for your great work!

I have a question about the value of the `freqm` parameter. When computing the normalization stats (mean and std), `freqm` is set to 24, but during model training it is set to 48. Why do the values differ between these two steps?

YuanGongND commented 2 years ago

Hi there,

We do not need very accurate normalization stats. In the inference stage, we don't use any SpecAug (i.e., freqm=timem=0) but still reuse the same norm stats. You can certainly try computing the stats with freqm set to 48 or 0 and see which performs better; I assume the results will be similar.

-Yuan
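
To try the comparison suggested above, the following is a minimal sketch that computes the mean and std of a set of spectrograms under different `freqm` values. It is not the repository's stats script. The helper names `freq_mask` and `norm_stats`, the zero-fill masking, and the synthetic Gaussian data standing in for real filterbank features are all assumptions made for illustration.

```python
import numpy as np

def freq_mask(spec, freqm, rng):
    """Zero out one random band of at most `freqm` frequency bins (SpecAugment-style).

    Hypothetical helper: the fill value (0) and the width sampling are assumptions.
    """
    if freqm <= 0:
        return spec
    n_bins = spec.shape[1]
    width = int(rng.integers(0, freqm + 1))           # mask width in [0, freqm]
    start = int(rng.integers(0, n_bins - width + 1))  # band must fit inside the bins
    out = spec.copy()
    out[:, start:start + width] = 0.0
    return out

def norm_stats(specs, freqm, seed=0):
    """Return (mean, std) over all clips after applying frequency masking."""
    rng = np.random.default_rng(seed)
    masked = np.stack([freq_mask(s, freqm, rng) for s in specs])
    return float(masked.mean()), float(masked.std())

# Synthetic stand-in for log-mel features, shaped (clips, frames, mel bins).
# Replace with real features to get meaningful numbers.
data_rng = np.random.default_rng(42)
specs = data_rng.normal(loc=-4.0, scale=4.5, size=(64, 100, 128))

stats = {freqm: norm_stats(specs, freqm) for freqm in (0, 24, 48)}
for freqm, (mean, std) in stats.items():
    print(f"freqm={freqm:2d}  mean={mean:.3f}  std={std:.3f}")
```

Because the masked bins are filled with zeros, the stats shift somewhat as `freqm` grows when the data mean is far from zero. Whether that shift affects accuracy is the empirical question raised in the reply, so the stats from each setting would need to be plugged into training and evaluation to find out.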