YuanGongND / ast

Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
BSD 3-Clause "New" or "Revised" License

Question normalize operation #55

Closed: liuyoude closed this issue 2 years ago

liuyoude commented 2 years ago

Shouldn't the normalization at https://github.com/YuanGongND/ast/blob/102f0477099f83e04f6f2b30a498464b78bbaf46/src/dataloader.py#L191 divide by the standard deviation rather than the variance?

YuanGongND commented 2 years ago

Hi there,

It is *2 (multiply by 2), not **2 (square), so the divisor is twice the std, not the variance. We just normalized the input to a smaller std.

If you don't use our AudioSet pretrained model, it is fine to normalize to 0 mean and 1 std (i.e., fbank = (fbank - self.norm_mean) / (self.norm_std)). Otherwise, please keep the normalization consistent with ours.

-Yuan
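
To illustrate the difference described above, here is a minimal sketch comparing division by 2 * std with division by std. It is not the repository's dataloader code: the `normalize` helper, its `scale` parameter, and the synthetic input values are assumptions made for illustration, and plain Python lists stand in for the filterbank tensor.

```python
import random
import statistics


def normalize(values, mean, std, scale=2.0):
    """Shift by `mean` and divide by `scale * std`.

    scale=2.0 mirrors the `* 2` discussed above (hypothetical helper);
    scale=1.0 gives the conventional zero-mean, unit-std normalization.
    """
    return [(v - mean) / (scale * std) for v in values]


# Synthetic stand-in for filterbank values (not real AudioSet statistics).
rng = random.Random(0)
fbank = [rng.gauss(-4.0, 4.5) for _ in range(10_000)]

norm_mean = statistics.fmean(fbank)
norm_std = statistics.pstdev(fbank)

half_std = normalize(fbank, norm_mean, norm_std)             # divide by 2 * std
unit_std = normalize(fbank, norm_mean, norm_std, scale=1.0)  # divide by std

print(round(statistics.pstdev(half_std), 3))  # 0.5
print(round(statistics.pstdev(unit_std), 3))  # 1.0
```

Dividing by 2 * std leaves the data with zero mean and a standard deviation of 0.5, whereas dividing by the variance (std ** 2) would give a standard deviation of 1 / std, which depends on the scale of the data.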

liuyoude commented 2 years ago

Thank you!