YuanGongND / ast

Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
BSD 3-Clause "New" or "Revised" License

Use different sample rate #24

Closed: hbellafkir closed this issue 2 years ago

hbellafkir commented 2 years ago

Can I use a different sampling rate, such as 22 kHz, for fine-tuning?

YuanGongND commented 2 years ago

Yes - I think you don't need to change anything in the code for a new sampling rate as long as you don't use our AudioSet pretrained model (i.e., don't set audioset_pretrain=True). You can (and should) still use ImageNet pretraining (i.e., set imagenet_pretrain=True). Please let me know if you get an error with a new sampling rate.

But our AudioSet pretrained model was trained on 16 kHz audio, so if you want to use it, it would be better to resample your audio to 16 kHz. The improvement from AudioSet pretraining is clear for audio event classification tasks.
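For readers who do want to resample to 16 kHz, here is a minimal sketch of one way to do it. It is not taken from the AST repo: the helper name `resample_to_16k` is hypothetical, the choice of SciPy's polyphase resampler is an assumption, and a synthetic tone stands in for real audio.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly


def resample_to_16k(waveform: np.ndarray, orig_sr: int, target_sr: int = 16000) -> np.ndarray:
    """Resample a 1-D waveform to target_sr (hypothetical helper, not part of AST)."""
    if orig_sr == target_sr:
        return waveform
    g = gcd(orig_sr, target_sr)
    # resample_poly upsamples by `up`, applies a low-pass (anti-aliasing) filter,
    # then downsamples by `down`, so the rate changes by the factor up / down.
    return resample_poly(waveform, target_sr // g, orig_sr // g)


# One second of a 440 Hz tone at 22.05 kHz, used here as stand-in input.
orig_sr = 22050
t = np.arange(orig_sr) / orig_sr
tone = np.sin(2 * np.pi * 440.0 * t)

resampled = resample_to_16k(tone, orig_sr)
print(resampled.shape)
```

Note that downsampling removes everything above the new Nyquist frequency (8 kHz for a 16 kHz target), so this only makes sense when the content of interest lies below that.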

-Yuan

hbellafkir commented 2 years ago

The problem is that 16 kHz audio can only represent frequencies up to 8 kHz, whereas my data contains higher frequencies (>8 kHz).
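The 8 kHz figure is the Nyquist frequency, i.e. half the sample rate. A small sketch of the arithmetic, with hypothetical helper names and an assumed 10 kHz upper bound on the signal content chosen purely for illustration:

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency representable at a given sample rate (half the rate)."""
    return sample_rate_hz / 2.0


def can_represent(sample_rate_hz: float, max_signal_hz: float) -> bool:
    """True if content up to max_signal_hz lies strictly below the Nyquist frequency."""
    return max_signal_hz < nyquist_hz(sample_rate_hz)


# Assumed example: the data has content up to 10 kHz (illustrative value only).
max_signal_hz = 10_000.0

for sr in (16_000, 22_050):
    print(sr, nyquist_hz(sr), can_represent(sr, max_signal_hz))
```

Under that assumption, 16 kHz (Nyquist 8 kHz) would lose the upper band, while 22.05 kHz (Nyquist 11.025 kHz) would keep it.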

YuanGongND commented 2 years ago

In that case, you can just use your sample rate and set audioset_pretrain=False when you initialize the AST model.

-Yuan
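The advice in this thread can be summarized as a small decision rule. The sketch below is only an illustration: the helper `ast_pretrain_flags` is hypothetical, and only the two flag names (`imagenet_pretrain`, `audioset_pretrain`) and the 16 kHz figure come from the discussion above.

```python
AUDIOSET_SAMPLE_RATE_HZ = 16_000  # rate of the AudioSet pretrained model, per the thread


def ast_pretrain_flags(sample_rate_hz: int) -> dict:
    """Pick pretraining flags for a given input sample rate (hypothetical helper)."""
    return {
        # ImageNet pretraining is recommended regardless of the sample rate.
        "imagenet_pretrain": True,
        # The AudioSet checkpoint is only a good match for 16 kHz input.
        "audioset_pretrain": sample_rate_hz == AUDIOSET_SAMPLE_RATE_HZ,
    }


flags_22k = ast_pretrain_flags(22_050)
flags_16k = ast_pretrain_flags(16_000)
print(flags_22k)
print(flags_16k)

# The flags would then be passed when initializing the AST model, for example:
# model = ASTModel(**flags_22k, ...)  # other constructor arguments omitted
```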