YuanGongND / ast

Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
BSD 3-Clause "New" or "Revised" License

AST tiny and small pretrained models #74

Open abaronetto opened 1 year ago

abaronetto commented 1 year ago

Hi Yuan, I have tested your AST model pretrained on AudioSet on my own dataset and noticed that it achieves performance similar to EfficientNet pretrained on AudioSet with the PSLA pipeline. I was wondering whether a smaller AST model could achieve the same performance with fewer parameters (my dataset is not as large as AudioSet). Did you also train tiny and small AST models? Would it be possible to have AST tiny and small models pretrained on AudioSet? I would like to compare them to the other models I tested (AST base, EfficientNet) and analyse the benefit of AudioSet pretraining on other audio classification tasks. I will of course cite your work in future publications. Thank you in advance.

Annalisa

YuanGongND commented 1 year ago

Hi Annalisa,

Unfortunately, I don't have tiny & small ImageNet+AudioSet pretrained models. The SSAST repo has in-domain pretrained models of all sizes, but it is based on a different pretraining scheme. Without AudioSet pretraining, you are free to use the tiny & small models with ImageNet pretraining only, as sketched below.
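
A minimal sketch of what that looks like, assuming the `ASTModel` class from this repo (the constructor arguments follow the repo README; the `label_dim` value is an illustrative placeholder for your own dataset):

```python
import torch
from src.models.ast_models import ASTModel

# Tiny AST with ImageNet-only initialization. No AudioSet checkpoint
# is released for this size, so audioset_pretrain must stay False.
model = ASTModel(
    label_dim=10,            # number of classes in your dataset (placeholder)
    input_fdim=128,          # number of mel-frequency bins
    input_tdim=1024,         # number of time frames
    imagenet_pretrain=True,  # initialize from ImageNet (DeiT) weights
    audioset_pretrain=False, # tiny/small AudioSet weights are not available
    model_size='tiny224',    # 'small224' is the other small option
)

# The model expects a batch of spectrograms of shape (batch, time, freq).
dummy = torch.zeros(2, 1024, 128)
logits = model(dummy)
print(logits.shape)  # torch.Size([2, 10])
```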

I have tested your AST model pretrained on AudioSet on my own dataset and noticed that it achieves performance similar to EfficientNet pretrained on AudioSet with the PSLA pipeline.

They can be similar, but I usually see a small improvement from AST in my experiments. Note that the AST model needs a roughly 10X smaller learning rate than the EfficientNet model; see the sketch below.
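
A minimal sketch of that learning-rate gap in PyTorch. The 10X ratio follows the comment above; the absolute values and the `Adam` choice are illustrative assumptions, not tuned numbers from this thread:

```python
import torch
import torch.nn as nn

# Stand-ins so the snippet runs on its own; replace with an ASTModel
# instance and an EfficientNet from the PSLA pipeline in practice.
ast_model = nn.Linear(128, 10)
efficientnet_model = nn.Linear(128, 10)

# AST: transformer fine-tuning typically needs a small learning rate.
ast_optimizer = torch.optim.Adam(ast_model.parameters(), lr=1e-5)

# EfficientNet: roughly 10X larger, per the comment above.
eff_optimizer = torch.optim.Adam(efficientnet_model.parameters(), lr=1e-4)
```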

-Yuan