YuanGongND / ast

Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
BSD 3-Clause "New" or "Revised" License

What is the objective when pretraining? #107

Open Young973 opened 1 year ago

Young973 commented 1 year ago

TBH, I'm a little confused about what the objective is when pretraining AST. It doesn't seem to be stated in the paper. BTW, when pretraining SSAST, the discriminative objective is classification with InfoNCE and the generative objective is reconstruction. But what is it for AST?

YuanGongND commented 1 year ago

hi there,

It is just ImageNet pretraining.

I.e., using an ImageNet-pretrained DeiT as the initial weights for AST.

https://github.com/YuanGongND/ast/blob/31088be8a3f6ef96416145c4b8d43c81f99eba7a/src/models/ast_models.py#L60-L68

-Yuan

YuanGongND commented 1 year ago

Some modification is needed. See https://github.com/YuanGongND/ast/blob/master/src/models/ast_models.py.

YuanGongND commented 1 year ago

If you mean audio-domain pretraining, that is just training AST on AudioSet (starting from the ImageNet initialization) with a BCE loss for the classification task. You can then use that model for other audio tasks (e.g., ESC-50).
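
For readers unfamiliar with the BCE objective mentioned above, here is a minimal sketch of what it computes for multi-label audio tagging. It is an illustration only, not code from the repository: the function name `bce_with_logits` and the toy numbers are made up for this example, and it is written in plain Python rather than PyTorch so it runs without any dependencies. Each class gets an independent sigmoid, so a clip can carry several labels at once.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over all (clip, class) pairs.

    logits:  list of per-clip lists of raw model scores, one per class
    targets: same shape, 1.0 where the class is present, else 0.0
    (Hypothetical helper for illustration; not part of the AST code base.)
    """
    total, count = 0.0, 0
    for clip_logits, clip_targets in zip(logits, targets):
        for x, y in zip(clip_logits, clip_targets):
            # Numerically stable form of
            # -[y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))]
            total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
            count += 1
    return total / count


# Toy batch: 2 clips, 3 classes; the first clip has two active labels.
logits = [[2.5, -1.0, 0.3], [-0.7, 1.8, -2.2]]
targets = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
print(f"BCE loss: {bce_with_logits(logits, targets):.4f}")
```

In a PyTorch training loop the same quantity would typically come from `torch.nn.BCEWithLogitsLoss` applied to the model's output logits and a multi-hot label tensor; whether the repository uses exactly that class should be checked against its training script.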