Closed joewale closed 3 years ago
Hi there,
To use the ImageNet pretrained model for audio/speech tasks, you just need to set imagenet_pretrain=True when you initialize the AST model. The timm package will automatically download it for you, and my code will adapt it to the audio/speech task (see Section 2.2 of the paper). You don't need to use the URL explicitly.
If you simply want to know the URL of the ImageNet pretrained model, for the base DeiT model we used in the paper, the URL is https://dl.fbaipublicfiles.com/deit/deit_base_distilled_patch16_384-d0272ac0.pth .
-Yuan
Btw, all AudioSet pretrained models are (ImageNet + AudioSet) pretrained models.
Hi, YuanGongND, thanks for your quick reply. Because my machine has no network access, I want to download the ImageNet pretrained model for the base384 model size.
I see. You can download the model using the link I provided and put it at $TORCH_HOME/hub/checkpoints/deit_base_distilled_patch16_384-d0272ac0.pth. Then, when you set imagenet_pretrain=True while initializing the AST model, the timm package should skip the download and load the model from the local file.
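For anyone else on an offline machine, here is a minimal sketch of the placement step described above. It assumes the timm version in use loads weights through torch.hub's URL cache, as the reply describes, and that the cached filename is the basename of the URL. The helper names `hub_checkpoint_path` and `install_checkpoint` are hypothetical and not part of the AST repo or timm.

```python
import os
import shutil
from pathlib import Path
from urllib.parse import urlparse

# URL of the base DeiT checkpoint given earlier in this thread.
DEIT_URL = "https://dl.fbaipublicfiles.com/deit/deit_base_distilled_patch16_384-d0272ac0.pth"


def hub_checkpoint_path(url: str) -> Path:
    """Return the path where torch.hub is expected to cache the file for `url`.

    Assumption: the cache root is $TORCH_HOME, falling back to
    $XDG_CACHE_HOME/torch and then ~/.cache/torch.
    """
    default_home = os.path.join(os.getenv("XDG_CACHE_HOME", "~/.cache"), "torch")
    torch_home = Path(os.path.expanduser(os.getenv("TORCH_HOME", default_home)))
    # The cached filename is taken from the last component of the URL path.
    filename = os.path.basename(urlparse(url).path)
    return torch_home / "hub" / "checkpoints" / filename


def install_checkpoint(downloaded_file: str, url: str = DEIT_URL) -> Path:
    """Copy a manually downloaded checkpoint into the hub cache (hypothetical helper)."""
    target = hub_checkpoint_path(url)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(downloaded_file, target)
    return target
```

Calling `install_checkpoint("deit_base_distilled_patch16_384-d0272ac0.pth")` on a file copied over from a networked machine should leave it where the loader looks first. Printing `hub_checkpoint_path(DEIT_URL)` is a quick way to check which directory your environment resolves to.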
Got it, thanks! I ran the code with my dataset, and the log when loading the pretrained model is as follows. Is it right?
It is correct, and you are using the AudioSet pretrained model (which is actually the AudioSet + ImageNet pretrained model). I recommend using this model for all tasks except when your dataset is AudioSet itself.
The reason you see 'AST Model Summary' printed twice is that internally the code first initializes a model without any pretraining and then loads the AudioSet pretrained model. So this is the expected behavior.
Got it, thanks a lot. Is there any code or a demo for testing a single audio file with the trained model?
There's no such demo yet, but I will add one when I have some time.
OK, that's great! I will give it a try. I'm looking forward to your demo. Thanks a lot.
Hi, YuanGongND, can you share the ImageNet pretrained model URL?