YuanGongND / psla

Code for the TASLP paper "PSLA: Improving Audio Tagging With Pretraining, Sampling, Labeling, and Aggregation".
BSD 3-Clause "New" or "Revised" License

Pretrained models #7

Closed by abaronetto 2 years ago

abaronetto commented 2 years ago

Hello Yuan, great work, and thank you for making it available to other researchers. I am currently testing deep learning models on my audio dataset to see which one performs best. I saw that you released the pretrained EfficientNet B2 models with 4-headed attention. I was wondering if it would be possible to download other pretrained models too, e.g. EfficientNet B2 with mean pooling. Thank you in advance,

Annalisa

YuanGongND commented 2 years ago

Hi Annalisa,

Thanks for your interest.

This is the one (without weight averaging) I have on my server. Note that it was created by my experiment code, and I cleaned up the code before releasing it, so the checkpoint might not fit the released code. You could give it a try.
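If the checkpoint does not load cleanly, one way to see where it differs from the released code is to compare parameter names. The sketch below is only an illustration and not part of the PSLA repository: `strip_prefix` and `compare_keys` are hypothetical helpers, the key names in the demo are made up, and the `module.` prefix (which `torch.nn.DataParallel` adds to parameter names) is an assumption about how the checkpoint might have been saved.

```python
from typing import Dict, Iterable, List, Mapping, Tuple


def strip_prefix(state_dict: Mapping[str, object], prefix: str = "module.") -> Dict[str, object]:
    """Return a copy of state_dict with `prefix` removed from keys that start with it."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


def compare_keys(
    checkpoint_keys: Iterable[str], model_keys: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) parameter names.

    missing:    names the model expects but the checkpoint lacks
    unexpected: names the checkpoint has but the model does not use
    """
    ckpt, model = set(checkpoint_keys), set(model_keys)
    return sorted(model - ckpt), sorted(ckpt - model)


if __name__ == "__main__":
    # Toy stand-ins with made-up names; real values would be tensors.
    checkpoint = {"module.effnet.conv.weight": 0, "module.old_head.weight": 1}
    model_keys = ["effnet.conv.weight", "new_head.weight"]

    missing, unexpected = compare_keys(strip_prefix(checkpoint), model_keys)
    print("missing from checkpoint:", missing)
    print("unexpected in checkpoint:", unexpected)
```

With PyTorch, the checkpoint dictionary would typically come from `torch.load(path, map_location="cpu")` and the model keys from `model.state_dict().keys()`. Calling `model.load_state_dict(state_dict, strict=False)` then loads the matching parameters and reports the missing and unexpected keys, although a shape mismatch on a shared key still raises an error.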

In one of my new projects, I have a new implementation of the mean pooling model based on the torchvision implementation. You can play with it at https://colab.research.google.com/github/YuanGongND/vocalsound/blob/main/colab/VocalSound.ipynb. Note, however, that this model is not pretrained on AudioSet.

-Yuan