fschmid56 / EfficientAT

This repository aims at providing efficient CNNs for Audio Tagging. We provide AudioSet pre-trained models ready for downstream training and extraction of audio embeddings.
MIT License

Config detail request #17

Closed by SteveTanggithub 1 year ago

SteveTanggithub commented 1 year ago

Could you provide all the config details for reproducing the paper results, for example for the models with and without pre-training? By the way, must I use audio resampled to 32 kHz?

fschmid56 commented 1 year ago

The paper config is indeed the default config used in ex_audioset.py.

Train an ImageNet pre-trained model on AudioSet with the paper config:

python ex_audioset.py --cuda --train --pretrained_name=mn10_im_pytorch

Train a model on AudioSet with the paper config from scratch:

python ex_audioset.py --cuda --train 

Technically, it is possible to use audio with a different sampling rate. The code in this repo also supports 16 kHz and 8 kHz. However, I can't make any statements about the expected performance, as I have only trained models at a 32 kHz sampling rate.
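Since the pre-trained models were trained on 32 kHz audio, one option is to resample your audio to 32 kHz before feeding it to a model. Below is a minimal, dependency-free sketch of what that conversion does, using plain linear interpolation. The function name `resample_linear` and the constant `TARGET_RATE` are hypothetical and not part of this repo. Linear interpolation applies no anti-aliasing filter, so in practice a proper resampler such as `librosa.resample` or `torchaudio.transforms.Resample` is likely the better choice.

```python
TARGET_RATE = 32000  # sampling rate the models in this thread were trained with


def resample_linear(samples, src_rate, dst_rate=TARGET_RATE):
    """Resample a mono signal (list of floats) by linear interpolation.

    Illustrative only: no low-pass filtering is applied, so downsampling
    with this function can introduce aliasing.
    """
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sampling rates must be positive")
    if not samples:
        return []

    # Number of output samples that covers the same duration.
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # input samples advanced per output sample
    last = len(samples) - 1

    out = []
    for i in range(n_out):
        pos = i * step                 # fractional position in the input
        lo = min(int(pos), last)       # left neighbour
        hi = min(lo + 1, last)         # right neighbour (clamped at the end)
        frac = min(pos - lo, 1.0)      # interpolation weight
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


if __name__ == "__main__":
    # Upsample a tiny 16 kHz ramp to 32 kHz.
    print(resample_linear([0.0, 1.0, 2.0, 3.0], 16000))
```

Going from 16 kHz to 32 kHz doubles the number of samples, and each new sample lies halfway between its two neighbours. Upsampling this way does not restore frequency content above the original Nyquist limit, so performance may still differ from that on native 32 kHz recordings.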