fschmid56 / EfficientAT

This repository aims at providing efficient CNNs for Audio Tagging. We provide AudioSet pre-trained models ready for downstream training and extraction of audio embeddings.
MIT License

Hack to get 1 dimensional output #12

Status: Closed (turian closed this issue 10 months ago)

turian commented 1 year ago

I'm not sure why this was needed, but I had to add this hack to get num_classes=1 to work:

<             #num_classes = state_dict['classifier.1.bias'].size(0)
<             num_classes = state_dict['classifier.2.bias'].size(0)
---
>             num_classes = state_dict['classifier.1.bias'].size(0)
313,315d299
<             if "classifier.2.weight" in state_dict:
<                 del state_dict['classifier.2.weight']
<                 del state_dict['classifier.2.bias']

I won't push a fix because I don't understand the impact of this on other users. Perhaps it should only be used when num_classes is 1?
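One way to make the hack conditional, as suggested above, is to drop the final classifier tensors only when their output size differs from the requested num_classes. The following is a minimal sketch, not the repository's code: the helper name drop_mismatched_head is hypothetical, and it assumes the head parameters are stored under keys of the form classifier.<index>.weight / classifier.<index>.bias, as in the diff. It uses len() on the bias, which returns the first dimension for a PyTorch tensor and also works on plain lists.

```python
def drop_mismatched_head(state_dict, num_classes, prefix="classifier."):
    """Return a copy of state_dict without the final head tensors
    when their output size differs from num_classes.

    Hypothetical helper for illustration; not part of EfficientAT.
    """
    # Collect the numeric indices of all "classifier.<idx>.bias" entries,
    # so the last layer is found instead of hard-coding index 1 or 2.
    indices = []
    for key in state_dict:
        if key.startswith(prefix) and key.endswith(".bias"):
            idx = key[len(prefix):-len(".bias")]
            if idx.isdigit():
                indices.append(int(idx))

    if not indices:
        # No indexed classifier layers found: leave everything untouched.
        return dict(state_dict)

    stem = f"{prefix}{max(indices)}."
    pretrained_classes = len(state_dict[stem + "bias"])

    if pretrained_classes == num_classes:
        # Head already matches: keep the pre-trained classifier weights.
        return dict(state_dict)

    # Head size differs: drop only the last layer's weight and bias.
    return {k: v for k, v in state_dict.items() if not k.startswith(stem)}


# Toy example with lists standing in for tensors (527 = AudioSet classes).
toy = {
    "features.0.weight": [[1.0]],
    "classifier.2.weight": [[0.0]] * 527,
    "classifier.2.bias": [0.0] * 527,
}
filtered = drop_mismatched_head(toy, num_classes=1)
```

With a real checkpoint, the filtered dictionary would then be passed to PyTorch's model.load_state_dict(filtered, strict=False), so that the missing head keys are left at their fresh initialization. Users who keep the pre-trained number of classes would be unaffected, since nothing is deleted in that case.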

fschmid56 commented 1 year ago

For which combination of parameters passed to the get_model function does this happen?

Currently, if a pre-trained model is loaded via its name, e.g. pretrained_name="mn10_as_fc", then the correct parameters have to be specified manually (e.g. head_type="fully_convolutional"). I will attach the correct config to the model names soon (after an upcoming deadline). This could be related to your problem.
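Attaching a config to each model name could look roughly like the sketch below. This is an assumption about one possible design, not the maintainer's implementation: the PRETRAINED_CONFIGS table and the resolve_config helper are hypothetical, and the entries shown are illustrative rather than a verified list of the repository's checkpoints.

```python
# Hypothetical registry: pre-trained name -> keyword arguments it requires.
# Entries are illustrative assumptions, not a verified list.
PRETRAINED_CONFIGS = {
    "mn10_as": {"head_type": "mlp"},
    "mn10_as_fc": {"head_type": "fully_convolutional"},
}


def resolve_config(pretrained_name, **overrides):
    """Merge the stored config for a model name with explicit overrides."""
    if pretrained_name not in PRETRAINED_CONFIGS:
        raise ValueError(f"No config registered for {pretrained_name!r}")
    config = dict(PRETRAINED_CONFIGS[pretrained_name])
    # Explicit arguments win, so callers can still experiment deliberately.
    config.update(overrides)
    return config


kwargs = resolve_config("mn10_as_fc", num_classes=1)
```

The resulting kwargs could then be forwarded to the model constructor together with the name, so that a mismatch between checkpoint and head type no longer depends on the caller remembering the right parameters.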