saghiralfasly closed this 3 years ago
Hi there,
You are correct on this point, and we use the same solution for our later models. You can safely eliminate anything in self.v.head (but not self.mlp_head).
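One way to act on this is to drop the ImageNet head entries from the checkpoint before loading it. The sketch below is only an illustration, not the authors' code: it takes the key names from the checkpoint printout in this thread, uses a plain dictionary with shape tuples as a stand-in for the real state dict, and the helper name drop_imagenet_heads is hypothetical.

```python
from collections import OrderedDict

# Prefixes of the ImageNet classifier heads, taken from the key names printed
# in this thread. Both are needed: "module.v.head." does not match
# "module.v.head_dist.weight".
IMAGENET_HEAD_PREFIXES = ("module.v.head.", "module.v.head_dist.")


def drop_imagenet_heads(state_dict, prefixes=IMAGENET_HEAD_PREFIXES):
    """Return a copy of state_dict without the ImageNet head entries.

    Hypothetical helper: works on any mapping from parameter name to value.
    """
    return OrderedDict(
        (name, value)
        for name, value in state_dict.items()
        if not name.startswith(prefixes)
    )


# Toy stand-in for the tail of the checkpoint: shapes instead of real tensors.
toy_checkpoint = OrderedDict([
    ("module.v.head.weight", (1000, 768)),
    ("module.v.head.bias", (1000,)),
    ("module.v.head_dist.weight", (1000, 768)),
    ("module.v.head_dist.bias", (1000,)),
    ("module.mlp_head.0.weight", (768,)),
    ("module.mlp_head.0.bias", (768,)),
    ("module.mlp_head.1.weight", (527, 768)),
    ("module.mlp_head.1.bias", (527,)),
])

slim_checkpoint = drop_imagenet_heads(toy_checkpoint)
for name in slim_checkpoint:
    print(name)  # only the module.mlp_head.* entries remain
```

With a real checkpoint, the same filter would presumably be applied to the dictionary returned by torch.load. If the model still defines v.head and v.head_dist, loading the filtered dictionary would need load_state_dict(..., strict=False); if those modules have been replaced by nn.Identity(), which has no parameters, the filtered keys should match the model.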
Fine-tuning the AudioSet pretrained model for new tasks is entirely possible, and we recommend trying it for any audio or speech task. Please read the README introduction for how to do this.
-Yuan
Eliminating the ImageNet classifier can make the model slightly smaller, but performance-wise it is the same.
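To put a rough number on "slightly smaller", the sketch below counts the parameters in the two ImageNet heads using the shapes reported in the checkpoint printout in this thread. The size estimate assumes the weights are stored as 32-bit floats, which is an assumption and is not stated in the thread.

```python
from math import prod

# Shapes of the ImageNet head tensors as listed in the checkpoint printout.
head_shapes = {
    "module.v.head.weight": (1000, 768),
    "module.v.head.bias": (1000,),
    "module.v.head_dist.weight": (1000, 768),
    "module.v.head_dist.bias": (1000,),
}

# Total number of parameters that would be removed with both heads.
removed_params = sum(prod(shape) for shape in head_shapes.values())

# Assumption: 4 bytes per parameter (float32 storage).
BYTES_PER_FLOAT32 = 4
removed_megabytes = removed_params * BYTES_PER_FLOAT32 / 1e6

print(removed_params)     # 1538000
print(removed_megabytes)  # 6.152
```

Under that assumption, removing both heads saves about 1.5 million parameters, or roughly 6 MB on disk.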
Hi Yuan Gong, thank you for sharing your work. It is clear and easy to run. I am wondering about the ImageNet classifier weights, which still exist in the AudioSet pretrained models. Do you train them? Here is the last part of the printout for the pretrained "audioset_10_10_0.4593.pth":
module.v.head.weight torch.Size([1000, 768])
module.v.head.bias torch.Size([1000])
module.v.head_dist.weight torch.Size([1000, 768])
module.v.head_dist.bias torch.Size([1000])
module.mlp_head.0.weight torch.Size([768])
module.mlp_head.0.bias torch.Size([768])
module.mlp_head.1.weight torch.Size([527, 768])
module.mlp_head.1.bias torch.Size([527])
They can be skipped with:
self.v.head = nn.Identity()
self.v.head_dist = nn.Identity()
Now I want to use the AudioSet pretrained model for another task, but I am worried that eliminating this part will affect performance, although I think these weights are not connected to the final AudioSet classifier of 527 classes.
Thank you again