Status: Open. GenjiB opened this issue 1 year ago.
Hi, the 45% you got reflects zero-shot classification performance, similar to the 46.2% reported in our paper. The 75% was obtained by supervised fine-tuning of the audio encoder; to reproduce it, you need to fine-tune on VGGSound in a supervised manner.
Best, Yusong
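For readers comparing the two numbers: in the zero-shot setting no VGGSound labels are used for training, and each clip is simply assigned the class whose text-prompt embedding is most similar to its audio embedding. A minimal pure-Python sketch, assuming you already have one audio embedding per clip and one text embedding per class; the function names are hypothetical and not part of the CLAP codebase:

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def zero_shot_predict(audio_embeds: Sequence[Sequence[float]],
                      text_embeds: Sequence[Sequence[float]]) -> List[int]:
    """For each clip, return the index of the most similar class prompt."""
    texts = [l2_normalize(t) for t in text_embeds]
    preds = []
    for a in audio_embeds:
        a = l2_normalize(a)
        sims = [sum(x * y for x, y in zip(a, t)) for t in texts]
        preds.append(max(range(len(sims)), key=sims.__getitem__))
    return preds


def top1_accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of clips whose predicted class equals the ground-truth class."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Toy example with 2-D embeddings and two classes (hypothetical data).
audio = [[1.0, 0.1], [0.2, 0.9], [0.9, 0.5]]
text = [[1.0, 0.0], [0.0, 1.0]]
print(zero_shot_predict(audio, text))  # -> [0, 1, 0]
```

Supervised fine-tuning, by contrast, updates the audio encoder using the VGGSound training labels, so the 75% figure is not something this procedure alone can reach.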
Excuse me! I have the same trouble reproducing the VGGSound results in the zero-shot setting. I can only get a top-1 accuracy of about 28.4% by running the pre-trained model `630k-audioset-best.pt` with the vanilla VGGSound list `test.csv`.
Zeroshot Classification Results: mean_rank: 25.6106 median_rank: 4.0000 R@1: 0.2841 R@5: 0.5516 R@10: 0.6609 mAP@10: 0.3979
I am wondering if you will provide the script or other useful resources for reproducing results on VGGSound.
Looking forward to your reply.
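The log above reports retrieval-style metrics computed from the rank of the ground-truth class among all class prompts; R@1 is the same quantity as top-1 accuracy (0.2841, i.e. about 28.4%). A sketch of the usual definitions, assuming 1-based ranks and exactly one correct class per clip; whether the repository's evaluation code uses these exact conventions (for example, how the median is rounded) is an assumption:

```python
import statistics
from typing import Dict, Sequence


def rank_of_true_class(sims: Sequence[float], true_idx: int) -> int:
    """1-based position of the true class when classes are sorted by similarity, descending."""
    order = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)
    return order.index(true_idx) + 1


def retrieval_metrics(ranks: Sequence[int]) -> Dict[str, float]:
    """Summarize per-clip 1-based ranks of the true class into the logged metrics."""
    n = len(ranks)
    metrics = {
        "mean_rank": sum(ranks) / n,
        "median_rank": float(statistics.median(ranks)),
    }
    for k in (1, 5, 10):
        # R@k: fraction of clips whose true class is among the top k prompts.
        metrics[f"R@{k}"] = sum(r <= k for r in ranks) / n
    # With a single relevant class per clip, average precision is 1/rank when
    # the true class appears in the top 10, and 0 otherwise.
    metrics["mAP@10"] = sum(1.0 / r if r <= 10 else 0.0 for r in ranks) / n
    return metrics
```

A large gap between mean_rank and median_rank, as in the log, indicates that most clips rank the true class near the top while a minority rank it very low.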
Hello! I have the same problem reproducing the zero-shot classification results on the VGGSound dataset. With the checkpoint `630k-audioset-best.pt`, I got 29.83% top-1 accuracy on the VGGSound test set.
Can you give some instructions for reproducing the results on VGGSound?
Many thanks in advance for your reply!
Thanks for sharing the amazing codebase. I am wondering if you will provide the script or other useful resources for reproducing results on VGGSound.
I tried to use `get_audio_embedding_from_filelist` to get audio features, but I can only get ~45%, which is a huge gap from the 75% (really impressive, since A+V only gets 64.1%). Looking forward to your reply.
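When embedding a full test split, passing every file to `get_audio_embedding_from_filelist` in one call can exhaust GPU memory, so it can help to process the file list in chunks. A sketch assuming a `laion_clap.CLAP_Module`-style object exposing `get_audio_embedding_from_filelist(x=..., use_tensor=False)` and `get_text_embedding(x, use_tensor=False)`; the helper names, the batch size, and the prompt template are all assumptions, and any object with those two methods works, so the helpers run without CLAP installed:

```python
from typing import Any, List, Sequence


def chunked(items: Sequence, size: int) -> List[Sequence]:
    """Split a sequence into consecutive chunks of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def embed_audio_files(model: Any, files: Sequence[str],
                      batch_size: int = 64) -> List[List[float]]:
    """Embed audio files batch by batch to bound peak memory use (hypothetical helper)."""
    embeds: List[List[float]] = []
    for batch in chunked(list(files), batch_size):
        out = model.get_audio_embedding_from_filelist(x=list(batch), use_tensor=False)
        embeds.extend(list(row) for row in out)
    return embeds


def embed_class_prompts(model: Any, class_names: Sequence[str],
                        template: str = "This is a sound of {}.") -> List[List[float]]:
    """Embed one text prompt per class; the template is an assumed, not confirmed, setting."""
    prompts = [template.format(name) for name in class_names]
    out = model.get_text_embedding(prompts, use_tensor=False)
    return [list(row) for row in out]
```

The resulting audio and class embeddings can then be compared by cosine similarity to obtain zero-shot predictions; zero-shot accuracy may also shift with the prompt wording and label text used.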