LAION-AI / CLAP

Contrastive Language-Audio Pretraining
https://arxiv.org/abs/2211.06687
Creative Commons Zero v1.0 Universal

How to evaluate 630k-audioset-fusion-best.pt or 630k-audioset-best.pt on AudioSet? #143

Closed by Franklin905 6 months ago

Franklin905 commented 7 months ago

Are the models '630k-audioset-fusion-best.pt' and '630k-audioset-best.pt' trained and evaluated on AudioSet? If so, how? Because each AudioSet clip can carry multiple labels, I'm unsure how to compute the contrastive loss on AudioSet when training CLAP.

lukewys commented 6 months ago

Hi,

They are trained on a dataset that includes AudioSet, but we report the metrics on AudioCaps and Clotho. To train on AudioSet with multiple labels, we simply treat each label-audio pair as a positive pair. There are better ways to handle the multi-label problem; see, for example, https://arxiv.org/abs/2204.03610
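
For anyone reading later, here is a rough sketch of what "each label-audio pair as a positive pair" could look like as a data-preparation step. This is not the repository's actual code: the function name `expand_label_pairs`, the file names, and the "The sound of {}" prompt template are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def expand_label_pairs(
    clips: Dict[str, List[str]],
    template: str = "The sound of {}",  # assumed prompt template, not confirmed by the thread
) -> List[Tuple[str, str]]:
    """Turn each multi-label clip into one (audio, text) pair per label.

    Hypothetical helper: every label of a clip becomes its own positive
    pair, so a clip with n labels contributes n training examples.
    """
    pairs = []
    for clip_id, labels in clips.items():
        for label in labels:
            pairs.append((clip_id, template.format(label.lower())))
    return pairs


# Toy data with made-up file names and AudioSet-style labels.
clips = {
    "clip_a.wav": ["Speech", "Dog"],
    "clip_b.wav": ["Music"],
}
pairs = expand_label_pairs(clips)
for audio, text in pairs:
    print(audio, "|", text)
```

One caveat with this kind of expansion: if two pairs from the same clip end up in the same batch, a standard in-batch contrastive loss treats the other label's text as a negative for that audio, which is presumably part of why better multi-label treatments exist.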

Franklin905 commented 6 months ago

Got it! Thanks for your reply.