cfoster0 / CLAP

Contrastive Language-Audio Pretraining
BSD 3-Clause "New" or "Revised" License

Evaluations #19

Open cfoster0 opened 3 years ago

cfoster0 commented 3 years ago

We can start thinking about how to evaluate our models once trained. The simplest option would be the contrastive loss over some held-out set with a fixed batch size. Another would be to evaluate how well CLAP score correlates with MOS (Mean Opinion Score), which is the gold-standard subjective eval in the audio NN literature. We could probably also try linear-probe training on the Google Speech Commands dataset.
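For the first option, a minimal sketch of what the held-out eval might look like: given precomputed audio and text embeddings for a fixed-size held-out batch, compute the symmetric InfoNCE-style contrastive loss. This is a NumPy illustration, not the repo's actual training code; the function name, temperature value, and embedding shapes are all assumptions.

```python
import numpy as np

def clap_contrastive_loss(audio_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    audio_emb, text_emb: arrays of shape (batch, dim), where row i of each
    is an aligned (audio, caption) pair. Hypothetical eval helper, not the
    repo's actual implementation.
    """
    # L2-normalize so the dot product is cosine similarity
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = (a @ t.T) / temperature  # (batch, batch) similarity matrix
    n = logits.shape[0]

    def xent_diag(l):
        # cross-entropy with the matching pair (diagonal) as the target
        l = l - l.max(axis=1, keepdims=True)  # stabilize the softmax
        logprob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(logprob[np.arange(n), np.arange(n)])

    # average of audio->text and text->audio directions
    return 0.5 * (xent_diag(logits) + xent_diag(logits.T))
```

Because the loss depends on batch size (more in-batch negatives make the task harder), the held-out eval should fix the batch size so numbers are comparable across checkpoints.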

cfoster0 commented 3 years ago

Speak of the devil! Here's a new big benchmark that does just what I'm after.

https://arxiv.org/abs/2105.01051