LAION-AI / CLAP

Contrastive Language-Audio Pretraining
https://arxiv.org/abs/2211.06687
Creative Commons Zero v1.0 Universal

Question about training set of CLAP #145

Open sjhan91 opened 5 months ago

sjhan91 commented 5 months ago

Hello, I would like to know which datasets you used to train CLAP (especially for music).

The reason I ask is that embeddings of audio synthesized from MIDI are not closely aligned with the text embeddings (MusicCaps, AudioStock, LP-MusicCaps, AudioSet) when I plot the samples in t-SNE space.

GTZAN embeddings show a similar misalignment.
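Since t-SNE does not preserve global distances, the misalignment described above could also be checked directly with cosine similarity. Below is a minimal sketch, assuming the audio and text embeddings have already been extracted as paired NumPy arrays of shape (n, d). The function names and the random placeholder data are hypothetical and are not CLAP outputs.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def alignment_report(audio_emb, text_emb):
    """Compare paired audio/text embeddings of shape (n, d), with n >= 2.

    Row i of audio_emb is assumed to describe the same clip as row i of text_emb.
    """
    a = l2_normalize(np.asarray(audio_emb, dtype=np.float64))
    t = l2_normalize(np.asarray(text_emb, dtype=np.float64))
    sim = a @ t.T  # (n, n) cosine similarity matrix
    n = sim.shape[0]

    # Diagonal entries are the matched audio/text pairs.
    matched = float(np.mean(np.diag(sim)))
    # Off-diagonal entries are mismatched pairs, used as a baseline.
    mismatched = float(np.mean(sim[~np.eye(n, dtype=bool)]))
    # Fraction of audio clips whose most similar caption is their own.
    top1 = float(np.mean(np.argmax(sim, axis=1) == np.arange(n)))
    return {"matched": matched, "mismatched": mismatched, "top1": top1}


if __name__ == "__main__":
    # Placeholder data only: text vectors plus small noise stand in for audio.
    rng = np.random.default_rng(0)
    text = rng.normal(size=(8, 512))
    audio = text + 0.1 * rng.normal(size=(8, 512))
    print(alignment_report(audio, text))
```

If the matched similarity is barely higher than the mismatched one for the MIDI-synthesized or GTZAN clips, that would support the t-SNE observation without relying on the 2-D projection.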

Alternatively, could you share some example captions?

Regards.