LAION-AI / CLAP

Contrastive Language-Audio Pretraining
https://arxiv.org/abs/2211.06687
Creative Commons Zero v1.0 Universal

Keyword-to-Caption Augmentation #51

Closed usuyama closed 2 years ago

usuyama commented 2 years ago

How do you use T5 for Keyword-to-Caption Augmentation?

I'm looking at Section 3.5, but I'm wondering what the actual prompts for T5 are:

[Image: Screenshot 2022-11-16 at 12 15 00 AM]

RetroCirce commented 2 years ago

Hi, thank you for your question. You can see some demos of the keyword-to-caption augmentation on page 9 of the paper (https://arxiv.org/pdf/2211.06687.pdf) or in this online appendix (https://retrocirce.github.io/appendix/).

Generally, we take the tags of the audio track and use them to make a sentence (hence keyword-to-caption). The sentence may not be 100% consistent with the audio events in the track, but it tends to lead to better performance because it enriches the diversity of the language embeddings.
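
The tags-to-sentence step described above might look roughly like the sketch below. This is not the code used in the paper. The prompt format (comma-joined tags), the helper names `build_prompt`, `template_caption`, `keyword_to_caption` and `make_t5_generator`, and the choice of checkpoint are all assumptions made for illustration. The thread does not state the actual T5 prompts, so the separator is left as a parameter.

```python
from typing import Callable, Iterable, List


def build_prompt(tags: Iterable[str], sep: str = ", ") -> str:
    """Join cleaned, de-duplicated tags into one input string.

    The separator is an assumption; the thread does not give the real prompt.
    """
    seen = set()
    cleaned: List[str] = []
    for tag in tags:
        t = tag.strip().lower()
        if t and t not in seen:  # drop empty strings and repeated tags
            seen.add(t)
            cleaned.append(t)
    return sep.join(cleaned)


def template_caption(prompt: str) -> str:
    """Offline stand-in for the language model: a fixed template."""
    return f"The sound of {prompt}." if prompt else ""


def keyword_to_caption(
    tags: Iterable[str],
    generate: Callable[[str], str] = template_caption,
) -> str:
    """Turn audio tags into one caption using the given text generator."""
    return generate(build_prompt(tags))


def make_t5_generator(model_name: str) -> Callable[[str], str]:
    """Wrap a seq2seq checkpoint (for example a T5 model fine-tuned for
    keyword-to-text) as a generator. Hypothetical helper: requires the
    `transformers` and `torch` packages, and the checkpoint is your choice."""
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    def generate(prompt: str) -> str:
        input_ids = tokenizer(prompt, return_tensors="pt").input_ids
        output_ids = model.generate(input_ids, max_new_tokens=32)
        return tokenizer.decode(output_ids[0], skip_special_tokens=True)

    return generate


if __name__ == "__main__":
    # Uses the template stand-in, so no model download is needed.
    print(keyword_to_caption(["rain", "thunder"]))  # The sound of rain, thunder.
```

To try this with a real model, pass `generate=make_t5_generator(<checkpoint name>)` to `keyword_to_caption`. The generated captions may then describe events that are not in the audio, which matches the caveat above about captions not being fully consistent with the track.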