Keyword-to-Caption Augmentation

Hi, thank you for your question. You can see some demos of the keyword-to-caption augmentation on page 9 of the paper (https://arxiv.org/pdf/2211.06687.pdf) or in this online appendix (https://retrocirce.github.io/appendix/).

Generally, we take the tags of the audio track and use them to make a sentence (hence keyword-to-caption). The sentence may not be 100% consistent with the audio event in the track, but it still tends to improve performance because it enriches the diversity of the language embeddings.
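
To make the idea concrete, here is a minimal sketch of turning tags into a sentence. It is not the pipeline used in the paper: a fixed template stands in for the caption generator, and the function name `keywords_to_caption` and the "The sound of ..." wording are assumptions made up for this illustration.

```python
def keywords_to_caption(tags):
    """Join audio tags into one sentence (hypothetical template-based stand-in)."""
    # Normalize: strip whitespace, lowercase, drop empties and duplicates, keep order.
    seen = set()
    words = []
    for tag in tags:
        word = tag.strip().lower()
        if word and word not in seen:
            seen.add(word)
            words.append(word)

    if not words:
        return ""

    # Build a readable listing: "a", "a and b", or "a, b, and c".
    if len(words) == 1:
        listing = words[0]
    elif len(words) == 2:
        listing = f"{words[0]} and {words[1]}"
    else:
        listing = ", ".join(words[:-1]) + f", and {words[-1]}"

    # Assumed template; a real setup might use a language model here instead.
    return f"The sound of {listing}."


print(keywords_to_caption(["Dog", "bark", "park "]))
```

A template like this always produces the same sentence shape, so it probably adds less language diversity than a generative model would. It only shows the tag-to-sentence step, including why the result can drift from what is actually heard in the audio.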
