Closed by usuyama 2 years ago
Hi, thank you for your question. You can see some demos of the keyword-to-caption augmentation on page 9 of the paper (https://arxiv.org/pdf/2211.06687.pdf) or in this online appendix (https://retrocirce.github.io/appendix/).
Generally, we take the tags of the audio track and use them to generate a sentence (keyword-to-caption). The sentence may not be 100% consistent with the audio event, but it tends to improve performance because it enriches the diversity of the language embeddings.
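To make the tags-to-sentence flow concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the `keywords: ...` input format, the function names, and the template stand-in are hypothetical, not the actual prompt or model call used in the paper. The generator is pluggable so that a real T5 call can be swapped in.

```python
from typing import Callable, Sequence

PREFIX = "keywords: "  # hypothetical input prefix, not the paper's prompt


def build_prompt(tags: Sequence[str]) -> str:
    """Clean the audio tags and join them into one model input string."""
    cleaned = [t.strip().lower() for t in tags if t.strip()]
    return PREFIX + ", ".join(cleaned)


def template_generator(prompt: str) -> str:
    """Stand-in for a T5 call: fills a fixed sentence template.

    A real setup would pass the prompt to a seq2seq model instead.
    """
    keywords = prompt[len(PREFIX):] if prompt.startswith(PREFIX) else prompt
    return f"The sound of {keywords}."


def keyword_to_caption(
    tags: Sequence[str],
    generate: Callable[[str], str] = template_generator,
) -> str:
    """Turn a list of tags into a caption using the given generator."""
    return generate(build_prompt(tags))


if __name__ == "__main__":
    print(keyword_to_caption(["Rain", " Thunder "]))
```

To use an actual model, `generate` could wrap a Hugging Face `T5ForConditionalGeneration` checkpoint and its tokenizer. The generated caption is then paired with the audio clip as its text label, even if it does not describe the clip perfectly.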
How do you use T5 for Keyword-to-Caption Augmentation?
I'm checking Section 3.5, but I'm wondering what the actual prompts for T5 are: