Hi,
Thank you for your wonderful survey!
Would you mind adding two papers about video-text retrieval?
Paper 1: Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning
Accepted at ECCV 2024.
It leverages LLaVA to increase the scale of training data for video-text retrieval. The approach forwards the concatenated frames of a video to LLaVA, which generates a caption for the video.
Paper link: https://arxiv.org/abs/2407.03788
Code link: https://github.com/nguyentthong/meta_optimized_angular_margin_contrastive_lvlm
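For context, the frame-concatenation step can be sketched as below. This is only an illustrative sketch, not the paper's actual pipeline: the frame count, frame size, and helper name are hypothetical, and the LLaVA captioning call itself is elided.

```python
import numpy as np

def concat_frames(frames):
    """Tile sampled video frames into one horizontal strip.

    The resulting single image could then be fed to an image-text
    model such as LLaVA to produce a caption for the whole video.
    (Hypothetical helper; the paper's exact procedure may differ.)
    """
    # frames: list of (H, W, 3) uint8 arrays sampled from the video
    return np.concatenate(frames, axis=1)  # shape (H, W * n_frames, 3)

# Example: 4 dummy 224x224 RGB frames -> one 224x896 strip
frames = [np.zeros((224, 224, 3), dtype=np.uint8) for _ in range(4)]
strip = concat_frames(frames)
```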
Paper 2: Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives
Accepted at ACL 2024 as Findings.
This paper summarizes video-text retrieval methods from model architecture, model training, and data perspectives.
Paper link: https://arxiv.org/abs/2406.05615
Code link: https://github.com/nguyentthong/video-language-understanding
Thanks a lot!