When pre-training and fine-tuning Video-LLaMA, did the training data include any datasets built specifically for video captioning, such as MSVD, MSR-VTT, or VATEX?

DAMO-NLP-SG / Video-LLaMA

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

BSD 3-Clause "New" or "Revised" License

tiesanguaixia closed this issue 1 year ago

tiesanguaixia commented 1 year ago

Thank you for your reply!

hangzhang-nlp commented 1 year ago

No, it did not.