[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
BSD 3-Clause "New" or "Revised" License
When pre-training and fine-tuning Video-LLaMA, did the data include any datasets dedicated to the video captioning task, such as MSVD, MSR-VTT, or VATEX? #93
Closed
tiesanguaixia closed 1 year ago
Thank you for your reply!