dvlab-research / LLaMA-VID

LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)
Apache License 2.0

Long Video dataset #61

Open eslambakr opened 9 months ago

eslambakr commented 9 months ago

Dear authors,

Thank you for sharing your great work!

I have some questions regarding the Long Video dataset:

  1. Can you share the scripts for generating the questions? In other words, the inference scripts with the detailed prompts used with GPT-4 and Claude-2.
  2. It would be highly appreciated if you could share the category label for each question, i.e., video summary, movie plot, or detail reasoning.

Thanks in advance! Best regards.

wcy1122 commented 8 months ago

Hi. Thanks for your interest in our work. Sorry, I have been very busy recently, so the GPT-4 data generation script and the category label for each question will be released later. If you want to generate some data in the meantime, you can refer to the prompt provided in Figure 7 in the appendix of our paper.