CeeZh / LLoVi

Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering"
MIT License

Inquiry about resources and processing time #6

Open surajk222 opened 2 months ago

surajk222 commented 2 months ago

Hi, I am currently working with the NExT-QA dataset, and I ran your code with meta-llama/Meta-Llama-3-8B, since GPT-3.5 and GPT-4 are not open source. Could you please share what hardware you used to produce the NExT-QA results? Also, how long did it take to process 1000 annotations?

CeeZh commented 1 month ago

Hi,

I used 4 A6000 GPUs. It takes 1-2 hours to run Llama-3-8B on the NExT-QA val set (~5k examples), and 3-6 hours for Llama-3-70B, depending on the length of your captions.
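For planning a run, the numbers above translate into a rough per-example throughput. A minimal sketch (the 5,000-example count and 1-2 hour wall-clock figures come from the reply above; the function name is just for illustration):

```python
# Rough throughput estimate for the NExT-QA val set (~5k examples)
# based on the 1-2 hour wall-clock range quoted for Llama-3-8B.
def examples_per_second(n_examples: int, hours: float) -> float:
    """Average examples processed per second over the whole run."""
    return n_examples / (hours * 3600)

fast = examples_per_second(5000, 1)  # optimistic 1-hour end of the range
slow = examples_per_second(5000, 2)  # pessimistic 2-hour end
print(f"~{fast:.2f} to ~{slow:.2f} examples/s")
```

By the same arithmetic, 1000 annotations would take roughly 12-24 minutes at the 8B model's reported pace.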

I have run LLaVA-1.5 captions + Llama-3-70B on NExT-QA before. The results are Causal (63.1), Temporal (56.3), Descriptive (70.0), All (62.0). Hope this information helps you.