dvlab-research / LLaMA-VID

LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)
Apache License 2.0

Cannot reproduce VideoChatGPT generative performance results #101

Open xsgldhy opened 2 months ago

xsgldhy commented 2 months ago

Thank you for your contribution!

Hello, I'm trying to reproduce the generative-performance scores in the VideoChatGPT evaluation for the EVA-G & LLaVA1.5-VideoChatGPT-Instruct 7B model. I downloaded your codebase; aside from adjusting the video path, I also renamed your LlavaConfig's model type from "llava" to "llama_vid", because the former conflicts with my transformers package (version 4.41.2). Everything else remains unchanged. [screenshots of the modified code attached]

The reproduced results are shown below: [screenshot of scores attached]
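As a side note on the rename above: transformers already registers its own LLaVA config under the model type "llava", so a second config class using the same key collides at registration time, and re-keying it as "llama_vid" avoids the clash. A minimal sketch of that mechanism, using a hypothetical toy registry standing in for transformers' AutoConfig mapping (the class names here are illustrative, not the actual library internals):

```python
class ConfigRegistry:
    """Toy stand-in for transformers' AutoConfig model_type mapping."""
    def __init__(self):
        self._mapping = {}

    def register(self, model_type, config_cls):
        # transformers raises a similar error when a model_type is already taken
        if model_type in self._mapping:
            raise ValueError(f"'{model_type}' is already used by a registered config")
        self._mapping[model_type] = config_cls


class BuiltinLlavaConfig:
    """Stand-in for the LLaVA config that ships with transformers 4.41.2."""
    model_type = "llava"


class LlamaVidConfig:
    """Stand-in for LLaMA-VID's LlavaConfig, re-keyed to avoid the clash."""
    model_type = "llama_vid"  # renamed from "llava"


registry = ConfigRegistry()
registry.register(BuiltinLlavaConfig.model_type, BuiltinLlavaConfig)

# Registering LLaMA-VID's config under the original "llava" key collides:
try:
    registry.register("llava", LlamaVidConfig)
except ValueError as e:
    print(e)

# The renamed key registers cleanly alongside the built-in one:
registry.register(LlamaVidConfig.model_type, LlamaVidConfig)
```

In the real codebase the analogous step is registering the custom config under the new key with `AutoConfig.register("llama_vid", ...)` before loading the checkpoint.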

EchoDreamer commented 1 month ago


Hi, I'm currently trying to reproduce the results from the LLaMA-VID paper, but I'm having some difficulty because I don't have access to the WebVid dataset. Could you guide me on how to download or access the WebVid dataset? I'd really appreciate any help you could offer. Thank you so much in advance!