CeeZh / LLoVi

Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering"

Performance of LLoVi with 7B llama2 #3

Open · Leo-Yuyang opened 5 months ago

Leo-Yuyang commented 5 months ago

Dear author, thank you for your work! I would like to know the performance of LLoVi on NExT-QA, NExT-GQA, and IntentQA when using 7B Llama 2 as the LLM. Larger models such as GPT-3.5 and GPT-4 are not open-source, so we cannot do research on how to improve them for this task. I therefore think it would be beneficial for the community to report performance on smaller LLMs.

CeeZh commented 5 months ago

Hi Leo,

Thanks for reaching out. We did not test Llama-7B on NExT-QA. However, it should be straightforward to modify this codebase to support it. You can refer to the settings for (EgoSchema + Llama-70B) and (NExT-QA + GPT-4).
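For readers attempting this, here is a minimal sketch (not from the LLoVi codebase) of what a Llama-2-7B-chat backend wrapper could look like, assuming access to the gated Hugging Face weights; the class name and interface are hypothetical:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

class Llama2Chat7B:
    """Hypothetical LLM backend wrapper; LLoVi's actual interface may differ."""

    def __init__(self, model_id="meta-llama/Llama-2-7b-chat-hf"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_id)
        # device_map="auto" requires the `accelerate` package.
        self.model = AutoModelForCausalLM.from_pretrained(
            model_id, torch_dtype=torch.float16, device_map="auto"
        )

    def generate(self, prompt: str, max_new_tokens: int = 256) -> str:
        # Llama-2 chat checkpoints expect the [INST] ... [/INST] wrapping;
        # feeding a bare prompt tends to degrade answers sharply.
        wrapped = f"[INST] {prompt} [/INST]"
        inputs = self.tokenizer(wrapped, return_tensors="pt").to(self.model.device)
        out = self.model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
        # Decode only the newly generated tokens, not the echoed prompt.
        return self.tokenizer.decode(
            out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
        )
```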

Leo-Yuyang commented 4 months ago

Dear author, thank you for your reply! I just tried to test Llama-7B on NExT-QA, and it is indeed straightforward. This is the command I started from:

```
python main.py \
  --dataset nextqa \
  --data_path data/nextqa/llava1.5_fps1.json \
  --fps 0.5 \
  --anno_path data/nextqa/val.csv \
  --duration_path data/nextqa/durations.json \
  --prompt_type qa_next \
  --model gpt-4-1106-preview \
  --output_base_path output/nextqa \
  --output_filename gpt4_llava.json
```

I downloaded data.zip and changed the `--model` argument to `llama-2-7b-chat`. However, I do not have permission to access the Llama 2 models on Hugging Face, so I could not run the code successfully. I then downloaded the model from https://github.com/meta-llama/llama instead, but that checkpoint is loaded differently from the Hugging Face version, so I had to modify a lot of code. In the end I only got an accuracy of 2.02% on the NExT-QA task. I don't think this is reasonable, and it may be due to the way I load the model. Would it be convenient for you to help me reproduce this setting and report the performance? It looks like, once the environment is ready, all one needs to do is change `--model` in the above command to `llama-2-7b-chat`.
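A side note for anyone hitting the same wall: 2.02% is far below the roughly 20% chance level of NExT-QA's five-way multiple choice, which usually points to the answer parser failing to match any option, and a checkpoint in the wrong format or a missing chat template can both cause that. The checkpoints from https://github.com/meta-llama/llama are in Meta's native format; `transformers` ships a `convert_llama_weights_to_hf.py` script (under `src/transformers/models/llama/`; exact flags vary by version) that converts them so the output directory loads like any hub model. A minimal sketch of loading such a converted checkpoint, with a placeholder local path:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: the --output_dir passed to convert_llama_weights_to_hf.py.
ckpt_dir = "./llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(ckpt_dir)
model = AutoModelForCausalLM.from_pretrained(
    ckpt_dir, torch_dtype=torch.float16, device_map="auto"
)

# The [INST] wrapping matters for the chat variant; the prompt body here
# merely stands in for LLoVi's caption + question prompt.
prompt = "[INST] Answer the question given the captions... [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```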