TencentARC / ST-LLM

[ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners"
Apache License 2.0

Inconsistent MVBench test results #3

Closed emmating12 closed 2 months ago

emmating12 commented 3 months ago

Hi, I set up the EVA ViT-g + InstructBLIP + Vicuna1.1 model locally, and got 35.45% on MVBench. The detailed results are attached — could you help me figure out what went wrong? instructblipbase_stllm_qa_mvbench_fps1.json

farewellthree commented 2 months ago

Are you using the provided weights? A user in another issue tested with the provided ckpt and got normal results; the ckpt saved during training had some problems.

emmating12 commented 2 months ago

> Are you using the provided weights? A user in another issue tested with the provided ckpt and got normal results; the ckpt saved during training had some problems.

Hi, I made my modifications following this page: https://github.com/TencentARC/ST-LLM/blob/main/trainval.md

emmating12 commented 2 months ago

```yaml
model:
  arch: st_llm_hf
  model_type: instructblip_vicuna0_btadapter
  use_grad_checkpoint: True
  max_txt_len: 256
  end_sym: "###"
  video_input: "all"
  llama_model: '/root/qfs1/weights/vicuna-7b-v1.1'
  ckpt: '/root/qfs1/weights/ST-LLM/instruct_blip_vicuna7b_trimmed.pth'
  q_former_model: '/root/qfs1/weights/ST-LLM/instruct_blip_vicuna7b_trimmed.pth'
  qformer_text_input: True
  freeze_LLM: False
  use_mask: True
  mvm_decode: True
```

farewellthree commented 2 months ago

35.45% looks like the InstructBLIP baseline result. Are the QA weights being loaded correctly? You can set a breakpoint at line 201 of stllm/models/st_llm.py to check.
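One quick check at that breakpoint is to diff the checkpoint's keys against the model's `state_dict()` keys — if nothing (or almost everything) is missing, the load is probably wrong. Here is a minimal, dependency-free sketch; the key names in the usage example are invented for illustration, and with PyTorch you would pass something like `torch.load(ckpt_path)['model'].keys()` (the top-level key depends on how the checkpoint was saved) and `model.state_dict().keys()`:

```python
def diff_state_dict_keys(ckpt_keys, model_keys):
    """Return (unexpected, missing): keys only present in the checkpoint,
    and keys the model expects but the checkpoint lacks."""
    ckpt, model = set(ckpt_keys), set(model_keys)
    unexpected = sorted(ckpt - model)
    missing = sorted(model - ckpt)
    return unexpected, missing


# Hypothetical key names, purely for illustration:
unexpected, missing = diff_state_dict_keys(
    ["llama_proj.weight", "llama_proj.bias"],
    ["llama_proj.weight", "llama_proj.bias", "qformer.query_tokens"],
)
print(unexpected)  # []
print(missing)     # ['qformer.query_tokens']
```

Note that `load_state_dict(ckpt, strict=False)` in PyTorch returns a similar `missing_keys`/`unexpected_keys` pair, which you can print at the breakpoint instead.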

emmating12 commented 2 months ago

> 35.45% looks like the InstructBLIP baseline result. Are the QA weights being loaded correctly? You can set a breakpoint at line 201 of stllm/models/st_llm.py to check.

Hi, are these the QA weights I should download? https://huggingface.co/farewellthree/ST_LLM_weight/tree/main/QA_weight