dvlab-research / LLaMA-VID

Official Implementation for LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
Apache License 2.0

MSVD ACC decrease after stage3 #58

Deaddawn closed 4 months ago

Deaddawn commented 5 months ago

Hi there. I evaluated MSVD-QA with both llama-vid-7b-full-224-video-fps-1 and llama-vid-7b-full-224-long-video. The accuracy is lower with the latter. Does that make sense to you?

Deaddawn commented 5 months ago

The decrease is about 0.2.

yanwei-li commented 5 months ago

Hi, this can happen after tuning on long videos. MSVD-QA results can also vary from run to run because the evaluation is done with GPT.
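
One rough way to check whether a gap of this size falls within evaluation noise is to repeat the GPT-based scoring a few times per checkpoint and compare the gap between the means with the run-to-run spread. The snippet below is only a minimal sketch of that idea: the function name `gap_within_noise`, the threshold `k`, and the per-run accuracies are all hypothetical placeholders, not numbers measured with LLaMA-VID.

```python
from statistics import mean, stdev


def gap_within_noise(runs_a, runs_b, k=2.0):
    """Compare the mean gap between two checkpoints with run-to-run spread.

    runs_a, runs_b: accuracies from repeated evaluations of each checkpoint
    (at least two runs each). k is an assumed threshold, not a standard value.
    Returns (gap, noise, within_noise).
    """
    gap = mean(runs_a) - mean(runs_b)
    # Use the larger of the two sample standard deviations as the noise level.
    noise = max(stdev(runs_a), stdev(runs_b))
    return gap, noise, abs(gap) <= k * noise


# Hypothetical placeholder accuracies, for illustration only.
fps1_runs = [70.0, 70.4, 69.8]
long_video_runs = [69.9, 70.1, 69.7]

gap, noise, within = gap_within_noise(fps1_runs, long_video_runs)
print(f"gap={gap:.2f}, noise={noise:.2f}, within noise: {within}")
```

If the gap is no larger than a small multiple of the spread, the two checkpoints cannot be separated by this evaluation alone.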

Deaddawn commented 5 months ago

Got it, thanks.