Open orrzohar opened 7 months ago
We were confused about this as well, however I repeated the experiment twice and was able to get over 67 correct. Nonetheless, there are still people who claim to be unable to reproduce the performance of MSVD, e.g., https://github.com/PKU-YuanGroup/Video-LLaVA/issues/36#issue-2031834153. However, we have also observed that some people are able to reproduce the same results as we did, e.g., https://github.com/PKU-YuanGroup/Video-LLaVA/issues/37#issue-2032217679, https://github.com/PKU-YuanGroup/Video-LLaVA/issues/36#issuecomment-1926301528. I think that this may be due to inconsistent results due to version migration of GPT.
Also I have observed similar problems in other work. https://github.com/mbzuai-oryx/Video-ChatGPT/issues/28
Maybe we should find some more stable non-GPT evaluation method.
I mean, an easy variation would be to use Vicuna as the weights are open-source it would be more comparable...
At the very least, it would make sense to set the temperature to 0, as at least the generated text would have less randomness. I am not sure what do to about the version migration; seems like an issue if every time chatGPT outputs a new migration, all the numbers need to be updated for reproducibility.
By the way; when I evaluate TGIF: Yes count: 9249 No count: 16502 Accuracy: 0.3591705176498 Average score: 2.5519785639392647
What's your GPT version? We use gpt-3.5-turbo
.
I didn't change your eval files; the default you use is GPT3.5:
The reason I think you may have uploaded the wrong model to transformers is that I get the following (top row is the model you released; bottom row is a model I pretrained myself with similar data and instruction tuning hyperparameters):
The reason I think you may have uploaded the wrong model to transformers is that I get the following (top row is the model you released; bottom row is a model I pretrained myself with similar data and instruction tuning hyperparameters):
I will check it.
When I evaluate the model you released, I get the following:
All I did was:
CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/run_qa_msvd.sh
bash scripts/v1_5/eval/eval_qa_msvd.sh