Cannot fully reproduce the test results of the open-source weights #13

lsnls commented 4 months ago

Hello, thanks for your outstanding work!

I tested the open-source weights wisdomik/Quilt-Llava-v1.5-7b. Based on my test results, I guess the weights were trained from the LLaVA checkpoint with the 7B language model, with 0 epochs in stage 1 and 3 epochs in stage 2. Unfortunately, one test metric differs considerably from the one documented in your paper: the result on the closed set of Quilt-VQA w/ red circle. My result was 71.3, whereas you reported 77.78.

I am looking forward to your reply! Thank you a million!

Lewislou commented 2 months ago

Hi,

How did you evaluate the model? In quilt_eval.py, where does 'answer-file-llava-zeorshot.jsonl' come from? If I set --anchor to None, I only get 'yes/no accuracy = 62.9738'.
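For readers comparing numbers, here is a minimal sketch of how a closed-set yes/no accuracy could be computed. It is not the logic of quilt_eval.py; the function name, the input format, and the matching rule are all assumptions made for illustration. It shows that the score depends on how free-form predictions are mapped to "yes" or "no".

```python
import re

def yes_no_accuracy(pairs):
    """Percent accuracy over (reference, prediction) pairs whose reference is yes/no.

    Hypothetical helper, not taken from quilt_eval.py.
    """
    correct = total = 0
    for ref, pred in pairs:
        ref = ref.strip().lower()
        if ref not in {"yes", "no"}:
            continue  # skip open-ended questions
        total += 1
        words = set(re.findall(r"[a-z]+", pred.lower()))
        opposite = "no" if ref == "yes" else "yes"
        # Assumed rule: the prediction must contain the reference word
        # and must not contain the opposite word.
        if ref in words and opposite not in words:
            correct += 1
    return 100.0 * correct / total if total else 0.0
```

Under a rule like this, a long answer that never says "yes" or "no" explicitly counts as wrong, which is one way two evaluation setups can disagree on the same weights.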

hwei-hw commented 2 months ago

I have run into the same issue. Have you found a solution?

HLSvois commented 3 weeks ago

Adding the prompt “Please choose from the following two options: [Yes, No]” may help.
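A minimal sketch of that suggestion, assuming each question comes with a reference answer so that closed-set items can be identified. The helper name and its arguments are hypothetical; only the suffix text comes from the comment above.

```python
# Suffix text quoted from the suggestion above; everything else is an assumption.
CLOSED_SET_SUFFIX = "Please choose from the following two options: [Yes, No]"

def add_closed_set_prompt(question: str, answer: str) -> str:
    """Append the yes/no instruction only when the reference answer is yes or no."""
    if answer.strip().lower() in {"yes", "no"}:
        return f"{question.rstrip()} {CLOSED_SET_SUFFIX}"
    return question  # open-ended questions are left unchanged
```

The rewritten question would then be passed to the model in place of the original one before running the evaluation.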

aldraus / quilt-llava
