Closed gd2016229035 closed 1 year ago
To reproduce the results in our paper, you can use the weights after stage 3 or after stage 2. The online demo version is tuned for better interaction on many grounding tasks.
On Wed, Oct 25, 2023 at 1:38 AM Guan Dai @.***> wrote:
Nice work! I tested MiniGPT-v2 using the demo dev weights, but got only ~50 Acc on the OK-VQA benchmark. Some questions about reproducing the results: (1) What are the differences between the MiniGPT-v2 weights of "online developing demo" and "after stage-3"? (2) To reproduce the results of the paper, what parameters should be used, for example the 'temperature'? (3) Will setting "low_resource" to True in the config affect the reproducibility of the results in the paper? Thank you.
Which prompt did you use: "[vqa] question" (in EXPERIMENTS) or "[vqa] Based on the image, respond to this question with a single word or phrase: question" (in APPENDIX)? I tried both prompts on the okvqa dataset with the "after stage 3" checkpoint, fp16, and greedy decoding, but still get ~52 Acc. Please give me some advice on reproducibility. Thank you!
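For anyone comparing the two prompt variants, here is a minimal sketch of how they could be built side by side. The template strings are copied from the comment above; the helper name `build_vqa_prompts` and the dictionary keys are hypothetical, and any image placeholder or instruction wrapper that the model's own code adds around the prompt is deliberately left out.

```python
# Templates copied from the comment above; "{question}" marks where the
# dataset question is substituted.
SHORT_TEMPLATE = "[vqa] {question}"
LONG_TEMPLATE = (
    "[vqa] Based on the image, respond to this question "
    "with a single word or phrase: {question}"
)


def build_vqa_prompts(question: str) -> dict[str, str]:
    """Return both prompt variants for one question (hypothetical helper)."""
    question = question.strip()
    return {
        "experiments": SHORT_TEMPLATE.format(question=question),
        "appendix": LONG_TEMPLATE.format(question=question),
    }


if __name__ == "__main__":
    for name, prompt in build_vqa_prompts("What sport is being played?").items():
        print(f"{name}: {prompt}")
```

Running the same evaluation loop once per variant, with everything else fixed, makes it easier to tell whether the prompt or some other setting accounts for an accuracy gap.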
We will release our evaluation code in the next few days, so you can use it to reproduce the results.
I have found the root cause of the issue. It is a bug that occurs during "multi-batch inference" and leads to inaccurate results (possibly related to the pad_token; I haven't found a solution for it yet). When I tested with batch=1 on the okvqa and refcoco datasets, I basically achieved the results mentioned in the paper! I will study your code further once you release the evaluation code. Thank you for the advice.
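The exact cause is not confirmed in this thread, but one common way padding affects batched generation with decoder-only models can be sketched as follows. Generation continues from the last position of each row, so with right padding the shorter prompts end in pad tokens, while with left padding every row ends in a real token. The function `pad_batch` and the toy token ids are hypothetical and are not taken from the MiniGPT-4 code.

```python
from typing import List, Tuple


def pad_batch(
    seqs: List[List[int]], pad_id: int, side: str = "left"
) -> Tuple[List[List[int]], List[List[int]]]:
    """Pad token-id sequences to equal length and build attention masks.

    Hypothetical helper for illustration only. A mask value of 1 marks a
    real token and 0 marks padding.
    """
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    max_len = max(len(s) for s in seqs)
    ids, mask = [], []
    for s in seqs:
        n_pad = max_len - len(s)
        if side == "left":
            ids.append([pad_id] * n_pad + s)
            mask.append([0] * n_pad + [1] * len(s))
        else:
            ids.append(s + [pad_id] * n_pad)
            mask.append([1] * len(s) + [0] * n_pad)
    return ids, mask


if __name__ == "__main__":
    batch = [[5, 6, 7], [8, 9]]  # toy token ids, not real vocabulary entries
    for side in ("left", "right"):
        ids, mask = pad_batch(batch, pad_id=0, side=side)
        last_tokens = [row[-1] for row in ids]
        print(side, ids, mask, "last tokens:", last_tokens)
```

With batch=1 there is no padding at all, which would be consistent with the single-sample runs matching the paper. In Hugging Face Transformers the equivalent setting is the tokenizer's `padding_side` attribute; whether that is the actual problem here is an assumption.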
Any update on when the evaluation code will be released?
We have updated the evaluation code.
Thank you very much for the evaluation code. However, the code is for evaluating MiniGPT-v2. Is there any code for evaluating MiniGPT-4?
I have tried to evaluate RefCOCO by following refcoco. My configs are ckpt: minigptv2_checkpoint.pth, llama_model: hf_llama2_7b_chat, vit_g: eva_vit_g.pth. I downloaded vit_g from https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP2/eva_vit_g.pth. But the result is only about 38-40% acc. Are there any errors in my configs, or is something else wrong?