OpenGVLab / Multi-Modality-Arena

Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!

Some Problems with VPGTrans #2

Closed by waxnkw 1 year ago

waxnkw commented 1 year ago

I am the first author of VPGTrans. Thanks so much for using VPGTrans! I came across this excellent work through a WeChat article. However, there seem to be some problems with VPGTrans as it appears there.

  1. I tried the example from your WeChat article. My demo (https://vpgtrans.github.io/) produces this output: [image: Selection_412] But the result shown in the WeChat article is: [image: Selection_413]

The two outputs differ. I am not sure whether the default hyperparameters, such as the prompt format or the beam size, were modified. I will also check the code and report any findings here.

For debugging, you can compare your output with our demo (https://vpgtrans.github.io/). If the demo is down, just email me (zhanga6@outlook.com).
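
To narrow down a discrepancy like the one in item 1, one option is to dump the generation settings from both setups and diff them. The snippet below is only a minimal sketch, assuming each side can export its settings as a plain dictionary. The function name `diff_settings` and all keys and values are hypothetical placeholders, not the actual configuration of the demo or of the Arena.

```python
def diff_settings(reference: dict, candidate: dict) -> dict:
    """Return {key: (reference_value, candidate_value)} for every key whose values differ.

    Keys missing on one side show up with None on that side.
    """
    keys = reference.keys() | candidate.keys()
    return {
        key: (reference.get(key), candidate.get(key))
        for key in sorted(keys)
        if reference.get(key) != candidate.get(key)
    }


# Hypothetical values for illustration only; these are NOT the real settings
# of either deployment.
demo_settings = {"prompt_template": "template A", "num_beams": 5}
arena_settings = {"prompt_template": "template B", "num_beams": 1}

for name, (ref, cand) in diff_settings(demo_settings, arena_settings).items():
    print(f"{name}: demo={ref!r} arena={cand!r}")
```

Any key reported here (for example the prompt template or the beam size) is a candidate explanation for differing outputs on the same image and question.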

  2. The main authors are from NUS, but the WeChat article lists Tsinghua University as the main institution. If possible, I hope you can change it to NUS & THU. If that is inconvenient, I hope you can add a note at the bottom of the WeChat article, or at least correct it in this repo (model.jpg).

waxnkw commented 1 year ago

Bug found. It was caused by a misuse of MiniGPT4's conversation function in VPGTrans. The bug has already been fixed. Thanks so much to the developers!