Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side, with images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
I am the first author of VPGTrans. Thanks so much for using VPGTrans! I came across this excellent work through a WeChat article. However, there seem to be some problems with the VPGTrans results.
I tried your example from the WeChat article. My demo (https://vpgtrans.github.io/) gives the following result:
[screenshot: demo output]
But the result in the WeChat article is:
[screenshot: WeChat article output]
The outputs are different. I am not sure whether some of the default hyperparameters, such as the prompt format or the beam size, were modified. I will also check the code; if I find anything, I will report it here.
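For reference, a changed beam size alone can shift the output even with identical weights. Here is a minimal, purely illustrative sketch using the Hugging Face `generate` API; the model name and prompt are placeholders, not the VPGTrans defaults:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative only: "gpt2" stands in for the underlying LLM;
# VPGTrans uses its own checkpoint and prompt template.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("Describe the image:", return_tensors="pt")

# Greedy decoding (num_beams=1) and beam search (num_beams=5) can produce
# different answers from the same model, which is why a modified default
# beam size could explain a mismatch between two deployments.
greedy = model.generate(**inputs, max_new_tokens=30, num_beams=1)
beam = model.generate(**inputs, max_new_tokens=30, num_beams=5)

print(tok.decode(greedy[0], skip_special_tokens=True))
print(tok.decode(beam[0], skip_special_tokens=True))
```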
For debugging, you can compare against our demo (https://vpgtrans.github.io/). If the demo is down, just email me (zhanga6@outlook.com).
The main authors are from NUS, but the WeChat article lists Tsinghua University as the main institution. If possible, we hope you can change it to NUS & THU. If that is inconvenient, we hope you can add a note at the bottom of the WeChat article, or at least correct it in this repo (model.jpg).
Bug found: it was caused by a misuse of MiniGPT-4's conversation function in VPGTrans. The bug has already been corrected. Thanks so much to the developers!
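For context, a minimal sketch of the MiniGPT-4-style conversation flow that VPGTrans reuses. The names follow the MiniGPT-4 demo script (`Chat`, `CONV_VISION`, `upload_img`, `ask`, `answer`); treat the exact signatures as assumptions if the VPGTrans fork diverges:

```python
from minigpt4.conversation.conversation import Chat, CONV_VISION

# `model`, `vis_processor`, and `image` are assumed to be already loaded
# from the config, as in the MiniGPT-4 demo script.
chat = Chat(model, vis_processor, device="cuda:0")

chat_state = CONV_VISION.copy()  # the prompt template; using the wrong one skews outputs
img_list = []
chat.upload_img(image, chat_state, img_list)

chat.ask("What is unusual about this image?", chat_state)
answer = chat.answer(
    conv=chat_state,
    img_list=img_list,
    num_beams=1,      # beam size is one of the defaults worth checking
    temperature=1.0,
    max_new_tokens=300,
)[0]
print(answer)
```

Reusing the wrong conversation template or calling these steps out of order is exactly the kind of misuse that can silently change the model's replies, which matches the bug described above.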