Some Problems with VPGTrans

I am the first author of VPGTrans. Thanks so much for using VPGTrans! I came across your excellent work through a WeChat article. However, there seem to be some problems with the VPGTrans results.

I tried the example from the WeChat article. The output of my demo (https://vpgtrans.github.io/) is different from the result shown in the article.

I am not sure whether the default hyperparameters, such as the prompt format or the beam size, were modified. I will also check the code and report any findings here.

For debugging, you can compare your outputs with our demo (https://vpgtrans.github.io/). If the demo is down, just email me (zhanga6@outlook.com).

The main authors are from NUS, but the WeChat article lists Tsinghua University as the main institution. If possible, I hope you can change it to NUS & THU. If that is inconvenient, I hope you can add a note at the bottom of the WeChat article, or at least correct it in this repo (model.jpg).

OpenGVLab / Multi-Modality-Arena
