DLYuanGod / TinyGPT-V

TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
BSD 3-Clause "New" or "Revised" License

REC Results #16

Open tydia opened 8 months ago

tydia commented 8 months ago

The paper mentions referring expression comprehension (REC), a vital task that measures the language-driven grounding ability of a visual-language multimodal model. RefCOCO/+/g are also used for training in Stage 4, as mentioned in the paper. However, the reported experiments do not include RefCOCO results, even though Table 2 states that the model can do the grounding task. Will these test results be added? A comparison between TinyGPT-V and its counterpart Shikra would be very useful for a more comprehensive evaluation of the proposed method.

DLYuanGod commented 8 months ago

Hello. Yes, as you said, we will update the RefCOCO results in the next version. We're already working on it. Thank you for your continued interest!