Which model performs best on the image captioning task?

InternLM / InternLM-XComposer

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output


Closed shams2023 closed 3 months ago

shams2023 commented 4 months ago

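I would like to use your models to generate text descriptions for images I collected myself, but I am not sure which model handles low-resolution images well (the resolution is generally low because the images were captured at dusk). I am mainly looking for detailed descriptions of the pedestrians in the images. That is why I am asking here, and I hope you can help. Wishing you success in your work and a happy Year of the Dragon!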

yuhangzang commented 4 months ago

You may try the vl-7b model; I expect some prompt engineering will be needed.
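A minimal sketch of what such prompt engineering might look like for dusk-time, low-resolution pedestrian images. Everything here is illustrative rather than part of the repository: the prompt wordings, `caption_with_prompts`, `pick_longest`, and the `chat_fn` callable are hypothetical names. `chat_fn` stands in for whatever inference call the chosen checkpoint exposes, wrapped so that it takes a prompt and an image path and returns the generated text.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical prompt variants, ordered from generic to task-specific.
# The wording is an assumption and should be tuned on your own images.
PEDESTRIAN_PROMPTS: List[str] = [
    "Describe this image in detail.",
    "This low-resolution image was taken at dusk. Describe each pedestrian: "
    "clothing, colors, carried items, and posture.",
    "List every person visible in the image, then describe each person's "
    "appearance from head to toe. Write 'unclear' for details you cannot see.",
]


def caption_with_prompts(
    chat_fn: Callable[[str, str], str],
    image_path: str,
    prompts: List[str] = PEDESTRIAN_PROMPTS,
) -> Dict[str, str]:
    """Run every prompt against one image and collect the captions.

    chat_fn(prompt, image_path) is a user-supplied wrapper around the
    model's inference call (hypothetical; not defined by this sketch).
    """
    results: Dict[str, str] = {}
    for prompt in prompts:
        results[prompt] = chat_fn(prompt, image_path).strip()
    return results


def pick_longest(results: Dict[str, str]) -> Tuple[str, str]:
    """Crude heuristic: return the (prompt, caption) pair with the longest caption.

    Length is only a rough proxy for detail; inspect the captions manually.
    """
    return max(results.items(), key=lambda item: len(item[1]))
```

Comparing the captions side by side for a handful of images is usually enough to see which wording draws out the most pedestrian detail before running the full dataset.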

shams2023 commented 4 months ago

"You may try the vl-7b model; I think engineering attempts need to be made right away."

Should I just use it directly like this?