THUDM / CogVLM2

GPT4V-level open-source multi-modal model based on Llama3-8B
Apache License 2.0

referring expression comprehension support for CogVLM2 #130

Open zhwuwuwu opened 2 months ago

zhwuwuwu commented 2 months ago

Feature request

Hi, I found that referring expression comprehension (REC) is supported as a downstream task, with a model on Hugging Face: THUDM/cogvlm-grounding-generalist-hf. Is there the same support in CogVLM2? Thanks.

Further, could you explain the difference between the chat model and the model that generates bounding boxes? Does the model architecture differ, or is only the SFT loss different? Thanks.

Motivation

Curious about the ability of CogVLM2 on downstream tasks.

Your contribution

/

zRzRzRzRzRzRzR commented 2 months ago

The training objectives are different. CogVLM2 has not released a grounding version yet.