
OpenGVLab / InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.

https://internvl.readthedocs.io/en/latest/

MIT License


[Feature] Internvit question #582

Closed. AshOneN closed this issue 1 month ago.

AshOneN commented 2 months ago

Motivation

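1. Going from InternViT-224 to InternViT-448, did you redo the contrastive pre-training? If it was only fine-tuning, how was the model fine-tuned from 224 resolution up to 448?
2. InternViT's output dimension differs from the LLM's output dimension. How do you compute the similarity for contrastive learning?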

Related resources

No response

Additional context

No response

czczup commented 2 months ago

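Hello. We did not redo contrastive learning when going from 224 to 448. Instead, we unfroze the ViT during the pretraining stage of the MLLM and trained it up to 448 resolution there. Also, in the InternVL 1.0 framework, we aligned InternViT-6B and LLaMA-7B through contrastive learning. The output embeddings of both models are reduced to 768 dimensions through either a linear layer or attention pooling, so once they share that dimension the contrastive similarity can be computed directly.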
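To make the second answer concrete, here is a minimal, dependency-free sketch of how two encoders with different output widths can still be compared: each side gets its own projection into a shared 768-dimensional space, the projected vectors are L2-normalized, and cosine similarity feeds a symmetric CLIP-style InfoNCE loss. This is an illustration under stated assumptions, not InternVL's actual code. Only the linear option is shown (attention pooling is not implemented). The helper names (`linear_project`, `similarity_matrix`, `info_nce_loss`), the random projection weights, the 0.07 temperature, and the toy input widths are all hypothetical.

```python
import math
import random

SHARED_DIM = 768  # shared embedding size mentioned in the answer


def random_weight(d_out, d_in, rng):
    """Hypothetical stand-in for a learned linear layer: d_out rows of d_in weights."""
    scale = 1.0 / math.sqrt(d_in)
    return [[rng.gauss(0.0, scale) for _ in range(d_in)] for _ in range(d_out)]


def linear_project(x, weight):
    """Map a vector of length d_in to length d_out (one dot product per output row)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def l2_normalize(v):
    norm = math.sqrt(sum(t * t for t in v)) or 1.0
    return [t / norm for t in v]


def similarity_matrix(img_feats, txt_feats, w_img, w_txt):
    """Project both sides into the shared space, then take pairwise cosine similarity."""
    img = [l2_normalize(linear_project(f, w_img)) for f in img_feats]
    txt = [l2_normalize(linear_project(f, w_txt)) for f in txt_feats]
    return [[sum(a * b for a, b in zip(i, t)) for t in txt] for i in img]


def info_nce_loss(sim, temperature=0.07):
    """Symmetric InfoNCE: the matching image/text pairs sit on the diagonal of sim."""
    n = len(sim)

    def cross_entropy_rows(m):
        total = 0.0
        for i, row in enumerate(m):
            logits = [s / temperature for s in row]
            mx = max(logits)  # subtract the max for a numerically stable log-sum-exp
            log_sum = mx + math.log(sum(math.exp(l - mx) for l in logits))
            total += log_sum - logits[i]
        return total / n

    transposed = [list(col) for col in zip(*sim)]
    return 0.5 * (cross_entropy_rows(sim) + cross_entropy_rows(transposed))


if __name__ == "__main__":
    rng = random.Random(0)
    vit_dim, llm_dim, batch = 32, 48, 4  # toy widths, deliberately different
    img_feats = [[rng.gauss(0, 1) for _ in range(vit_dim)] for _ in range(batch)]
    txt_feats = [[rng.gauss(0, 1) for _ in range(llm_dim)] for _ in range(batch)]
    w_img = random_weight(SHARED_DIM, vit_dim, rng)
    w_txt = random_weight(SHARED_DIM, llm_dim, rng)
    sim = similarity_matrix(img_feats, txt_feats, w_img, w_txt)
    print("loss:", info_nce_loss(sim))
```

The key design point is that the projection layers, not the backbones, determine the comparison space. The vision and language models can keep whatever hidden sizes they have, and only the two small heads need to agree on the output width (768 here).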