InternLM / InternLM-XComposer

InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.

Training strategy of IXC2-VL #275

Closed: wjjweb1 closed this issue 2 months ago

wjjweb1 commented 2 months ago

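Hello, I'm very interested in your work! I'm reading your paper "InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD". It's excellent work! However, I'm curious about the training strategy of IXC2-VL mentioned in the paper. Is the model's vision encoder unfrozen during the pretraining stage? Is the training strategy the same as that of IXC2-4KHD in this paper? I look forward to your reply!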

LightDXY commented 2 months ago

The ViT is unfrozen during the pretraining stage, and the training strategy is essentially the same as for 4KHD. For details, see https://arxiv.org/abs/2401.16420

wjjweb1 commented 2 months ago

OK, thank you for your reply!