InternLM / InternLM-XComposer

InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.
Special tokens and Newline Token #257

KooSung closed this issue 2 months ago

KooSung commented 2 months ago

internlm/internlm-xcomposer2-4khd-7b is excellent work, and I have a few questions about it.

  1. Special tokens: Why does modeling_internlm_xcomposer2.py use [UNUSED_TOKEN_146] and [UNUSED_TOKEN_145] when neither is listed among the special tokens? The special tokens are <|im_start|> and <|im_end|> instead.
  2. Image 2D Structure Newline Indicator: The 'Image 2D Structure Newline Indicator' (\n) token and the separate token mentioned in the paper do not seem to appear in modeling_internlm_xcomposer2.py.
  3. LLM: InternLM-XComposer2 VL uses InternLM2-7B-Chat-SFT as its LLM. Why was this model chosen? Have you run experiments with InternLM2-Chat-7B?

LightDXY commented 2 months ago

Hi, thanks for your interest in our work.

  1. We map <|im_start|> and <|im_end|> to [UNUSED_TOKEN_146] and [UNUSED_TOKEN_145], which are predefined in the vocabulary. Both formats work equally well in practice.
  2. In the code, the separate token is plora_glb_GN and \n is plora_sub_GN. We will clarify the names in a following update.
  3. The previous XComposer2 used InternLM2-7B-Chat-SFT as the backbone because the PPO version (-Chat) was not ready at that time, so we kept the backbone unchanged for the 4KHD version.

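
For readers following points 1 and 2, here is a minimal sketch of the two ideas. It is not the code from modeling_internlm_xcomposer2.py. The helper names (`to_vocab_markers`, `layout_image_tokens`) and the placeholder strings `<sep>` and `<nl>` are hypothetical, and the layout is an assumption: global view first, then the separate token, then the local patches, with a newline token closing each row. In the real model the separate and newline tokens are learned embeddings (plora_glb_GN and plora_sub_GN), not strings.

```python
# Alias table following the mapping described in the reply above.
CHAT_MARKER_ALIASES = {
    "<|im_start|>": "[UNUSED_TOKEN_146]",
    "<|im_end|>": "[UNUSED_TOKEN_145]",
}


def to_vocab_markers(prompt: str) -> str:
    """Rewrite ChatML-style markers into the tokens predefined in the vocabulary."""
    for alias, vocab_token in CHAT_MARKER_ALIASES.items():
        prompt = prompt.replace(alias, vocab_token)
    return prompt


def layout_image_tokens(global_rows, local_rows, sep="<sep>", newline="<nl>"):
    """Flatten two 2D grids of patch tokens into one sequence.

    `sep` stands in for plora_glb_GN (the separate token) and `newline`
    for plora_sub_GN (the row-end indicator). Assumed order: global view,
    separator, local view, with `newline` appended after every row.
    """
    def flatten(rows):
        seq = []
        for row in rows:
            seq.extend(row)
            seq.append(newline)  # mark the end of each image row
        return seq

    return flatten(global_rows) + [sep] + flatten(local_rows)


prompt = to_vocab_markers("<|im_start|>user\nhi<|im_end|>")
tokens = layout_image_tokens([["g0", "g1"]], [["a", "b"], ["c", "d"]])
```

With these inputs, `prompt` uses the two [UNUSED_TOKEN_*] markers in place of the ChatML ones, and `tokens` keeps the row boundaries of both views recoverable after flattening.
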
KooSung commented 2 months ago

@LightDXY Thanks for your reply.