InternLM / InternLM-XComposer

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Does InternLM-XComposer2 use a vision_projector? #207

Closed Changhao-Xiang closed 3 months ago

Changhao-Xiang commented 4 months ago

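Hello! Thank you for open-sourcing the InternLM-XComposer code. While studying the code I ran into a question about the model architecture: the InternLM-XComposer2 paper describes a model with only two parts, a vision_encoder and an LLM, but build_mlp.py appears to set projector_type to a two-layer MLP rather than using IdentityMap: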

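import re

import torch.nn as nn


def build_vision_projector():
    # Hard-coded settings: projector type and the input/output widths.
    projector_type = 'mlp2x_gelu'
    mm_hidden_size = 1024
    hidden_size = 4096
    # 'mlp<N>x_gelu' selects an N-layer MLP with GELU between the Linear layers.
    mlp_gelu_match = re.match(r'^mlp(\d+)x_gelu$', projector_type)
    if mlp_gelu_match:
        mlp_depth = int(mlp_gelu_match.group(1))
        modules = [nn.Linear(mm_hidden_size, hidden_size)]
        for _ in range(1, mlp_depth):
            modules.append(nn.GELU())
            modules.append(nn.Linear(hidden_size, hidden_size))
        return nn.Sequential(*modules)
    # Not reached while projector_type is hard-coded above.
    # IdentityMap is assumed to be defined elsewhere in build_mlp.py.
    if projector_type == 'identity':
        return IdentityMap()
    raise ValueError(f'Unknown projector type: {projector_type}')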

I may have missed where the projector_type parameter gets changed. Could you point me to where that happens, or does the model really use a two-layer MLP as the projector?

Changhao-Xiang commented 3 months ago

@yhcao6 @panzhang0212 @LightDXY Hello, could you help answer this? Thank you.

LightDXY commented 3 months ago

Yes, we use a projector. The code is at https://huggingface.co/internlm/internlm-xcomposer2-vl-7b/blob/main/modeling_internlm_xcomposer2.py#L68
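
For readers following along, here is a minimal sketch of why the build_vision_projector function quoted in the question always ends up with the two-layer MLP. It mirrors that function's branching in plain Python without PyTorch, returning layer descriptions and a parameter count instead of real modules. The helper name describe_projector and its string output format are hypothetical, introduced only for illustration. The sizes 1024 and 4096 are the values from the quoted snippet.

```python
import re


def describe_projector(projector_type, mm_hidden_size=1024, hidden_size=4096):
    """Hypothetical helper: list the layers the quoted builder would create.

    Returns (layers, params), where layers is a list of description strings
    and params counts the weights and biases of the Linear layers.
    """
    match = re.match(r'^mlp(\d+)x_gelu$', projector_type)
    if match:
        depth = int(match.group(1))
        # First Linear maps vision features to the LLM hidden size.
        layers = [f'Linear({mm_hidden_size}, {hidden_size})']
        params = mm_hidden_size * hidden_size + hidden_size
        # Each extra layer adds a GELU followed by a square Linear.
        for _ in range(1, depth):
            layers.append('GELU()')
            layers.append(f'Linear({hidden_size}, {hidden_size})')
            params += hidden_size * hidden_size + hidden_size
        return layers, params
    if projector_type == 'identity':
        # An identity projector has no learnable parameters.
        return ['Identity()'], 0
    raise ValueError(f'Unknown projector type: {projector_type}')


layers, params = describe_projector('mlp2x_gelu')
print(layers)  # ['Linear(1024, 4096)', 'GELU()', 'Linear(4096, 4096)']
print(params)  # 20979712
```

Since projector_type is hard-coded to 'mlp2x_gelu' in the quoted snippet, the regex matches with depth 2 and the identity branch is never reached, which agrees with the maintainer's reply that a projector is used.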