cur_input_embeds = torch.cat([cur_input_embeds_1, cur_image_features[0:0], cur_input_embeds_2], dim=0), where cur_image_features[0:0] is an empty tensor with zero rows, so the image features are never actually added to the sequence.
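The slicing behaviour described above can be checked directly. The sketch below uses made-up sizes (5 text tokens before the image placeholder, 3 after, 4 image tokens, hidden size 8); only the variable names come from the issue, and the tensors are random stand-ins rather than mPLUG-Owl2's real embeddings.

```python
import torch

# Hypothetical sizes chosen for illustration only.
hidden = 8
cur_input_embeds_1 = torch.randn(5, hidden)   # text tokens before the image slot
cur_input_embeds_2 = torch.randn(3, hidden)   # text tokens after the image slot
cur_image_features = torch.randn(4, hidden)   # image tokens

# [0:0] is an empty slice along dim 0: shape (0, hidden), i.e. no rows.
empty = cur_image_features[0:0]
print(empty.shape)  # torch.Size([0, 8])

# Concatenating a zero-row tensor adds nothing, so only text embeddings remain.
cur_input_embeds = torch.cat([cur_input_embeds_1, empty, cur_input_embeds_2], dim=0)
print(cur_input_embeds.shape)  # torch.Size([8, 8])
```

So the issue author is right that no image rows end up in the sequence; the open question is why the image features are referenced at all.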

X-PLUG / mPLUG-Owl

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family

MIT License

2.25k stars, 171 forks

Open: hangzeli05 opened this issue 9 months ago

hangzeli05 commented 9 months ago

Code error in mPLUG-Owl2

vateye commented 9 months ago

No, this is for compatibility with DeepSpeed ZeRO-3 when training on text-only samples. For multi-modal input, this code path is never reached.
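One plausible reading of this reply, sketched under assumptions: ZeRO-3 partitions parameters across ranks and gathers them with collective operations as each module runs, so a text-only batch that skips the vision module entirely can leave its parameters out of the autograd graph and make ranks diverge. Routing the features through an empty slice keeps the vision parameters in the graph (they receive an all-zero gradient instead of none) while adding no tokens. In the sketch, `vision_proj` and the zero `dummy_image` input are hypothetical stand-ins, not mPLUG-Owl2's actual modules, and no DeepSpeed code is involved; it only shows the autograd effect.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden = 8
vision_proj = nn.Linear(hidden, hidden)   # hypothetical stand-in for the vision path
text_embeds = torch.randn(6, hidden, requires_grad=True)
dummy_image = torch.zeros(1, hidden)      # hypothetical placeholder for a text-only sample

def text_only_loss(include_empty_slice: bool) -> torch.Tensor:
    # Reset vision gradients to None so each run starts clean.
    vision_proj.zero_grad(set_to_none=True)
    parts = [text_embeds[:3]]
    if include_empty_slice:
        image_features = vision_proj(dummy_image)
        parts.append(image_features[0:0])  # contributes zero rows to the sequence
    parts.append(text_embeds[3:])
    return torch.cat(parts, dim=0).sum()

text_only_loss(include_empty_slice=False).backward()
grad_without = vision_proj.weight.grad   # vision parameters were never used

text_only_loss(include_empty_slice=True).backward()
grad_with = vision_proj.weight.grad      # parameters took part in the graph

print(grad_without)                      # None
print(grad_with.abs().sum().item())      # 0.0
```

The output sequence is identical in both cases; the only difference is whether the vision parameters participate in forward and backward on every step.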