X-PLUG / mPLUG-Owl

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
https://www.modelscope.cn/studios/damo/mPLUG-Owl
MIT License

Maybe a bug in processing_mplugowl3.py? #231

Closed: hpy-42 closed this issue 2 months ago

hpy-42 commented 3 months ago

https://huggingface.co/mPLUG/mPLUG-Owl3-7B-240728/blob/main/processing_mplugowl3.py#L232

When `self.image_processor.add_global` is set to True, I think `image_token_ptr` should be incremented one more time in each iteration of the loop:


```python
for next_text in text_list[1:]:
    # Expand the image token using the cut shape at the current pointer.
    text += self.image_processor.cut_prompt_template(img_token='<|image|>', h=cut_shape[image_token_ptr][0], w=cut_shape[image_token_ptr][1])
    text += next_text
    image_token_ptr += 1
    # Proposed fix: with add_global, advance once more so the pointer reaches the next image.
    if self.image_processor.add_global:
        image_token_ptr += 1
message['content'] = text
```

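To illustrate why the extra increment matters, here is a minimal, self-contained sketch of the pointer bookkeeping. It is not the fixed code from the repository. It assumes (the thread does not state this) that when `add_global` is enabled, `cut_shape` stores two entries per image: the crop grid followed by an entry for the global view. `expand_image_tokens` and `fake_cut_prompt_template` are hypothetical names; the latter only records which grid shape each image received.

```python
from typing import List, Tuple

# ASSUMPTION: with add_global=True, cut_shape holds [grid, global] per image,
# so each image occupies two consecutive entries.


def fake_cut_prompt_template(img_token: str, h: int, w: int) -> str:
    # Hypothetical stand-in for image_processor.cut_prompt_template:
    # it just tags the token with the grid shape it was given.
    return f"{img_token}[{h}x{w}]"


def expand_image_tokens(
    content: str,
    cut_shape: List[Tuple[int, int]],
    add_global: bool,
    start_ptr: int = 0,
) -> Tuple[str, int]:
    """Expand each '<|image|>' in content using its cut shape.

    Returns the expanded text and the pointer after the last image, so a
    caller can carry the pointer across messages as the original loop does.
    """
    text_list = content.split('<|image|>')
    text = text_list[0]
    image_token_ptr = start_ptr
    for next_text in text_list[1:]:
        h, w = cut_shape[image_token_ptr]
        text += fake_cut_prompt_template(img_token='<|image|>', h=h, w=w)
        text += next_text
        image_token_ptr += 1  # step past this image's grid entry
        if add_global:
            image_token_ptr += 1  # also skip the (assumed) global-view entry
    return text, image_token_ptr


# Two images with grids 2x3 and 1x2, each followed by a (1, 1) global entry.
cut_shape = [(2, 3), (1, 1), (1, 2), (1, 1)]
print(expand_image_tokens("A<|image|>B<|image|>C", cut_shape, add_global=True))
# ('A<|image|>[2x3]B<|image|>[1x2]C', 4)
print(expand_image_tokens("A<|image|>B<|image|>C", cut_shape, add_global=False))
# ('A<|image|>[2x3]B<|image|>[1x1]C', 2)  <- second image gets the wrong shape
```

Under this assumed layout, stepping the pointer by one pairs the second image with the first image's global-view entry, which is the misalignment the issue describes.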
LukeForeverYoung commented 2 months ago

Thank you for pointing out this issue. It is a bug, and we will fix it soon. In our demo and evaluation we turn off image cutting, so for now it does not affect the model's performance in most scenarios.

LukeForeverYoung commented 2 months ago

We fixed this issue and updated the code on Hugging Face and ModelScope. However, we found that because we never trained the model on multi-image input with image cutting enabled, performance in that setting is suboptimal. We will address this weakness in the next model release.