X-PLUG / mPLUG-Owl

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
https://www.modelscope.cn/studios/damo/mPLUG-Owl
MIT License
2.25k stars, 171 forks

QuickStart Code for mplug_owl2.1 has lots of errors. #210

Closed: Carol-gutianle closed this issue 6 months ago

Carol-gutianle commented 6 months ago

I downloaded the weights from the official link shown in README.md. However, I encountered a lot of errors when I tried to run inference with the quickstart code.

  1. Error: Trying to set a tensor of shape torch.Size([151851, 4096]) in "weight" (which has shape torch.Size([151936, 4096])), this look incorrect. I guess the vocab size of the model weights in the repo is 151851, while config.json sets it to 151936. So I changed vocab_size in config.json and got past the first error.
  2. Error: RuntimeError: expected scalar type Half but found BFloat16. The dtype shown in config.json is torch.bfloat16, but the quickstart code uses torch.float16. So I got past this error by: [image]
  3. Error: index 1 is out of bounds for dimension 0 with size 1. It is strange that I can run the first sample (VQA) successfully but the second sample fails with this error. I haven't solved it.
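
The workaround in item 1 (editing vocab_size in config.json so it matches the number of embedding rows in the checkpoint) could be scripted roughly as follows. This is only a sketch: the function name `patch_vocab_size` and the `.bak` backup are illustrative choices, and the value 151851 is the one reported in the error message above, not something verified against the released weights.

```python
import json
import shutil
from pathlib import Path


def patch_vocab_size(config_path, checkpoint_rows):
    """Set "vocab_size" in a config.json to the checkpoint's embedding row count.

    Returns True if the file was changed, False if it already matched.
    Hypothetical helper for illustration; not part of mPLUG-Owl.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))

    if config.get("vocab_size") == checkpoint_rows:
        return False  # nothing to do

    # Keep a copy of the original file next to it before overwriting.
    shutil.copy2(path, path.with_name(path.name + ".bak"))

    config["vocab_size"] = checkpoint_rows
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return True
```

For the case in this issue the call would look like `patch_vocab_size("path/to/config.json", 151851)`, where the path is a placeholder for wherever the downloaded weights live.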

junyangwang0410 commented 6 months ago

If you are using accelerate version >0.21.0, try downgrading by: pip install accelerate==0.21.0
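
To see whether the installed accelerate is newer than 0.21.0 before downgrading, a small check like the one below may help. It is a sketch using only the standard library; the helper names are made up for illustration, and the version parsing is deliberately simple (it only looks at the leading numeric part of the version string).

```python
import re
from importlib import metadata


def parse_version(text):
    """Return the leading numeric release part, e.g. "0.22.0.dev0" -> (0, 22, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_newer_than(installed, pinned="0.21.0"):
    """True if the installed version is strictly newer than the pinned one."""
    return parse_version(installed) > parse_version(pinned)


def installed_accelerate_version():
    """Return the installed accelerate version string, or None if it is missing."""
    try:
        return metadata.version("accelerate")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_accelerate_version()
    if version is None:
        print("accelerate is not installed")
    elif is_newer_than(version):
        print(f"accelerate {version} is newer than 0.21.0; consider downgrading")
    else:
        print(f"accelerate {version} is 0.21.0 or older")
```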

Carol-gutianle commented 6 months ago

> If you are using accelerate version >0.21.0, try downgrading by: pip install accelerate==0.21.0

Thank you for your reply. That solves the second problem, but the third one is still unsolved.

Carol-gutianle commented 6 months ago

I found the solution to the third problem. It was my fault: I forgot to reset conv before the next sample, so the |IMAGE| of the previous sample was added to the next sample.
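
For anyone hitting the same thing, here is a toy reproduction of that mistake. It does not use the mPLUG-Owl code: the `Conversation` class, `TEMPLATE`, `build_prompt`, and the `IMAGE_TOKEN` string are simplified stand-ins I made up to show the idea. If the same conv object is reused, the second prompt contains two image placeholders while only one image is passed in, which would plausibly explain an "index 1 is out of bounds" error.

```python
from copy import deepcopy
from dataclasses import dataclass, field

# Stand-in for the image placeholder that the comment above calls |IMAGE|.
IMAGE_TOKEN = "<|image|>"


@dataclass
class Conversation:
    """Toy conversation template that accumulates (role, text) messages."""
    roles: tuple = ("USER", "ASSISTANT")
    messages: list = field(default_factory=list)

    def append_message(self, role, text):
        self.messages.append((role, text))

    def get_prompt(self):
        parts = [f"{role}: {text}" if text else f"{role}:" for role, text in self.messages]
        return " ".join(parts)

    def copy(self):
        return deepcopy(self)


TEMPLATE = Conversation()


def build_prompt(conv, query):
    """Append one image question to conv and return the full prompt."""
    conv.append_message(conv.roles[0], IMAGE_TOKEN + query)
    conv.append_message(conv.roles[1], None)
    return conv.get_prompt()


# Buggy: the same conv is reused, so sample 1's image placeholder is still there.
shared = TEMPLATE.copy()
first_prompt = build_prompt(shared, "What is in the picture?")
second_buggy = build_prompt(shared, "Describe the image.")

# Fixed: start from a fresh copy of the template for every sample.
second_fixed = build_prompt(TEMPLATE.copy(), "Describe the image.")

print(second_buggy.count(IMAGE_TOKEN))  # 2 placeholders, but only 1 image supplied
print(second_fixed.count(IMAGE_TOKEN))  # 1 placeholder, matches the 1 image
```

So the fix is simply to create a fresh conversation (or clear its messages) at the start of each sample instead of carrying one object across samples.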

pengchaosupper commented 3 days ago

> I found the solution to the third problem. It was my fault: I forgot to reset conv before the next sample, so the |IMAGE| of the previous sample was added to the next sample.

Hello, can you give a detailed answer to the third question?