X-PLUG / mPLUG-Owl

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
https://www.modelscope.cn/studios/damo/mPLUG-Owl
MIT License
2.25k stars · 171 forks

Inquiry about the performance difference between mPLUG-owl and other models #59

Closed · pyogher closed this issue 1 year ago

pyogher commented 1 year ago

Hi there,

I've been using mPLUG-owl and noticed a significant difference in inference speed compared to other models such as Otter and multimodal-GPT. It also outperforms Vicuna and LLaMA in terms of speed. I'm curious to know the reason behind this performance gap.

Could you kindly shed some light on the factors or optimizations behind mPLUG-owl's speed advantage over these models? :)

MAGAer13 commented 1 year ago

Hi, we use a relatively small visual backbone (i.e. ViT-L), whereas other methods like MiniGPT-4 use a larger ViT (ViT-G). In addition, since our method feeds the image tokens directly into the LLM, it is faster than Otter, which fuses the image via cross-attention.
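To illustrate the "direct injection" design mentioned above, here is a minimal toy sketch (hypothetical shapes and function names, not the repo's actual API): the visual features are reduced to a fixed number of tokens in the LLM's hidden size and simply prepended to the text embeddings, so the LLM's ordinary self-attention handles both modalities in one sequence, with no extra per-layer cross-attention blocks.

```python
import random

HIDDEN = 8          # toy LLM hidden size (assumption for illustration)
NUM_VIS_TOKENS = 4  # toy number of visual query tokens (assumption)

def embed_text(tokens):
    # Stand-in for the LLM's token-embedding lookup.
    return [[random.random() for _ in range(HIDDEN)] for _ in tokens]

def visual_abstractor(image):
    # Stand-in for ViT + abstractor: emits a fixed number of visual
    # tokens already projected to the LLM hidden size.
    return [[random.random() for _ in range(HIDDEN)]
            for _ in range(NUM_VIS_TOKENS)]

def build_llm_input(image, text_tokens):
    # Direct injection: prepend visual tokens to the text embeddings.
    # The LLM then runs plain self-attention over the joint sequence.
    return visual_abstractor(image) + embed_text(text_tokens)

seq = build_llm_input(object(), ["What", "is", "this", "?"])
assert len(seq) == NUM_VIS_TOKENS + 4  # 4 visual + 4 text tokens
```

By contrast, a Flamingo-style model such as Otter inserts gated cross-attention layers between the LLM blocks, which adds parameters and compute at every layer.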

Besides, compared with LLaMA and Vicuna, we use fp16 and 8-bit inference for better performance.
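The 8-bit inference mentioned here stores weights as int8 plus a scale factor, roughly halving memory versus fp16. A toy symmetric-quantization round trip (purely illustrative, not the bitsandbytes implementation) looks like:

```python
def quantize_int8(weights):
    # Symmetric per-tensor quantization: map max |w| to the int8
    # range endpoint 127, then round each weight to the grid.
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    # Recover approximate fp values from int8 codes.
    return [v * scale for v in q]

w = [0.5, -1.27, 0.031, 0.9]
q, s = quantize_int8(w)
w_hat = dequantize_int8(q, s)
# Round-trip error is bounded by half the quantization step.
assert all(abs(a - b) <= s / 2 + 1e-9 for a, b in zip(w, w_hat))
```

In practice this is typically enabled in Hugging Face Transformers via `load_in_8bit=True` in `from_pretrained`, which delegates the quantized matmuls to bitsandbytes.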

pyogher commented 1 year ago

Hi,

Thank you for addressing my query and providing these insights. I greatly appreciate your assistance. :)