Closed pyogher closed 1 year ago
Hi, we use a relatively small visual backbone (i.e. ViT-L), whereas other methods like MiniGPT-4 use a larger ViT (ViT-G). Since our method feeds the image tokens directly into the LLM, it is faster than Otter, which uses cross-attention to fuse the image features.
Besides, compared with LLaMA and Vicuna, we use fp16 and 8-bit inference for better performance.
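To make the architectural point concrete, here is a back-of-envelope sketch (not the actual implementations) comparing the two fusion styles: prepending visual tokens to the text sequence versus adding per-layer cross-attention over the visual features. All token and layer counts below are illustrative assumptions, not the real model configs.

```python
def self_attn_ops(seq_len: int, layers: int) -> int:
    """Rough op count: self-attention score computation scales with seq_len^2 per layer."""
    return layers * seq_len ** 2

def cross_attn_extra_ops(text_len: int, vis_len: int, layers: int) -> int:
    """Extra cost of cross-attention blocks: text queries attend to visual keys."""
    return layers * text_len * vis_len

layers, text_len = 32, 256   # assumed LLM depth and prompt length
vis_direct = 64              # assumed small set of abstracted visual tokens fed directly
vis_cross = 257              # assumed full ViT patch sequence consumed by cross-attention

direct = self_attn_ops(text_len + vis_direct, layers)
cross = self_attn_ops(text_len, layers) + cross_attn_extra_ops(text_len, vis_cross, layers)
print(direct, cross)
```

Under these assumed numbers the direct-input design does less attention work, mainly because the abstracted visual token set is small; with very long visual sequences the comparison could flip, so this is only a sketch of the trade-off.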
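A rough way to see why fp16 and 8-bit inference help: weight memory (and hence memory traffic per generated token) shrinks with bytes per parameter. This sketch assumes weights dominate and ignores activations and the KV cache; the 7B parameter count is just an illustrative LLaMA-class size.

```python
def weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Memory taken by model weights alone, in GiB."""
    return n_params * bytes_per_param / 1024 ** 3

n = 7e9  # assumed parameter count for illustration
for name, nbytes in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    print(f"{name}: {weight_memory_gb(n, nbytes):.1f} GiB")
```

Halving or quartering the bytes moved per forward pass is a large part of the speedup on memory-bound autoregressive decoding, on top of any faster low-precision kernels.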
Hi,
Thank you for addressing my query and providing these insights. I greatly appreciate your assistance. :)
Hi there,
I've been using mPLUG-Owl and noticed that its inference is significantly faster than other models such as Otter and Multimodal-GPT; it even outperforms Vicuna and LLaMA in terms of speed.
Could you kindly shed some light on the factors or optimizations contributing to this speed advantage? :)