OpenGVLab / InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
https://internvl.readthedocs.io/en/latest/
MIT License
5.48k stars 425 forks

Real-time performance #511

Closed (SutirthaChakraborty closed 1 week ago)

SutirthaChakraborty commented 4 weeks ago

Is there any model that can be easily deployed on different devices and run in real time? I want to generate a description for each video frame very quickly. Is there any way to get frame-by-frame information for a video?

czczup commented 2 weeks ago

Hello, maybe our InternVL2-1B model is a good choice; it has only about 900M parameters.

SutirthaChakraborty commented 2 weeks ago

InternVL2-1B doesn't reliably give correct, detailed information about facial expressions or where people are looking.

czczup commented 2 weeks ago

Since our team is relatively small, we currently do not have the capacity to focus on inference acceleration and real-time performance evaluation. Our model updates are primarily centered on performance improvements.

I believe that real-time performance is mainly determined by two factors: the optimization level of the deployment framework and the number of model parameters. Based on the current progress in the community, small models with 1B to 4B parameters can run relatively well on edge devices such as an iPad. However, we do not yet have plans to directly support this functionality ourselves.
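
For anyone sizing this up, the two factors above can be turned into a rough back-of-the-envelope check. The sketch below is only an illustration and not part of InternVL: the helper names (`weight_memory_gb`, `frame_stride`, `frames_to_process`) are hypothetical, the 2 bytes per parameter assumes fp16/bf16 weights, and the per-frame latency is a number you would have to measure yourself on your device and deployment framework.

```python
import math


def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes).

    Assumes fp16/bf16 storage by default; activations and caches are not counted.
    """
    return num_params * bytes_per_param / 1e9


def frame_stride(video_fps: float, seconds_per_frame: float) -> int:
    """Smallest stride N such that describing every N-th frame keeps up with the video.

    seconds_per_frame is the measured inference latency for one frame
    (an assumption supplied by the caller, not something this code measures).
    """
    if video_fps <= 0 or seconds_per_frame <= 0:
        raise ValueError("video_fps and seconds_per_frame must be positive")
    return max(1, math.ceil(video_fps * seconds_per_frame))


def frames_to_process(total_frames: int, stride: int) -> list[int]:
    """Indices of the frames that would actually be sent to the model."""
    return list(range(0, total_frames, stride))


if __name__ == "__main__":
    # About 900M parameters, as mentioned earlier in this thread.
    print(weight_memory_gb(900e6))
    # Hypothetical figures: a 30 fps video and 0.5 s of inference per frame.
    stride = frame_stride(30, 0.5)
    print(stride, frames_to_process(60, stride))
```

If the stride comes out larger than 1, true frame-by-frame description is not achievable in real time under those measurements, and the options are a smaller model, a faster deployment framework, or describing only the sampled frames.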