Closed SutirthaChakraborty closed 1 week ago
Hello, maybe our InternVL2-1B model is a good choice; it has only about 900M parameters.
InternVL2-1B doesn't reliably give detailed information about facial expressions or where people are looking.
Since our team is relatively small, we currently do not have the capacity to focus on inference acceleration and real-time performance evaluation. Our model updates are primarily centered on performance improvements.
I believe that real-time performance is mainly determined by two factors: the optimization level of the deployment framework and the number of model parameters. Based on current progress in the community, small models with 1B to 4B parameters can run relatively well on edge devices like the iPad. However, we do not yet have plans to support this functionality directly ourselves.
Is there any model that can easily be deployed on different devices and run in real time? If I could get a description for each video frame very fast, that would work for me. Is there any way to get frame-by-frame information for a video?
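For the frame-by-frame question, a minimal sketch of one common approach: decode the video, sample frames at a fixed interval, and run each sampled frame through the model. Here `describe_frame` is a hypothetical placeholder for whatever per-frame VLM call you use (e.g. InternVL2-1B's image captioning), and OpenCV is assumed for decoding; neither is an official API of this repo.

```python
def sample_frame_indices(total_frames: int, fps: float,
                         every_n_seconds: float = 1.0) -> list[int]:
    """Pick which frames to caption: one every `every_n_seconds`."""
    step = max(1, int(round(fps * every_n_seconds)))
    return list(range(0, total_frames, step))


def describe_frame(frame) -> str:
    """Hypothetical placeholder for a per-frame model call (not a real API)."""
    raise NotImplementedError("plug in your model's image-captioning call here")


def caption_video(path: str, every_n_seconds: float = 1.0) -> dict[int, str]:
    """Decode a video with OpenCV and caption the sampled frames."""
    import cv2  # assumed available: pip install opencv-python
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    wanted = set(sample_frame_indices(total, fps, every_n_seconds))
    captions: dict[int, str] = {}
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx in wanted:  # only run the model on sampled frames
            captions[idx] = describe_frame(frame)
        idx += 1
    cap.release()
    return captions
```

Sampling one frame per second (rather than every frame) is usually what makes per-frame description feasible on edge hardware; tighten `every_n_seconds` only if the device keeps up.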