LMDeploy's architecture is designed and implemented for LLM inference optimization rather than for vision models. For now, we don't want to break that rule.
Motivation
Take InternVL as an example: its vision model alone is 6B parameters. If the vision model could be quantized, the entire inference pipeline could run on a single RTX 4090. Why doesn't the vision model currently support quantization? Is it a feature that simply hasn't been implemented yet, or does the current AWQ implementation not quantize vision models well?
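The single-GPU argument can be made concrete with a back-of-envelope weight-memory estimate. This is a sketch under stated assumptions: the 6B parameter count comes from the issue, the 16-bit and 4-bit widths are illustrative (4-bit matching a hypothetical AWQ W4 scheme), and activations, KV cache, and the LLM's own weights are ignored.

```python
# Back-of-envelope weight-memory estimate for quantizing a 6B-parameter
# vision encoder. Figures are illustrative assumptions, not measurements.

def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GiB (ignores activations and KV cache)."""
    return num_params * bits_per_param / 8 / 1024**3

vision_params = 6e9  # InternVL's ~6B vision tower, per the issue

fp16_gib = weight_memory_gib(vision_params, 16)  # unquantized fp16 weights
w4_gib = weight_memory_gib(vision_params, 4)     # hypothetical 4-bit (AWQ-style)

print(f"fp16: {fp16_gib:.1f} GiB, 4-bit: {w4_gib:.1f} GiB")
```

Roughly 11 GiB at fp16 versus under 3 GiB at 4-bit, which is why quantizing the vision tower would free a large share of a 24 GiB RTX 4090 for the language model.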