LMDeploy's architecture is designed and implemented for LLM inference optimization rather than for vision models. For now, we don't want to break that rule.
Motivation
Take InternVL as an example: its vision model alone is 6B parameters. If the vision model could be quantized, the entire inference pipeline could run on a single RTX 4090. Why doesn't the vision model currently support quantization? Is it a feature that simply hasn't been implemented yet, or does the current AWQ implementation not quantize vision models well?
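The single-GPU argument can be made concrete with a back-of-envelope weight-memory estimate. This is a sketch under stated assumptions: the 6B parameter count comes from the issue, the 16-bit and 4-bit widths are illustrative (4-bit matching a hypothetical AWQ W4 scheme), and activations, KV cache, and the LLM's own weights are ignored.

```python
# Back-of-envelope weight-memory estimate for quantizing a 6B-parameter
# vision encoder. Figures are illustrative assumptions, not measurements.

def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GiB (ignores activations and KV cache)."""
    return num_params * bits_per_param / 8 / 1024**3

vision_params = 6e9  # InternVL's ~6B vision tower, per the issue

fp16_gib = weight_memory_gib(vision_params, 16)  # unquantized fp16 weights
w4_gib = weight_memory_gib(vision_params, 4)     # hypothetical 4-bit (AWQ-style)

print(f"fp16: {fp16_gib:.1f} GiB, 4-bit: {w4_gib:.1f} GiB")
```

Roughly 11 GiB at fp16 versus under 3 GiB at 4-bit, which is why quantizing the vision tower would free a large share of a 24 GiB RTX 4090 for the language model.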