casper-hansen / AutoAWQ

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
https://casper-hansen.github.io/AutoAWQ/

VLM quantization-aware? #438

Open SinanAkkoyun opened 2 months ago

SinanAkkoyun commented 2 months ago

Hi, does the AWQ algorithm look at activations collected from a prompt dataset? If so, wouldn't quantized VLMs be inaccurate because the vision embeddings are missing from that calibration data?
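For context, here is a minimal sketch of how I understand the usual AutoAWQ calibration flow, assuming the standard `AutoAWQForCausalLM.quantize(...)` API with its `calib_data` argument; the model path and calibration sentences are placeholders. The point is that the calibration samples are plain text, so the activation statistics would come from text-only forward passes:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Placeholder model path, used only for illustration.
model_path = "some-org/some-vlm-or-llm"
quant_path = "some-vlm-or-llm-awq"

quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Text-only calibration samples: AWQ collects activation scales from
# forward passes over these prompts, with no image/vision inputs involved.
calib_data = [
    "The quick brown fox jumps over the lazy dog.",
    "Quantization reduces weight precision while trying to preserve accuracy.",
]

# If calib_data is omitted, a default text calibration set is used instead.
model.quantize(tokenizer, quant_config=quant_config, calib_data=calib_data)

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```

If that sketch is roughly right, my worry is that the per-channel scales are derived from activations that never include vision embeddings.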

suparious commented 2 months ago

I'm not sure I'm qualified to give an authoritative answer, but I can suggest reviewing the original AWQ project from MIT, which has a nice illustration of what's happening. I don't think it is tied to the prompt so much as to the activations produced while inferring the response.