verigle closed this issue 1 year ago
In principle, GPTQ should be applicable to most types of models; see also #3 and #8 for advice on applying GPTQ to models other than the ones in this repository. In the case of BLIP2, I would guess that you want to compress the vision part first and then follow up with the language part, using calibration data that has been passed through the already quantized vision component.
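The two-stage idea above (quantize the vision tower first, then calibrate the language part on embeddings produced by the already quantized vision tower) could be sketched roughly as follows. This is only an illustration: `quantize_gptq`, `vision_encoder`, and `language_model` are hypothetical stand-ins, not the real GPTQ or BLIP2 APIs, and the "quantization" here just rounds activations to keep the example self-contained.

```python
def quantize_gptq(forward, calib_inputs, bits=4):
    """Stand-in for GPTQ. Real GPTQ uses the calibration inputs to
    collect Hessian statistics and round the module's *weights*;
    here we merely round outputs to simulate quantization error."""
    _ = [forward(x) for x in calib_inputs]  # would drive calibration
    scale = 2 ** (bits - 1)

    def quantized_forward(x):
        return [round(v * scale) / scale for v in forward(x)]

    return quantized_forward


def vision_encoder(image):
    # Toy "vision tower": mean pixel value as a 1-d embedding.
    return [sum(image) / len(image)]


def language_model(embedding):
    # Toy "language part": a trivial transform of the embedding.
    return [v * 2.0 for v in embedding]


calib_images = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

# Stage 1: quantize the vision part on raw calibration images.
q_vision = quantize_gptq(vision_encoder, calib_images)

# Stage 2: calibrate the language part on embeddings from the
# *already quantized* vision tower, so it sees the quantization error
# it will encounter at inference time.
calib_embeddings = [q_vision(img) for img in calib_images]
q_language = quantize_gptq(language_model, calib_embeddings)

print(q_language(q_vision([0.7, 0.8, 0.9])))
```

The key design point is that stage 2's calibration data is generated by the quantized stage-1 module, not the full-precision one, so the language part adapts to the vision tower's quantization noise.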
How to apply 3/4-bit quantization to a vision-language model like BLIP2?