verigle closed this issue 1 year ago
In principle, GPTQ should be applicable to most types of models; see also #3 and #8 for advice on applying GPTQ to models other than the ones in this repository. In the case of BLIP2, I would guess that you want to compress the vision part first and then follow up with the language part, using calibration data that has been passed through the already quantized vision component.
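The two-stage idea above (quantize the vision tower first, then calibrate the language part on embeddings produced by the already quantized vision tower) could be sketched roughly as follows. This is only an illustration: `quantize_gptq`, `vision_encoder`, and `language_model` are hypothetical stand-ins, not the real GPTQ or BLIP2 APIs, and the "quantization" here just rounds activations to keep the example self-contained.

```python
def quantize_gptq(forward, calib_inputs, bits=4):
    """Stand-in for GPTQ. Real GPTQ uses the calibration inputs to
    collect Hessian statistics and round the module's *weights*;
    here we merely round outputs to simulate quantization error."""
    _ = [forward(x) for x in calib_inputs]  # would drive calibration
    scale = 2 ** (bits - 1)

    def quantized_forward(x):
        return [round(v * scale) / scale for v in forward(x)]

    return quantized_forward


def vision_encoder(image):
    # Toy "vision tower": mean pixel value as a 1-d embedding.
    return [sum(image) / len(image)]


def language_model(embedding):
    # Toy "language part": a trivial transform of the embedding.
    return [v * 2.0 for v in embedding]


calib_images = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

# Stage 1: quantize the vision part on raw calibration images.
q_vision = quantize_gptq(vision_encoder, calib_images)

# Stage 2: calibrate the language part on embeddings from the
# *already quantized* vision tower, so it sees the quantization error
# it will encounter at inference time.
calib_embeddings = [q_vision(img) for img in calib_images]
q_language = quantize_gptq(language_model, calib_embeddings)

print(q_language(q_vision([0.7, 0.8, 0.9])))
```

The key design point is that stage 2's calibration data is generated by the quantized stage-1 module, not the full-precision one, so the language part adapts to the vision tower's quantization noise.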
How to apply 3/4-bit quantization to a vision-language model like BLIP2?