As I understand it, in the 4-bit example demo, a model is quantized by replacing each original linear layer with a quantized layer using the `recurse_setattr` function in `make_quant` provided by AutoGPTQ. However, when I compared the VL-7B model with the 4-bit model, I found that none of the linear layers were replaced; instead, the quantized layers were inserted into the LoRA modules. How did you achieve that?
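For context, here is a minimal sketch of the replacement mechanism the question refers to: walking the module tree and swapping each targeted `nn.Linear` for a quantized layer via a recursive `setattr`. The `quant_cls` constructor signature and the name-selection logic are illustrative assumptions, not AutoGPTQ's exact API.

```python
import torch.nn as nn


def recurse_setattr(module, name, value):
    # Set a nested attribute, e.g. name = "mlp.down_proj",
    # by walking one dotted component at a time.
    if "." not in name:
        setattr(module, name, value)
    else:
        parent, rest = name.split(".", 1)
        recurse_setattr(getattr(module, parent), rest, value)


def make_quant(model, quant_cls, names_to_quantize, bits=4, group_size=128):
    # Replace every nn.Linear whose qualified name is in names_to_quantize
    # with an instance of quant_cls (e.g. a 4-bit QuantLinear).
    # quant_cls's constructor arguments here are a hypothetical example.
    for name, layer in list(model.named_modules()):
        if isinstance(layer, nn.Linear) and name in names_to_quantize:
            new_layer = quant_cls(
                bits,
                group_size,
                layer.in_features,
                layer.out_features,
                bias=layer.bias is not None,
            )
            recurse_setattr(model, name, new_layer)
```

Under this scheme, which linear layers end up quantized depends entirely on the names passed in; if only the LoRA submodules' names are selected, the base model's linear layers stay untouched, which may explain the behavior observed above.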