IST-DASLab / gptq

Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
https://arxiv.org/abs/2210.17323
Apache License 2.0

Can GPTQ models be used for fine-tuning? #33

Closed siddhsql closed 1 year ago

siddhsql commented 1 year ago

I think the answer is no, but I wanted to check. Can some expert let me know? Thanks.

wyklq commented 1 year ago

The answer is yes; please check the llmtune and falcontune projects on GitHub.

efrantar commented 1 year ago

If the quantized model is kept static, e.g. as in QLoRA, and you are only fine-tuning biases/scales/adapters/etc., you can generally perform the quantization in whatever way you want, including with GPTQ. This appears to be exactly what the llmtune project mentioned by wyklq in the previous comment is doing.
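As a rough illustration (not code from this repo or from llmtune), here is a minimal PyTorch sketch of that adapter-style setup: the GPTQ-quantized layer stays frozen, and only a small LoRA-style low-rank adapter is trained. `quant_linear`, the class name, and the hyperparameters are all hypothetical placeholders.

```python
import torch
import torch.nn as nn

class LoRAOnQuantLinear(nn.Module):
    """Hypothetical wrapper: frozen quantized base layer + trainable low-rank adapter."""
    def __init__(self, quant_linear, in_features, out_features, rank=8, alpha=16):
        super().__init__()
        self.base = quant_linear              # frozen GPTQ-quantized layer (assumed nn.Module)
        for p in self.base.parameters():
            p.requires_grad = False           # quantized weights stay static
        # Trainable LoRA factors: effective weight is W_q + (alpha/rank) * B @ A
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        # Base output uses the static quantized weights; only A and B receive gradients.
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)
```

During fine-tuning you would pass only the adapter parameters (and any biases/scales you want to train) to the optimizer, so the quantized weights themselves are never touched.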

On the other hand, for applications where you want to fully requantize the whole model at each step and thus require an extremely fast quantizer, as in full quantization-aware training, GPTQ will probably be a bit too slow.
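To make the contrast concrete, here is a minimal sketch (again, not from this repo) of the kind of fake-quantization step that quantization-aware training runs inside every forward pass, using simple round-to-nearest and a straight-through estimator. Because this is called at every training step, the quantizer has to be essentially free, whereas GPTQ performs a per-layer optimization and would be far too slow at that frequency.

```python
import torch

def fake_quant_rtn(w, bits=4):
    # Symmetric round-to-nearest fake quantization, re-run on every forward pass.
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    # Straight-through estimator: forward uses w_q, backward passes gradients to w.
    return w + (w_q - w).detach()
```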