Closed: siddhsql closed this issue 1 year ago.
The answer is yes; please check the llmtune and falcontune projects on GitHub.
If the quantized model is kept static, e.g. as in QLoRA, and you are only finetuning biases/scales/adapters/etc., you can generally perform the quantization in whatever way you want, i.e. also with GPTQ. This seems to be exactly what the llmtune project mentioned by wyklq in the previous comment is doing.
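To make the distinction concrete, here is a minimal PyTorch sketch (not llmtune's actual code) of the pattern described above: the base weight is quantized once up front, stored frozen, and only a small LoRA adapter is trained. `FrozenQuantLinearWithLoRA` and its round-to-nearest quantizer are hypothetical placeholders; in a real setup the one-off quantization step could just as well be GPTQ.

```python
import torch
import torch.nn as nn

class FrozenQuantLinearWithLoRA(nn.Module):
    """Linear layer whose weight is quantized once (placeholder round-to-nearest;
    could be GPTQ) and then frozen; only the low-rank LoRA adapter is trained."""

    def __init__(self, weight: torch.Tensor, rank: int = 8, n_bits: int = 4):
        super().__init__()
        out_features, in_features = weight.shape
        # One-off quantization of the base weight.
        qmax = 2 ** (n_bits - 1) - 1
        scale = weight.abs().amax(dim=1, keepdim=True) / qmax
        q = torch.clamp(torch.round(weight / scale), -qmax - 1, qmax)
        # Stored as buffers: no gradients, never updated during finetuning.
        self.register_buffer("q_weight", q.to(torch.int8))
        self.register_buffer("scale", scale)
        # Trainable low-rank adapter (LoRA): delta_W = B @ A.
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.q_weight.float() * self.scale          # dequantize frozen base
        base = x @ w.t()
        adapter = (x @ self.lora_A.t()) @ self.lora_B.t()
        return base + adapter


# Only the adapter parameters receive gradients; the quantized base stays static.
layer = FrozenQuantLinearWithLoRA(torch.randn(64, 128), rank=4)
opt = torch.optim.AdamW([layer.lora_A, layer.lora_B], lr=1e-3)
loss = layer(torch.randn(2, 128)).pow(2).mean()
loss.backward()
opt.step()
```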
On the other hand, for applications where you want to fully requantize the whole model at each step and thus require an extremely fast quantizer, as in full quantization-aware training, GPTQ will probably be a bit too slow.
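For contrast, here is a minimal sketch of quantization-aware training with a simple straight-through estimator, where the full-precision weight is requantized on every forward pass. `fake_quant` and `QATLinear` are illustrative names, not a real library API; the point is only that the quantizer sits inside every training step, which is why a slow solver-style quantizer like GPTQ would become the bottleneck here.

```python
import torch
import torch.nn as nn

def fake_quant(w: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    """Requantize the weight with a cheap round-to-nearest quantizer and pass
    gradients through unchanged (straight-through estimator)."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().amax(dim=1, keepdim=True) / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    return w + (w_q - w).detach()   # forward uses w_q, backward sees identity

class QATLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The weight is requantized on *every* forward pass, so the quantizer
        # is on the critical path of each training step.
        return x @ fake_quant(self.weight).t()

layer = QATLinear(128, 64)
opt = torch.optim.SGD(layer.parameters(), lr=1e-2)
for _ in range(3):
    opt.zero_grad()
    loss = layer(torch.randn(8, 128)).pow(2).mean()
    loss.backward()
    opt.step()
```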
I think the answer is no, but I wanted to check. Can some expert let me know? Thanks.