IST-DASLab / gptq

Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
https://arxiv.org/abs/2210.17323
Apache License 2.0

Inference of the Quantised Model (OPT-13B) #5

Closed anujnayyar1 closed 1 year ago

anujnayyar1 commented 1 year ago

Hey! Huge congratulations on your achievement and thank you for sharing! I am following the steps to quantise an OPT model (13B) that I have finetuned. I wish to serve this model for inference. Will I simply be able to save the quantised model, and load it into the transformers library?

If not, what's the best way to do this?

All the very best

efrantar commented 1 year ago

Hi,

This repository is currently meant mostly as a reference implementation of the GPTQ algorithm and surrounding techniques (such as efficient kernels) to supplement our paper. Recently, there has been some work on extensions for actually using GPTQ-quantized models in practice, for example GPTQ for LLaMa. I am not sure whether the code there fully supports OPT yet, but it might still be a useful starting point.
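For reference, this repository's `opt.py` script does accept `--save` and `--load` flags for writing out and reloading a quantized checkpoint (the checkpoint is a plain PyTorch state dict, not a `transformers`-loadable model). A rough sketch of that workflow, with illustrative paths and flag values, might look like:

```shell
# Quantize a fine-tuned OPT-13B to 4 bits using C4 calibration data
# and save the resulting checkpoint (paths/values are examples only).
python opt.py facebook/opt-13b c4 --wbits 4 --save opt13b-4bit.pt

# Reload the quantized checkpoint for evaluation/benchmarking.
python opt.py facebook/opt-13b c4 --wbits 4 --load opt13b-4bit.pt --benchmark 128
```

Note that loading the saved file directly via `transformers.AutoModelForCausalLM.from_pretrained` will not work, since the quantized layers are custom; serving requires the repository's (or a downstream project's) model-loading code.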