Closed anujnayyar1 closed 1 year ago
Hi,
this repository is currently mostly meant as a reference implementation of the GPTQ algorithm and surrounding techniques (like efficient kernels), to supplement our paper. Recently, there has been some work on extensions for actually using GPTQ-quantized models in practice, for example GPTQ for LLaMa. I am not sure whether that code fully supports OPT yet, but it might still be a useful starting point.
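For intuition, a GPTQ-style checkpoint generally stores integer weights together with per-channel scales rather than raw floats. The sketch below (using NumPy, not the actual GPTQ algorithm or its file format, and with hypothetical helper names `quantize`/`dequantize`) illustrates that storage idea with simple symmetric round-to-nearest quantization:

```python
import numpy as np

# Illustrative sketch only: per-channel symmetric 4-bit quantization.
# GPTQ itself minimizes layer-wise reconstruction error with second-order
# information; this example just shows the int-weights-plus-scales
# storage scheme that quantized checkpoints rely on.

def quantize(w, bits=4):
    """Quantize each output channel (row) of w to signed integers."""
    qmax = 2 ** (bits - 1) - 1                   # e.g. 7 for 4-bit
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)     # avoid divide-by-zero
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Reconstruct an approximate float weight matrix at load time."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 16)).astype(np.float32)
q, scale = quantize(w)
w_hat = dequantize(q, scale)
err = np.abs(w - w_hat).max()   # bounded by half a quantization step
```

Serving such a model then means either dequantizing at load time (memory savings only on disk) or using kernels that operate on the packed integer weights directly, which is what projects like GPTQ for LLaMa provide.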
Hey! Huge congratulations on your achievement, and thank you for sharing! I am following the steps to quantise an OPT model (13B) that I have finetuned, and I wish to serve this model for inference. Will I simply be able to save the quantised model and load it into the transformers library?
If not, what's the best way to do this?
All the very best