IST-DASLab / gptq

Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
https://arxiv.org/abs/2210.17323
Apache License 2.0

GPTQ for BERT #26

Closed BecomeAllan closed 1 year ago

BecomeAllan commented 1 year ago

I'm looking for the GPTQ implementation for BERT — why isn't it in the repository? I want to try the 4-bit implementation for a speed comparison, and to try other models as well.

efrantar commented 1 year ago

Hi, for BERT quantization, please see our OBC repository (the predecessor work to GPTQ) here. As GPTQ's interface in gptq.py is very similar to that of trueobs.py, it should be quite easy to integrate GPTQ into our BERT code there. More concretely, I think one just has to adapt lines 97 and 128 in main_trueobs.py (which is essentially what we did for our BERT and ResNet experiments in the GPTQ paper). This repository is focused on our main application: large generative LMs.
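For readers unfamiliar with what that swap entails, the core GPTQ procedure the two interfaces share is small: accumulate a Hessian from calibration inputs, then quantize the weight matrix column by column, compensating each column's rounding error on the not-yet-quantized columns via the Cholesky factor of the inverse Hessian. Below is an illustrative, self-contained numpy sketch of that procedure — `gptq_quantize` is a hypothetical name, and this is a simplification (symmetric per-row grids, no blocking or lazy batch updates), not the repository's optimized implementation in gptq.py.

```python
import numpy as np

def gptq_quantize(W, X, bits=4, percdamp=0.01):
    """GPTQ-style quantization sketch (illustrative, not the repo's code).

    W: weight matrix, shape (out_features, in_features)
    X: calibration inputs, shape (n_samples, in_features)
    Returns the quantized weights Q and the per-row scales.
    """
    W = W.astype(np.float64).copy()
    d = W.shape[1]

    # Hessian of the layer-wise squared error is proportional to X^T X;
    # dampen the diagonal for numerical stability, as in the paper.
    H = X.T @ X
    damp = percdamp * np.mean(np.diag(H))
    H[np.diag_indices(d)] += damp

    # Upper Cholesky factor U of H^{-1} (so H^{-1} = U.T @ U); its rows
    # supply the error-compensation directions for each column in turn.
    U = np.linalg.cholesky(np.linalg.inv(H)).T

    # Simple symmetric per-row quantization grid (a sketch; the repo
    # supports asymmetric grids, grouping, etc.).
    maxq = 2 ** (bits - 1) - 1
    scale = np.maximum(np.abs(W).max(axis=1, keepdims=True), 1e-12) / maxq

    Q = np.zeros_like(W)
    for i in range(d):
        w = W[:, i]
        # Round column i to the grid.
        q = np.clip(np.round(w / scale[:, 0]), -maxq, maxq) * scale[:, 0]
        Q[:, i] = q
        # Spread the rounding error over the remaining columns.
        err = (w - q) / U[i, i]
        W[:, i:] -= np.outer(err, U[i, i:])
    return Q, scale
```

In both trueobs.py and gptq.py this loop sits behind a per-layer object that is fed calibration batches via forward hooks, which is why the answer above suggests the swap amounts to changing only a couple of lines in main_trueobs.py.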