BecomeAllan closed this issue 1 year ago.
Hi, for BERT quantization, please see our OBC repository here (OBC is the predecessor work to GPTQ). Since GPTQ's interface in gptq.py is very similar to that of trueobs.py, it should be quite easy to integrate GPTQ into our BERT code there. More concretely, I think one just has to adapt lines 97 and 128 in main_trueobs.py (which is essentially what we did for our BERT and ResNet experiments in the GPTQ paper). This repository is focused on our main application, large generative LMs.
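To illustrate what a GPTQ-style per-layer quantizer does once it is hooked into a model, here is a minimal NumPy sketch. It assumes a two-phase usage: feed calibration inputs for one linear layer, then quantize that layer's weights column by column, compensating each rounding error using the inverse Hessian. The class `MiniGPTQ` and its methods are hypothetical and are not the API of gptq.py or trueobs.py. It also leaves out the lazy block updates and other refinements of the full method, and uses a simple per-row min-max 4-bit grid as an assumed quantizer.

```python
import numpy as np


class MiniGPTQ:
    """Hypothetical, simplified GPTQ-style quantizer for one linear layer."""

    def __init__(self, weight):
        # weight: (out_features, in_features); columns correspond to inputs
        self.W = np.array(weight, dtype=np.float64)
        self.cols = self.W.shape[1]
        self.H = np.zeros((self.cols, self.cols))
        self.nsamples = 0

    def add_batch(self, inp):
        # inp: (batch, in_features) calibration inputs seen by this layer.
        # Keeps H as a running average of 2 * x x^T over all samples.
        inp = np.asarray(inp, dtype=np.float64)
        n = inp.shape[0]
        self.H *= self.nsamples / (self.nsamples + n)
        self.nsamples += n
        inp = np.sqrt(2.0 / self.nsamples) * inp
        self.H += inp.T @ inp

    def quantize(self, bits=4, percdamp=0.01):
        W = self.W.copy()
        H = self.H.copy()

        # Assumed quantizer: asymmetric min-max grid, one scale per output row
        maxq = 2**bits - 1
        wmin = np.minimum(W.min(axis=1), 0.0)
        wmax = np.maximum(W.max(axis=1), 0.0)
        scale = (wmax - wmin) / maxq
        scale[scale == 0] = 1.0
        zero = np.round(-wmin / scale)

        def to_grid(col):
            q = np.clip(np.round(col / scale) + zero, 0, maxq)
            return scale * (q - zero)

        # Inputs that were always zero carry no information: drop their weights
        dead = np.flatnonzero(np.diag(H) == 0)
        H[dead, dead] = 1.0
        W[:, dead] = 0.0

        # Dampen the diagonal for numerical stability, then take the upper
        # Cholesky factor U of H^-1 (so that U^T U = H^-1)
        H[np.diag_indices(self.cols)] += percdamp * np.mean(np.diag(H))
        U = np.linalg.cholesky(np.linalg.inv(H)).T

        Q = np.zeros_like(W)
        for i in range(self.cols):
            q_col = to_grid(W[:, i])
            Q[:, i] = q_col
            # Spread this column's rounding error over the not-yet-quantized columns
            err = (W[:, i] - q_col) / U[i, i]
            W[:, i:] -= np.outer(err, U[i, i:])
        return Q


# Usage: random weights and calibration inputs stand in for a real layer
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))
layer = MiniGPTQ(W)
for _ in range(4):
    layer.add_batch(rng.standard_normal((32, 16)))
Q = layer.quantize(bits=4)
```

The split into `add_batch` and `quantize` lets the calibration pass over the model run unchanged while statistics are collected per layer, and the quantization step then only needs the accumulated matrix `H`, not the raw activations.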
I'm looking for the GPTQ implementation for BERT. Why isn't it in the repository? I want to try a 4-bit implementation for a speed comparison, and try other models as well.