IST-DASLab / gptq

Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
https://arxiv.org/abs/2210.17323
Apache License 2.0

Application to GPT-J family #3

khadijakarchaoui closed this issue 1 year ago

khadijakarchaoui commented 1 year ago

Congratulations on your achievement.

Can you give us some hints and recommendations for adapting the procedure to quantize the GPT-J model family?

efrantar commented 1 year ago

Hi,

since the GPT-J models are, as far as I know, fairly standard Transformers, much like OPT and BLOOM, it should not be too difficult to apply GPTQ to them as well.

The core algorithm in gptq.py is designed to work with linear layers and should thus work for GPT-J as well (assuming that the model implementation also uses nn.Linear layers). What will require changes is the code that applies this algorithm to each layer in turn: opt_sequential() and bloom_sequential() in opt.py and bloom.py, respectively. In general, I believe this should mainly involve changing layer names to match those in the GPT-J model definition (there may, however, be some other model-specific details that require additional adjustments). I would recommend comparing our OPT and BLOOM implementations (opt.py and bloom.py) to see what differs between them and hence will probably need adjusting for GPT-J.
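As a rough starting point, here is a minimal sketch of how one might list the relevant layer names, assuming the HuggingFace `transformers` GPT-J implementation; the module names shown in the comment are my reading of that implementation, not something this repo defines, and the tiny random config is only there so the snippet runs without downloading the 6B checkpoint:

```python
# Hypothetical first step toward a gptj_sequential() variant: list the nn.Linear
# sublayers inside one GPT-J decoder block, since these names would replace the
# OPT/BLOOM layer names in the per-block quantization loop.
import torch.nn as nn
from transformers import GPTJConfig, GPTJForCausalLM

# Tiny randomly initialized model, structure only (no 6B download needed).
config = GPTJConfig(n_layer=1, n_embd=256, n_head=4, rotary_dim=64)
model = GPTJForCausalLM(config)

block = model.transformer.h[0]  # one decoder block, the analogue of an OPT/BLOOM layer
linear_names = [name for name, module in block.named_modules()
                if isinstance(module, nn.Linear)]
print(linear_names)
# With the current HuggingFace implementation this should print something like:
# ['attn.k_proj', 'attn.v_proj', 'attn.q_proj', 'attn.out_proj', 'mlp.fc_in', 'mlp.fc_out']
```

These names would then take the place of the OPT/BLOOM layer names when adapting the per-block loop; the core GPTQ machinery in gptq.py itself should not need to change.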

I hope this helps a bit.

Elias