IST-DASLab gptq issues - Githubissues

IST-DASLab / gptq

Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".

https://arxiv.org/abs/2210.17323

Apache License 2.0

1.86k stars, 150 forks

issues

Test on CNN model containing group conv by GPTQ method

#56 xd1073321804 opened 3 months ago
0
Reconstruct Quantized Model Layer in torch.

#55 puja93 opened 3 months ago
0
Dct 1d v1

#54 igormolybog closed 5 months ago
0
GPTQ on BERT based

#53 zhanyil2 opened 6 months ago
0
AssertionError

#52 GuoYi0 opened 7 months ago
0
Regarding the method for computing the Hessian matrix.

#51 baiSongL opened 8 months ago
1
How to load GPTQ pseudo-quantized saved weights (pt format) for re-evaluation?

#50 CXiaorong opened 8 months ago
0
How do I run an INT8 model converted with GPTQ? Guidance from the experts would be appreciated

#49 xxm1668 opened 9 months ago
0
Compatibility of Quant3Linear and 4-bit quantization

#48 mynotwo opened 9 months ago
0
act-order on inference

#47 frankxyy opened 9 months ago
0
pack_model takes too long time

#46 westboy123 closed 9 months ago
1
running speed slow on NVIDIA vGPU

#45 foricee opened 10 months ago
0
H_inv not updated

#44 MilesQLi closed 9 months ago
3
About the CUDA code, I think "tmp2 >> 30" should be "tmp2 >> 31"

#43 JachinJiang closed 12 months ago
2
Update gptq.py

#42 zzz0906 opened 1 year ago
0
Use modified Cholesky decomposition instead of regularized Cholesky

#41 jiahao opened 1 year ago
0
Why is the wikitext-2 ppl calculated in the code lower than the ppl by lm-evaluation-harness?

#40 Chocolife-96 opened 1 year ago
0
LAMBADA evaluation accuracy

#39 kayhanbehdin opened 1 year ago
0
How should I verify the speedup effect of the algorithm?

#38 moonlightian opened 1 year ago
0
How to run the quantized model for predictions on my prompts?

#37 tarunmcom opened 1 year ago
0
How can we use this lib to quantize Falcon7b / 40b models?

#36 tarunmcom opened 1 year ago
0
How to adopt GPTQ on Conv2d with `groups` attribute?

#35 TMYuan opened 1 year ago
1
PPL results on wikitext/ptb/c4 are worse than the official result

#34 xingyueye opened 1 year ago
2
Can GPTQ models be used for fine-tuning?

#33 siddhsql closed 1 year ago
2
Is there a beginners guide to the GPTQ algorithm?

#32 vgoklani closed 1 year ago
1
The reshape of input_id doesn't match HF OPT model's API

#31 brian-fb closed 1 year ago
1
NVM

#30 brian-fb closed 1 year ago
0
Baptiste Fernandez - Adding my part of the story

#29 fernandezbaptiste closed 1 year ago
0
How to apply 3/4-bit quantization to vision-language model?

#28 verigle closed 1 year ago
1
Question about the difference between the pseudocode and the implementation

#27 RachelXu7 closed 1 year ago
1
GPTQ for BERT

#26 BecomeAllan closed 1 year ago
1
Does GPTQ reduce to OBQ if I set block size to 1?

#25 zxxmxd closed 1 year ago
2
Update gptq.py

#24 Lihengwannafly closed 1 year ago
0
quant_cuda_kernel.cu(212): error: identifier "__hfma2" is undefined

#23 HueCheng1021 opened 1 year ago
1
Can --save work with --groupsize in opt.py?

#22 Frozenmad closed 1 year ago
1
Why no update to Hinv

#21 deciding closed 1 year ago
4
OpenCL Support

#20 apcameron closed 1 year ago
1
About `--sym` zero point

#19 tpoisonooo closed 1 year ago
1
ValueError: not enough values to unpack (expected 2, got 1)

#18 jinz2014 closed 1 year ago
2
quantized GPTJ - error on inference

#17 imthebilliejoe closed 1 year ago
1
Minor fix for llama

#16 Xiuyu-Li closed 1 year ago
1
Conversion of OPT-175B singleton to HF checkpoint

#15 ayeeyecorp closed 1 year ago
0
How to apply 3/4-bit quantization to computer vision models?

#14 zshn25 closed 1 year ago
4
Please comment on why the A100 specific commit makes it faster?

#13 Qubitium closed 1 year ago
2
Title: Feature Request: Add Saving Quantized Weights Functionality to bloom.py

#12 bestpredicts closed 1 year ago
1
License issues

#11 AlpinDale closed 1 year ago
1
Pretrained Weights for Bloom and BloomZ (4-bit)

#10 agemagician closed 1 year ago
1
opt_eval error

#9 liangxiaoyun closed 1 year ago
0
Application to T5 / UL2 family

#8 iiLaurens opened 1 year ago
7
How to run on multi GPUs?

#7 TitanSneaker closed 1 year ago
2