AutoGPTQ / AutoGPTQ

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
MIT License

[BUG] Non-positive-definite Cholesky factorization #119

Open Lihengwannafly opened 1 year ago

Lihengwannafly commented 1 year ago

Describe the bug

2023-05-31 11:33:20 INFO [auto_gptq.modeling._base] Quantizing mlp.dense_4h_to_h in layer 7/70...
Traceback (most recent call last):
  File "quant_with_alpaca.py", line 178, in <module>
    main()
  File "quant_with_alpaca.py", line 121, in main
    model.quantize(
  File "/usr/local/lib/python3.8/dist-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/auto_gptq/modeling/_base.py", line 347, in quantize
    scale, zero, g_idx = gptq[name].fasterquant(
  File "/usr/local/lib/python3.8/dist-packages/auto_gptq/quantization/gptq.py", line 96, in fasterquant
    H = torch.linalg.cholesky(H, upper=True)
torch._C._LinAlgError: linalg.cholesky: The factorization could not be completed because the input is not positive-definite (the leading minor of order 57335 is not positive-definite).

Hardware details

single A100

Software version

OS: Ubuntu 20.04
Python: 3.8.10
CUDA: 11.8
PyTorch: 1.14.0a0+410ce96
transformers: 4.29.1
accelerate: 0.19.0

To Reproduce

The command: python quant_with_alpaca.py --pretrained_model_dir /myapp/HF-bloom175B/ --quantized_model_dir /myapp/HF_quantized --bits 4 --num_samples 128 (or 256). The same command succeeds when quantizing the BLOOM 7B model, however.
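For context on what this error means: in GPTQ, H is a Gram matrix accumulated from calibration activations, so it is positive semi-definite by construction, but dead input channels can make it exactly singular, at which point torch.linalg.cholesky reports it as not positive-definite. The following is a minimal sketch of that failure mode and of the diagonal-damping mitigation that GPTQ implementations use (this is illustrative only, not AutoGPTQ's actual code; the 0.01 factor mirrors the damp_percent default in BaseQuantizeConfig):

```python
import torch

torch.manual_seed(0)

# H is a Gram matrix of calibration activations: PSD by construction,
# but a dead input channel makes it exactly singular, and Cholesky
# then fails with the "not positive-definite" error seen above.
X = torch.randn(4, 64)
X[2] = 0.0                        # simulate a dead channel
H = X @ X.T                       # PSD but singular
try:
    torch.linalg.cholesky(H, upper=True)
except torch.linalg.LinAlgError as e:
    print("cholesky failed:", e)

# The usual mitigation is to damp the diagonal slightly before
# factorizing, which lifts the zero eigenvalues:
damp = 0.01 * torch.mean(torch.diag(H))
H += damp * torch.eye(H.shape[0])
torch.linalg.cholesky(H, upper=True)  # now succeeds
```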

flozi00 commented 1 year ago

desc_act needs to be set to true; then it works as expected.

wyklq commented 1 year ago

@flozi00 can you be more specific about how to set desc_act to true? Thanks.

OK, I have found the answer myself after grepping the source code. It is a parameter in the example quantization scripts such as quant_with_alpaca.py, and it is ultimately passed as a parameter to BaseQuantizeConfig(desc_act=True).
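For anyone who lands here later, this is roughly what the fix looks like end to end (a minimal sketch along the lines of the usage shown in the project README; the paths come from the report above, and the single calibration sentence is a placeholder, since a real run needs a proper calibration set):

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

pretrained_model_dir = "/myapp/HF-bloom175B/"   # paths from the report above
quantized_model_dir = "/myapp/HF_quantized"

quantize_config = BaseQuantizeConfig(
    bits=4,
    group_size=128,
    desc_act=True,   # the fix discussed in this thread
)

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True)
# Placeholder calibration data; pass a real calibration set in practice.
examples = [tokenizer("auto-gptq is an easy-to-use model quantization library.")]

model = AutoGPTQForCausalLM.from_pretrained(pretrained_model_dir, quantize_config)
model.quantize(examples)
model.save_quantized(quantized_model_dir)
```

desc_act=True enables GPTQ's activation-order ("act-order") quantization, which tends to improve accuracy at some speed cost.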

Yes, it works as expected. The only side effect is that loading the big models is a bit slower, but that is tolerable. Thanks for the hint; it saved me a huge amount of effort.