casper-hansen / AutoAWQ

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
https://casper-hansen.github.io/AutoAWQ/
MIT License

Issues with Bloom models #400

Open celisa opened 6 months ago

celisa commented 6 months ago

Hi!

I'm trying to quantize a 3B BLOOM model (https://huggingface.co/bigscience/bloom-3b), but the alibi tensor appears to be missing during the model's forward pass.

Could you help me troubleshoot?

[image]

suparious commented 6 months ago

Can you check whether the bloomz model (instead of the older bloom), from the same author, produces the same result? https://huggingface.co/bigscience/bloomz-3b

Returnvoidspec commented 5 months ago

Hello @suparious, I have the same problem, even with the 3-billion-parameter BLOOMZ model. Has a solution been found, or is there anything I can do to solve the problem?

Returnvoidspec commented 5 months ago

It seems the issue is that the alibi tensor required by the Bloom model is not provided and needs to be added via the init_quant file. I will open a patch branch.
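
For context on what the missing tensor contains: Bloom uses ALiBi, a per-head linear bias over token positions that each attention layer expects to receive as an input alongside the hidden states. Below is a minimal pure-Python sketch of how such a bias can be built. It follows, to the best of my understanding, the logic of `build_alibi_tensor` in the Hugging Face Bloom implementation, but the function names `alibi_slopes` and `build_alibi` are hypothetical, the output is a plain nested list for one sequence rather than a batched tensor, and this is not the AutoAWQ patch itself.

```python
import math


def alibi_slopes(num_heads):
    """Per-head slopes: a geometric sequence, extended for non-power-of-2 head counts."""
    closest = 2 ** math.floor(math.log2(num_heads))
    base = 2.0 ** (-(2.0 ** -(math.log2(closest) - 3)))
    slopes = [base ** p for p in range(1, closest + 1)]
    if closest != num_heads:
        # Remaining heads take odd powers of the base for twice as many heads.
        extra_base = 2.0 ** (-(2.0 ** -(math.log2(2 * closest) - 3)))
        remaining = min(closest, num_heads - closest)
        slopes += [extra_base ** p for p in range(1, 2 * remaining + 1, 2)]
    return slopes


def build_alibi(attention_mask, num_heads):
    """Return a [num_heads][seq_len] bias for one sequence (mask is a list of 0/1)."""
    positions = []
    seen = 0
    for m in attention_mask:
        seen += m
        # Position counts only unmasked tokens; padded slots get 0.
        positions.append((seen - 1) * m)
    return [[s * p for p in positions] for s in alibi_slopes(num_heads)]


if __name__ == "__main__":
    bias = build_alibi([1, 1, 1, 1], num_heads=8)
    print(len(bias), len(bias[0]))  # heads, sequence length
    print(bias[0])                  # bias row for the first head
```

If the calibration forward pass calls a Bloom block without passing this tensor, the block has no positional bias to add to its attention scores, which would match the error reported above.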