casper-hansen / AutoAWQ

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
https://casper-hansen.github.io/AutoAWQ/
MIT License

GPTQ model weights conversion/interop #101

Open K024 opened 11 months ago

K024 commented 11 months ago

It is possible to convert GPTQ models without act_order (i.e. when g_idx is not used) to the AWQ gemv-compatible format, since AWQ gemv changed the pack order to natural order.

GPTQ storage format:

qweight: (in_dim / pack_size, out_dim)
qzeros: (in_dim / group_size, out_dim / pack_size)
scales: (in_dim / group_size, out_dim)

AWQ gemv storage format:

qweight: (out_dim, in_dim / pack_size)
qzeros: (out_dim, in_dim / group_size / pack_size)
scales: (out_dim, in_dim / group_size)
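For concreteness, the two layouts can be written out as shape tuples (pack_size = 8 for 4-bit values in an int32; the dimensions below are hypothetical, chosen only for illustration):

```python
# Illustrative shapes only; dims are hypothetical, pack_size = 8 (4-bit values per int32).
in_dim, out_dim, group_size, pack_size = 4096, 11008, 128, 8

gptq = {
    "qweight": (in_dim // pack_size, out_dim),
    "qzeros":  (in_dim // group_size, out_dim // pack_size),
    "scales":  (in_dim // group_size, out_dim),
}
awq_gemv = {
    "qweight": (out_dim, in_dim // pack_size),
    "qzeros":  (out_dim, in_dim // group_size // pack_size),
    "scales":  (out_dim, in_dim // group_size),
}
print(gptq)
print(awq_gemv)
```

Note that every AWQ gemv tensor is the transpose of the corresponding GPTQ tensor (out_dim-major instead of in_dim-major), with the packed axis moved to the input dimension.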

To convert GPTQ linear weights to AWQ, unpack each packed tensor to its 4-bit integer values, transpose to the out_dim-major AWQ layout, and repack along the input dimension:

*: Not sure why they subtract 1 when packing zeros.
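A minimal NumPy sketch of that conversion (my own reconstruction, not the code from this repo; it assumes 4-bit values packed eight per int32, no g_idx, and that GPTQ stores the zero points minus 1, per the footnote above):

```python
# Hedged reconstruction of a GPTQ -> AWQ gemv layout conversion.
# Assumptions: 4-bit quantization, 8 values per int32, no act_order/g_idx,
# GPTQ qzeros stored as (zero - 1), and in_dim/group_size divisible by 8.
import numpy as np

PACK = 8  # 4-bit values per int32

def unpack_rows(q):
    """Unpack int32s into 4-bit values along dim 0 (GPTQ qweight layout)."""
    shifts = (np.arange(PACK, dtype=np.uint32) * 4)[None, :, None]
    vals = (q[:, None, :].astype(np.uint32) >> shifts) & 0xF
    return vals.reshape(-1, q.shape[1])  # (in_dim, out_dim)

def unpack_cols(q):
    """Unpack int32s into 4-bit values along dim 1 (GPTQ qzeros / AWQ layout)."""
    shifts = (np.arange(PACK, dtype=np.uint32) * 4)[None, None, :]
    vals = (q[:, :, None].astype(np.uint32) >> shifts) & 0xF
    return vals.reshape(q.shape[0], -1)

def pack_cols(w):
    """Pack 8 consecutive 4-bit values along dim 1 into one int32, natural order."""
    w = (w.astype(np.uint32) & 0xF).reshape(w.shape[0], -1, PACK)
    shifts = (np.arange(PACK, dtype=np.uint32) * 4)[None, None, :]
    return np.bitwise_or.reduce(w << shifts, axis=-1)

def gptq_to_awq_gemv(qweight, qzeros, scales):
    w = unpack_rows(qweight)             # (in_dim, out_dim) int values
    z = unpack_cols(qzeros) + 1          # undo GPTQ's -1; values wrap mod 16,
                                         # exact handling of zero == 16 is kernel-dependent
    return (
        pack_cols(w.T),                  # qweight: (out_dim, in_dim / PACK)
        pack_cols(z.T),                  # qzeros:  (out_dim, in_dim / group / PACK)
        np.ascontiguousarray(scales.T),  # scales:  (out_dim, in_dim / group)
    )
```

The unpack/repack round trip is cheap (pure bit shifts), so this can run once at load time; whether the `+ 1` on the zeros is exactly right depends on the kernel pair being matched, which is what the footnote above is about.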

I've tested the compatibility in this repo with awq_gemmv2 and exllama_q4_matmul.

casper-hansen commented 11 months ago

I believe it could be possible and would be open to PRs implementing this work. Do note that GPTQ without activation reordering is vastly inferior in accuracy, so this would mostly be about compatibility rather than speed/accuracy.