casper-hansen / AutoAWQ

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
https://casper-hansen.github.io/AutoAWQ/
MIT License

GPTQ model weights conversion/interop #101

Open K024 opened 11 months ago

K024 commented 11 months ago

It is possible to convert GPTQ models without act_order (i.e. when g_idx is not used) to the AWQ gemv-compatible format, since AWQ gemv changed the pack order to natural order.

GPTQ storage format:

qweight: (in_dim / pack_size, out_dim)
qzeros: (in_dim / group_size, out_dim / pack_size)
scales: (in_dim / group_size, out_dim)

AWQ gemv storage format:

qweight: (out_dim, in_dim / pack_size)
qzeros: (out_dim, in_dim / group_size / pack_size)
scales: (out_dim, in_dim / group_size)
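For concreteness, the two layouts can be written out as shape tuples (pack_size = 8 for 4-bit values in an int32; the dimensions below are hypothetical, chosen only for illustration):

```python
# Illustrative shapes only; dims are hypothetical, pack_size = 8 (4-bit values per int32).
in_dim, out_dim, group_size, pack_size = 4096, 11008, 128, 8

gptq = {
    "qweight": (in_dim // pack_size, out_dim),
    "qzeros":  (in_dim // group_size, out_dim // pack_size),
    "scales":  (in_dim // group_size, out_dim),
}
awq_gemv = {
    "qweight": (out_dim, in_dim // pack_size),
    "qzeros":  (out_dim, in_dim // group_size // pack_size),
    "scales":  (out_dim, in_dim // group_size),
}
print(gptq)
print(awq_gemv)
```

Note that every AWQ gemv tensor is the transpose of the corresponding GPTQ tensor (out_dim-major instead of in_dim-major), with the packed axis moved to the input dimension.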

To convert GPTQ linear weights to AWQ, unpack each packed tensor to its 4-bit integer values, transpose to the out_dim-major AWQ layout, and repack along the input dimension:

*: Not sure why they subtract 1 when packing zeros.
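A minimal NumPy sketch of that conversion (my own reconstruction, not the code from this repo; it assumes 4-bit values packed eight per int32, no g_idx, and that GPTQ stores the zero points minus 1, per the footnote above):

```python
# Hedged reconstruction of a GPTQ -> AWQ gemv layout conversion.
# Assumptions: 4-bit quantization, 8 values per int32, no act_order/g_idx,
# GPTQ qzeros stored as (zero - 1), and in_dim/group_size divisible by 8.
import numpy as np

PACK = 8  # 4-bit values per int32

def unpack_rows(q):
    """Unpack int32s into 4-bit values along dim 0 (GPTQ qweight layout)."""
    shifts = (np.arange(PACK, dtype=np.uint32) * 4)[None, :, None]
    vals = (q[:, None, :].astype(np.uint32) >> shifts) & 0xF
    return vals.reshape(-1, q.shape[1])  # (in_dim, out_dim)

def unpack_cols(q):
    """Unpack int32s into 4-bit values along dim 1 (GPTQ qzeros / AWQ layout)."""
    shifts = (np.arange(PACK, dtype=np.uint32) * 4)[None, None, :]
    vals = (q[:, :, None].astype(np.uint32) >> shifts) & 0xF
    return vals.reshape(q.shape[0], -1)

def pack_cols(w):
    """Pack 8 consecutive 4-bit values along dim 1 into one int32, natural order."""
    w = (w.astype(np.uint32) & 0xF).reshape(w.shape[0], -1, PACK)
    shifts = (np.arange(PACK, dtype=np.uint32) * 4)[None, None, :]
    return np.bitwise_or.reduce(w << shifts, axis=-1)

def gptq_to_awq_gemv(qweight, qzeros, scales):
    w = unpack_rows(qweight)             # (in_dim, out_dim) int values
    z = unpack_cols(qzeros) + 1          # undo GPTQ's -1; values wrap mod 16,
                                         # exact handling of zero == 16 is kernel-dependent
    return (
        pack_cols(w.T),                  # qweight: (out_dim, in_dim / PACK)
        pack_cols(z.T),                  # qzeros:  (out_dim, in_dim / group / PACK)
        np.ascontiguousarray(scales.T),  # scales:  (out_dim, in_dim / group)
    )
```

The unpack/repack round trip is cheap (pure bit shifts), so this can run once at load time; whether the `+ 1` on the zeros is exactly right depends on the kernel pair being matched, which is what the footnote above is about.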

I've tested the compatibility in this repo with awq_gemmv2 and exllama_q4_matmul.

casper-hansen commented 11 months ago

I believe it could be possible and would be open to PRs implementing this work. Do note that GPTQ without activation reordering is vastly inferior in accuracy, so this would mostly be about compatibility rather than speed/accuracy.