K024 opened this issue 11 months ago (status: Open).
I believe it could be possible and would be open to PRs implementing this work. Do note that GPTQ without activation reordering is vastly inferior in accuracy, so this would mostly be about compatibility rather than speed/accuracy.
It is possible to convert GPTQ models quantized without act_order (i.e., when g_idx is not used) to the AWQ gemv-compatible format, since AWQ gemv changed its pack order to the natural order.
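A minimal pure-Python sketch of what "natural order" means here, assuming 4-bit values packed eight per 32-bit word, with element `i` stored in bits `4*i` to `4*i+3`. The helpers `pack_natural` and `unpack_natural` are hypothetical names for illustration. Python's unsigned ints stand in for the int32 tensors found in real checkpoints (same bits, different sign interpretation).

```python
BITS = 4
PACK = 32 // BITS          # eight 4-bit values per 32-bit word
MASK = (1 << BITS) - 1     # 0xF

def pack_natural(values):
    """Pack 4-bit ints into 32-bit words; values[i] goes to bits 4*(i % 8) of word i // 8."""
    if len(values) % PACK:
        raise ValueError("length must be a multiple of 8")
    words = []
    for start in range(0, len(values), PACK):
        word = 0
        for i, v in enumerate(values[start:start + PACK]):
            word |= (v & MASK) << (BITS * i)  # lowest nibble holds the first element
        words.append(word)
    return words

def unpack_natural(words):
    """Inverse of pack_natural: yields the lowest nibble of each word first."""
    return [(w >> (BITS * i)) & MASK for w in words for i in range(PACK)]

print(hex(pack_natural(list(range(1, 9)))[0]))  # 0x87654321
```

Assuming GPTQ's `qweight` is also packed this way along the input dimension, the packed words themselves can be reused unchanged; only the matrix layout differs.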
GPTQ storage format:
AWQ gemv storage format:
To convert GPTQ linear weights to AWQ:

- `qweight`: simply transpose it;
- `qzeros`: first unpack along dim 1 back to `(in_dim / group_size, out_dim)`, then add 1 to it*, and finally transpose it and pack along the transposed dim 1 (padding with zeros to the required width);
- `scales`: simply transpose it.

*: Not sure why GPTQ subtracts 1 from the zeros when packing them; adding 1 here undoes that.
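The three steps above can be sketched in pure Python on nested lists. This assumes 4-bit weights, the usual GPTQ shapes (`qweight` as `(in_dim / 8, out_dim)`, `qzeros` as `(in_dim / group_size, out_dim / 8)`, `scales` as `(in_dim / group_size, out_dim)`), and no `g_idx`. The function `gptq_to_awq_gemv` and its `zeros_width` parameter are hypothetical; since the exact padded width AWQ gemv expects for `qzeros` isn't stated here, the caller supplies it (by default no extra padding is added).

```python
BITS = 4
PACK = 32 // BITS          # eight 4-bit values per 32-bit word
MASK = (1 << BITS) - 1

def pack_natural(values):
    """Natural-order packing; a short final chunk leaves its high nibbles zero."""
    words = []
    for start in range(0, len(values), PACK):
        word = 0
        for i, v in enumerate(values[start:start + PACK]):
            word |= (v & MASK) << (BITS * i)
        words.append(word)
    return words

def unpack_natural(words):
    return [(w >> (BITS * i)) & MASK for w in words for i in range(PACK)]

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def gptq_to_awq_gemv(qweight, qzeros, scales, zeros_width=None):
    """Hypothetical converter, GPTQ (no act_order) -> AWQ gemv layout.

    qweight: (in_dim / 8, out_dim)              -> (out_dim, in_dim / 8)
    qzeros:  (in_dim / group_size, out_dim / 8) -> (out_dim, zeros_width)
    scales:  (in_dim / group_size, out_dim)     -> (out_dim, in_dim / group_size)
    """
    # qweight: already natural-order packed along in_dim, so a transpose is enough.
    awq_qweight = transpose(qweight)

    # qzeros: unpack along dim 1 to (n_groups, out_dim) and add back the 1 GPTQ
    # subtracted; the mask assumes the stored value is (zero - 1) mod 16.
    zeros = [[(z + 1) & MASK for z in unpack_natural(row)] for row in qzeros]
    awq_qzeros = []
    for row in transpose(zeros):          # now (out_dim, n_groups)
        packed = pack_natural(row)        # pack along the transposed dim 1
        width = len(packed) if zeros_width is None else zeros_width
        if width < len(packed):
            raise ValueError("zeros_width is smaller than the packed row")
        awq_qzeros.append(packed + [0] * (width - len(packed)))  # zero padding

    # scales: a plain transpose.
    awq_scales = transpose(scales)
    return awq_qweight, awq_qzeros, awq_scales
```

Working on lists keeps the sketch dependency-free; a real converter would do the same steps with tensor ops on whole layers.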
I've tested the compatibility in this repo with `awq_gemmv2` and `exllama_q4_matmul`.
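One kernel-free way to sanity-check a conversion is to dequantize both layouts and compare the results. This sketch assumes the usual asymmetric formula `w = (q - zero) * scale`, GPTQ's stored zeros being offset by 1 as mentioned above, and group index `row // group_size` (valid only without act_order). `dequant_gptq` and `dequant_awq_gemv` are hypothetical helpers, not functions from either library.

```python
BITS = 4
PACK = 32 // BITS          # eight 4-bit values per 32-bit word
MASK = (1 << BITS) - 1

def unpack_natural(words):
    return [(w >> (BITS * i)) & MASK for w in words for i in range(PACK)]

def dequant_gptq(qweight, qzeros, scales, group_size):
    """Return W as (in_dim, out_dim) from GPTQ tensors, assuming no act_order."""
    out_dim = len(qweight[0])
    in_dim = len(qweight) * PACK
    # qweight is packed along dim 0: word (k, c) holds input rows 8k..8k+7 of column c.
    q = [[0] * out_dim for _ in range(in_dim)]
    for k, row in enumerate(qweight):
        for c, word in enumerate(row):
            for i, v in enumerate(unpack_natural([word])):
                q[k * PACK + i][c] = v
    # Stored zeros are (zero - 1), so add 1 back before use.
    zeros = [[(z + 1) & MASK for z in unpack_natural(row)] for row in qzeros]
    return [[(q[r][c] - zeros[r // group_size][c]) * scales[r // group_size][c]
             for c in range(out_dim)] for r in range(in_dim)]

def dequant_awq_gemv(qweight, qzeros, scales, group_size):
    """Return W as (out_dim, in_dim) from AWQ gemv tensors; padded zeros are never indexed."""
    out = []
    for o, row in enumerate(qweight):
        q = unpack_natural(row)
        z = unpack_natural(qzeros[o])
        out.append([(q[i] - z[i // group_size]) * scales[o][i // group_size]
                    for i in range(len(q))])
    return out
```

If the transposed GPTQ result matches the AWQ gemv result for a converted layer, the layouts agree; a mismatch usually points at the zero offset or the pack order.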