Open irasin opened 1 year ago
When the values are packed, zero is stored as (zero + 1).
Hi, @fpgaminer, thanks.
Why do we need to pack the values as (zero + 1)? Is it for numerical reasons?
Sorry, I mistyped: zero is stored as zero - 1.
It's clearer in the non-simplified formula, which is: w = w * s - (z + 1) * s
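A minimal sketch of why the two forms agree, assuming the packed value is the true zero point minus 1. The function names, the scale, and the zero point here are hypothetical and chosen only for illustration; this is not the kernel code.

```python
def dequant_stored(q, s, z_stored):
    # Formula from this thread: w = q * s - (z + 1) * s,
    # where z_stored is the packed value (true zero point minus 1).
    return q * s - (z_stored + 1) * s


def dequant_standard(q, s, zero):
    # Textbook affine dequantization: w = (q - zero) * s.
    return (q - zero) * s


bits = 4
s = 0.5            # hypothetical scale
zero = 8           # hypothetical zero point
z_stored = zero - 1  # what would be packed, per the thread

# Compare both formulas for every possible quantized value.
results = [
    (dequant_stored(q, s, z_stored), dequant_standard(q, s, zero))
    for q in range(2**bits)
]
assert all(a == b for a, b in results)
```

Adding 1 back to the stored value recovers the original zero point, so the result is the same as the usual (w - z) * s with the unshifted z.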
As to why, well, it seems the quantization algorithm outputs zero in the range [1, 2**bits]. As to why, I don't know. You can reference gptq.py to study the implementation of the quantization algorithm, or query the paper authors. My main focus is just on the kernel side of things.
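One arithmetic observation, offered as a sketch and not as the authors' stated reason: a value in [1, 2**bits] does not always fit in a field that is `bits` wide, while the same value minus 1 always does. The helpers below are hypothetical and are not taken from gptq.py; the field layout (lowest bits first) is also an assumption.

```python
def pack_zeros(zeros, bits):
    """Pack zero points from [1, 2**bits] into one int, storing zero - 1."""
    packed = 0
    for i, zero in enumerate(zeros):
        stored = zero - 1  # shifts the range to [0, 2**bits - 1]
        if not 0 <= stored < 2**bits:
            raise ValueError(f"zero point {zero} out of range for {bits} bits")
        packed |= stored << (i * bits)
    return packed


def unpack_zeros(packed, count, bits):
    """Recover the original zero points by adding 1 back to each field."""
    mask = (1 << bits) - 1
    return [((packed >> (i * bits)) & mask) + 1 for i in range(count)]


bits = 4
zeros = [1, 8, 16, 3]  # 16 == 2**bits would overflow a 4-bit field if stored as-is
packed = pack_zeros(zeros, bits)
restored = unpack_zeros(packed, len(zeros), bits)
assert restored == zeros
```

Under these assumptions the `+ 1` in the dequantization formula is just the inverse of the `- 1` applied at packing time.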
I wonder why we need to use z - 1 here since the normal quantization is
w = (w - z) * s