Open mynotwo opened 7 months ago
Hi! I've noticed that the quantization layer packs the quantized weights using the class Quant3Linear, as shown below:
However, it seems to me that this only works for 2-bit and 3-bit weights. If the original weights in `intweight` are 4-bit, some bits would be lost.
Could you explain the logic behind this? Thanks!
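To illustrate the concern: here is a minimal, hypothetical sketch of bit-packing (not the repo's actual Quant3Linear code). It packs unsigned `bits`-wide integers into 32-bit words and unpacks them again. Packing genuinely 4-bit values with a 3-bit stride drops the top bit of each value (and neighboring values bleed into each other), while a 4-bit stride round-trips losslessly. The function names `pack_bits`/`unpack_bits` are my own, for demonstration only.

```python
def pack_bits(values, bits):
    """Pack unsigned `bits`-wide ints into 32-bit words (little-endian bit order)."""
    words = [0] * ((len(values) * bits + 31) // 32)
    for i, v in enumerate(values):
        pos = i * bits
        w, off = divmod(pos, 32)
        words[w] |= (v << off) & 0xFFFFFFFF
        if off + bits > 32:              # value straddles a 32-bit word boundary
            words[w + 1] |= v >> (32 - off)
    return words

def unpack_bits(words, bits, count):
    """Recover `count` values of width `bits` from the packed 32-bit words."""
    mask = (1 << bits) - 1
    out = []
    for i in range(count):
        pos = i * bits
        w, off = divmod(pos, 32)
        v = words[w] >> off
        if off + bits > 32:
            v |= words[w + 1] << (32 - off)
        out.append(v & mask)
    return out

vals = [0b1111, 0b1010, 0b0101]                      # genuine 4-bit values
ok   = unpack_bits(pack_bits(vals, 4), 4, len(vals)) # lossless round trip
bad  = unpack_bits(pack_bits(vals, 3), 3, len(vals)) # top bits lost / corrupted
```

Here `ok` equals the original `[15, 10, 5]`, but `bad` does not, which is exactly the bit loss the question is about if a 4-bit `intweight` were fed through a 3-bit packing path.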