Vahe1994 / SpQR


Does permutation order have to be included when saving the quantized model? #31

Closed by luccareinehr 11 months ago

luccareinehr commented 11 months ago

I understand model saving is yet to be implemented, but it looks like permutation may increase the memory footprint of the model.

If we save an SpQR-quantized model in a file and try to dequantize it, we'll end up with a permuted version of the weight matrices (in floating points). So, to use it in inference, it would need to be de-permuted.

Is there any other way of doing inference in SpQR without having to save the permutation order?

Vahe1994 commented 11 months ago

Hello, I'm sorry for the late response.

I understand model saving is yet to be implemented

Yes, you are correct. Here is a draft PR for model saving: https://github.com/Vahe1994/SpQR/pull/32. It is almost complete, but not tested yet.

permutation may increase the memory footprint

Storing the permutation increases the memory footprint by a negligible amount, less than 0.01 bits per parameter.
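
To make "negligible" concrete, here is a rough back-of-envelope estimate (a sketch with assumed layer sizes, not SpQR code): a column permutation needs one index per input feature, amortized over all weights of the layer.

```python
# Rough overhead estimate for storing a column permutation.
# Layer sizes are assumptions for illustration (e.g. a 4096x4096 projection).
import math

d_in, d_out = 4096, 4096
perm_bits = d_in * math.ceil(math.log2(d_in))   # one log2(d_in)-bit index per column
bits_per_param = perm_bits / (d_in * d_out)     # amortized over the whole weight matrix

print(f"{bits_per_param:.5f} bits/parameter")   # ~0.003, well under 0.01
```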

If we save an SpQR-quantized model in a file and try to dequantize it, we'll end up with a permuted version of the weight matrices (in floating points). So, to use it in inference, it would need to be de-permuted.

Yes, or you can apply the permutation to the activations instead, so the weights never need to be de-permuted.
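
A minimal sketch of that idea (shapes and names are assumptions, not the repo's kernels): if the dequantized weight keeps its columns in the permuted order, applying the same index reordering to the input activations reproduces the original output.

```python
import torch

d_in, d_out = 8, 4
W = torch.randn(d_out, d_in)          # original (un-permuted) weight
x = torch.randn(d_in)                 # input activations
perm = torch.randperm(d_in)           # column (input-feature) permutation

W_perm = W[:, perm]                   # weight as it would be stored after dequantization
y_ref = W @ x                         # output with the original weight
y_perm = W_perm @ x[perm]             # reorder activations instead of the weight

assert torch.allclose(y_ref, y_perm, atol=1e-6)
```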

Is there any other way of doing inference in SpQR without having to save the permutation order?

Unfortunately, we are not aware of one. As a workaround, you can skip the permutation entirely by quantizing with the identity option instead of act_order, so there is nothing extra to store. Usually the difference between act_order and identity is not large; see Table 3 in the SpQR paper.
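
For intuition, a hedged sketch of what the two orderings mean (illustrative only, not SpQR's actual implementation): identity keeps the natural column order, so nothing extra is saved, while act_order sorts columns by calibration activation statistics and must therefore be stored to undo at inference.

```python
import torch

X = torch.randn(1024, 4096)                         # assumed calibration activations
H_diag = (X * X).sum(dim=0)                         # per-column activation energy (diag of X^T X)

identity_order = torch.arange(X.shape[1])           # natural order: no permutation to store
act_order = torch.argsort(H_diag, descending=True)  # most "active" columns quantized first
```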

(Screenshot: Table 3 from the SpQR paper, comparing act_order and identity.)

luccareinehr commented 11 months ago

No worries about the time. That's awesome, thanks! :)