Closed luccareinehr closed 11 months ago
Hello, I'm sorry for the late response.
> I understand model saving is yet to be implemented
Yes, you are correct. Here is a draft PR for model saving: https://github.com/Vahe1994/SpQR/pull/32. It is almost complete, but not tested yet.
> permutation may increase the memory footprint
Storing permutation will increase memory footprint by a negligible amount, less than 0.01 bits per parameter.
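To see why the overhead is negligible, here is a rough back-of-the-envelope sketch. The matrix dimensions below are assumed for illustration (a Llama-like 4096×4096 layer); the point is that one permutation index per input column amortizes over an entire row of parameters:

```python
# Sketch of the storage-overhead estimate; sizes are illustrative assumptions,
# not taken from the SpQR code.
d_in, d_out = 4096, 4096

# One column permutation of length d_in per weight matrix;
# each index fits in 16 bits whenever d_in <= 65536.
perm_bits = d_in * 16

# Amortize over all d_in * d_out quantized parameters.
bits_per_param = perm_bits / (d_in * d_out)
print(bits_per_param)  # ~0.0039, i.e. well under 0.01 bits per parameter
```

For wider layers (larger `d_out`) the per-parameter cost shrinks further.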
> If we save an SpQR-quantized model in a file and try to dequantize it, we'll end up with a permuted version of the weight matrices (in floating points). So, to use it in inference, it would need to be de-permuted.
Yes, or you can de-permute activations.
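The two options are equivalent; a minimal numpy sketch (with hypothetical small dimensions, not the actual SpQR saving format) shows both de-permuting the loaded weights once and permuting the activations at inference time:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 8, 4
W = rng.standard_normal((d_in, d_out))   # original (unpermuted) weights
x = rng.standard_normal((2, d_in))       # a batch of activations

# Hypothetical act_order permutation over the input dimension.
perm = rng.permutation(d_in)
W_perm = W[perm]  # what dequantizing a saved permuted model would yield

# Option 1: de-permute the weights once after loading.
inv_perm = np.argsort(perm)              # inverse permutation
W_restored = W_perm[inv_perm]

# Option 2: permute the activations instead, leaving W_perm as-is.
y_ref = x @ W
y_act = x[:, perm] @ W_perm              # same result, no weight shuffle

print(np.allclose(W_restored, W))        # True
print(np.allclose(y_ref, y_act))         # True
```

Option 2 avoids materializing a de-permuted copy of the weights, at the cost of one gather on the activations per forward pass.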
> Is there any other way of doing inference in SpQR without having to save the permutation order?
Unfortunately, we are not aware of one. As a workaround, you can skip the permutation entirely via the `identity` option. Usually, the difference between `act_order` and `identity` is not large; see Table 3 in the SpQR paper.
No worries about the time. That's awesome, thanks! :)