I have a question: the model saved after 4-bit or 2-bit quantization is still stored in fp16, so the quantization is not actually implemented in the saved weights?

Hi, please see our updated repo QuIP#, which does save weights in the correct format. For this project we were primarily testing the effect of quantization on model quality (i.e., perplexity), and therefore employed "fake quantization": we restricted the weights to the correct number of unique values, but kept them in fp16 for ease of engineering.
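For context, here is a minimal sketch of what "fake quantization" means in practice, assuming plain round-to-nearest uniform quantization (QuIP's actual quantizer is more sophisticated, and the function name `fake_quantize` is hypothetical): weights are snapped to at most 2^bits distinct values and immediately dequantized, so everything stays in fp16 and perplexity can be measured without any low-bit storage or kernels.

```python
import torch

def fake_quantize(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Round-to-nearest uniform quantization, immediately dequantized.

    Illustrative sketch only; not the QuIP quantizer. The output has at
    most 2**bits unique values but is stored in fp16.
    """
    levels = 2 ** bits - 1
    w_min, w_max = w.min(), w.max()
    scale = (w_max - w_min) / levels            # step size of the uniform grid
    q = torch.round((w - w_min) / scale)        # integer codes in [0, levels]
    q = q.clamp(0, levels)
    return (q * scale + w_min).to(torch.float16)  # dequantize back to fp16

w = torch.randn(256, 256)
w_q = fake_quantize(w, bits=4)
assert w_q.unique().numel() <= 2 ** 4           # at most 16 distinct values
print(w_q.dtype)                                 # torch.float16
```

This is why the checkpoint is still fp16 even though it represents a 4-bit (or 2-bit) model: the restriction to 2^bits unique values is what affects perplexity, while the actual low-bit packing is an engineering step that QuIP# implements separately.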