Devy99 opened this issue 5 months ago
Hello, first and foremost, I want to thank you for your incredible work!
I'd like some more information on how to reproduce your results. I followed the instructions in your README, but I am unable to obtain the quantized models.
The following are the steps I took to replicate your work:
Unlike the README instructions, I used wikitext2 as the calibration dataset (because c4 is not accessible) and added the `--save` option to obtain the quantized model in the output folder.
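(For reference, a minimal sketch of how a wikitext2 calibration corpus is typically loaded in pipelines like this; the Hugging Face `datasets` call is real, but the concatenation step is an assumption for illustration, not necessarily this repo's exact code.)

```python
from datasets import load_dataset

# wikitext-2 as a drop-in calibration corpus when c4 is unreachable.
data = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
text = "\n\n".join(data["text"])  # one long string to sample calibration blocks from
print(f"calibration corpus size: {len(text)} characters")
```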
However, the resulting model does not match what I expected: its file size is identical to the original model's, and it does not appear to have been quantized; a quick way to verify this is sketched below.
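(A minimal way to confirm what the checkpoint contains, assuming it was written with `torch.save` as a plain state dict; the path below is hypothetical.)

```python
import torch

# If every tensor is still float16/float32, the file stores dequantized
# (fake-quantized) weights, which explains the unchanged file size.
state_dict = torch.load("output/quantized_model.pt", map_location="cpu")
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape), tensor.dtype)
```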
Did I miss something? Also, should I run inference using a specific procedure?

authors commented:

Hi, in this version we only release the fake quantization, to validate the theoretical bound on LLM compression performance. This code does not support saving the quantized model.

Devy99 commented:

Thanks for the prompt response! Do you plan to release the full quantization pipeline in the near future?
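(For context on the authors' answer: a generic sketch of what fake, i.e. simulated, quantization does. The per-tensor symmetric scheme below is an illustrative assumption, not necessarily this repo's exact method; the point is that weights are rounded to a low-bit grid and immediately dequantized, so the stored tensors keep their original floating-point format and size.)

```python
import torch

def fake_quantize(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    # Round weights to a 2^bits-level grid, then map them straight back
    # to floating point. Accuracy reflects low-bit quantization, but the
    # returned tensor has the same dtype and size as the input, so a
    # saved model is exactly as large as the unquantized one.
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax                       # per-tensor symmetric scale
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q * scale                                   # dequantize

w = torch.randn(4, 4)
w_fq = fake_quantize(w)
print(w_fq.dtype, w_fq.shape, (w_fq - w).abs().max().item())  # same storage, small rounding error
```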