Aaronhuang-778 / BiLLM

(ICML 2024) BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
https://arxiv.org/abs/2402.04291
MIT License

Why has the model size barely changed after quantization? #17

Closed zxbjushuai closed 2 months ago

zxbjushuai commented 3 months ago

After quantizing with python3 run.py meta-llama/Llama-2-7b-hf c4 braq --blocksize 128 --salient_metric hessian --device "cuda:0", the quantized model's safetensors files still take up 12.6 GB in total. Is there a problem with how I'm saving the model?

zxbjushuai commented 3 months ago

Or rather, what size should the correct result be?

zxbjushuai commented 3 months ago

The model files I'm using are locally downloaded files in safetensors format. Could it be that I need to use model files in another format, such as .bin?

BaohaoLiao commented 2 months ago

According to this issue, https://github.com/Aaronhuang-778/BiLLM/issues/14, it is fake quantization, i.e., using fp16 to simulate 1-bit. The weights are restricted to binary values but are still stored as fp16 tensors, so the saved files stay the same size.
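
A minimal sketch of why fake quantization leaves the file size unchanged (the tensor shape, the per-tensor scale, and the file names here are illustrative assumptions, not BiLLM's actual code): binarizing the values does not change the storage dtype, so each weight still occupies two bytes on disk.

```python
import torch

# Hypothetical weight matrix; shape is illustrative, not from BiLLM.
w = torch.randn(4096, 4096, dtype=torch.float16)

# "Fake" binarization: values collapse to {-alpha, +alpha},
# but the dtype remains fp16 (2 bytes per weight).
alpha = w.abs().mean()          # per-tensor scale (an assumption here)
w_bin = alpha * torch.sign(w)   # binary values, non-binary storage
assert w_bin.dtype == torch.float16

# Both files are ~32 MB: simulated 1-bit saves no disk space.
torch.save(w, "w_fp16.pt")
torch.save(w_bin, "w_fake_quant.pt")
```

To actually shrink the checkpoint, the binary weights would need to be bit-packed (e.g., 8 weights per byte) plus a small amount of scale metadata, which is a separate storage format rather than what the simulation-based evaluation code produces.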

zxbjushuai commented 2 months ago

Thanks