Aaronhuang-778 / BiLLM

(ICML 2024) BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
https://arxiv.org/abs/2402.04291
MIT License

Why has the model size barely changed after quantization? #17

Closed zxbjushuai closed 2 months ago

zxbjushuai commented 3 months ago

After quantizing with python3 run.py meta-llama/Llama-2-7b-hf c4 braq --blocksize 128 --salient_metric hessian --device "cuda:0", the quantized model's safetensors files still take up 12.6 GB in total. Is there a problem with how I'm saving the model?

zxbjushuai commented 3 months ago

Or rather, what size should the correct result be?

zxbjushuai commented 3 months ago

The model files I'm using are locally downloaded files in safetensors format. Could it be that I need to use model files in another format, such as .bin?

BaohaoLiao commented 2 months ago

According to this issue, https://github.com/Aaronhuang-778/BiLLM/issues/14, it is fake quantization, i.e., using fp16 to simulate 1-bit. The weights are restricted to binary values but are still stored as fp16 tensors, so the saved files stay the same size.
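
A minimal sketch of why fake quantization leaves the file size unchanged (the tensor shape, the per-tensor scale, and the file names here are illustrative assumptions, not BiLLM's actual code): binarizing the values does not change the storage dtype, so each weight still occupies two bytes on disk.

```python
import torch

# Hypothetical weight matrix; shape is illustrative, not from BiLLM.
w = torch.randn(4096, 4096, dtype=torch.float16)

# "Fake" binarization: values collapse to {-alpha, +alpha},
# but the dtype remains fp16 (2 bytes per weight).
alpha = w.abs().mean()          # per-tensor scale (an assumption here)
w_bin = alpha * torch.sign(w)   # binary values, non-binary storage
assert w_bin.dtype == torch.float16

# Both files are ~32 MB: simulated 1-bit saves no disk space.
torch.save(w, "w_fp16.pt")
torch.save(w_bin, "w_fake_quant.pt")
```

To actually shrink the checkpoint, the binary weights would need to be bit-packed (e.g., 8 weights per byte) plus a small amount of scale metadata, which is a separate storage format rather than what the simulation-based evaluation code produces.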

zxbjushuai commented 2 months ago

Thanks