Aaronhuang-778 / BiLLM

(ICML 2024) BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
https://arxiv.org/abs/2402.04291
MIT License

inference #9

Open shyget opened 3 months ago

shyget commented 3 months ago

Hello, I want to know how I can calculate the BitWeight. Also, why doesn't the memory consumption decrease during the inference stage?
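
For reference, the paper's reported average bit-width appears to be a storage-cost average: payload bits (roughly 2 bits per salient weight under residual binarization, 1 bit per non-salient weight) plus the per-group scaling-factor overhead, divided by the total number of weights. Below is a minimal sketch of that accounting; `average_bitwidth`, the `group_size`/`scale_bits` defaults, and the exact overhead terms are illustrative assumptions, not the repo's actual code.

```python
import torch

def average_bitwidth(weight: torch.Tensor,
                     salient_mask: torch.Tensor,
                     group_size: int = 128,
                     scale_bits: int = 16) -> float:
    """Estimate the average bits stored per weight, BiLLM-style.

    Illustrative assumptions (not the repo's actual accounting):
      - salient weights use residual binarization -> 2 payload bits each
      - non-salient weights use 1-bit binarization -> 1 payload bit each
      - every `group_size` weights share FP16 scaling factors: two per
        salient group (one per residual pass) and two per non-salient
        group (one for each half of the split bell-shaped distribution)
    """
    n_total = weight.numel()
    n_salient = int(salient_mask.sum())
    n_rest = n_total - n_salient

    # Payload: the binarized weights themselves.
    payload_bits = 2 * n_salient + 1 * n_rest

    # Overhead: per-group scaling factors (ceil division for partial groups).
    salient_groups = -(-n_salient // group_size)
    rest_groups = -(-n_rest // group_size)
    overhead_bits = scale_bits * (2 * salient_groups + 2 * rest_groups)

    return (payload_bits + overhead_bits) / n_total

# Toy usage: mark ~2% of the columns of a 4096x4096 layer as salient.
w = torch.randn(4096, 4096)
mask = torch.zeros_like(w, dtype=torch.bool)
mask[:, :80] = True
print(f"average bits per weight: {average_bitwidth(w, mask):.3f}")
```

On the memory question: research PTQ code commonly runs *simulated* quantization, de-quantizing the binarized weights back to FP16 for the forward pass; if that is the case here, inference memory would not shrink without packed 1-bit weight storage and a matching inference kernel.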