Aaronhuang-778 / BiLLM

(ICML 2024) BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
https://arxiv.org/abs/2402.04291
MIT License

inference #9

Open shyget opened 3 months ago

shyget commented 3 months ago

Hello, I want to know how I can calculate the BitWeight. Also, why doesn't the memory consumption decrease during the inference stage?
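
For reference, the paper's reported average bit-width appears to be a storage-cost average: payload bits (roughly 2 bits per salient weight under residual binarization, 1 bit per non-salient weight) plus the per-group scaling-factor overhead, divided by the total number of weights. Below is a minimal sketch of that accounting; `average_bitwidth`, the `group_size`/`scale_bits` defaults, and the exact overhead terms are illustrative assumptions, not the repo's actual code.

```python
import torch

def average_bitwidth(weight: torch.Tensor,
                     salient_mask: torch.Tensor,
                     group_size: int = 128,
                     scale_bits: int = 16) -> float:
    """Estimate the average bits stored per weight, BiLLM-style.

    Illustrative assumptions (not the repo's actual accounting):
      - salient weights use residual binarization -> 2 payload bits each
      - non-salient weights use 1-bit binarization -> 1 payload bit each
      - every `group_size` weights share FP16 scaling factors: two per
        salient group (one per residual pass) and two per non-salient
        group (one for each half of the split bell-shaped distribution)
    """
    n_total = weight.numel()
    n_salient = int(salient_mask.sum())
    n_rest = n_total - n_salient

    # Payload: the binarized weights themselves.
    payload_bits = 2 * n_salient + 1 * n_rest

    # Overhead: per-group scaling factors (ceil division for partial groups).
    salient_groups = -(-n_salient // group_size)
    rest_groups = -(-n_rest // group_size)
    overhead_bits = scale_bits * (2 * salient_groups + 2 * rest_groups)

    return (payload_bits + overhead_bits) / n_total

# Toy usage: mark ~2% of the columns of a 4096x4096 layer as salient.
w = torch.randn(4096, 4096)
mask = torch.zeros_like(w, dtype=torch.bool)
mask[:, :80] = True
print(f"average bits per weight: {average_bitwidth(w, mask):.3f}")
```

On the memory question: research PTQ code commonly runs *simulated* quantization, de-quantizing the binarized weights back to FP16 for the forward pass; if that is the case here, inference memory would not shrink without packed 1-bit weight storage and a matching inference kernel.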