Aaronhuang-778 / BiLLM

(ICML 2024) BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
https://arxiv.org/abs/2402.04291
MIT License

Do you quantize the LM head, embedding, and layernorms or just the weights? #4

Open · tsengalb99 opened this issue 4 months ago
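For context on what the question is asking, here is a minimal sketch of the convention many post-training quantization pipelines (e.g. GPTQ-style methods) follow: only the `nn.Linear` weights inside the transformer blocks are quantized, while the token embedding, `lm_head`, and LayerNorms stay in full precision. This is an assumption about common practice, not a statement of what BiLLM actually does; the toy model and module names below are hypothetical and merely mirror Hugging Face-style naming.

```python
# Hypothetical illustration, NOT BiLLM's code: show which modules a
# "weights-only" PTQ pass would typically touch in a transformer LM.
import torch.nn as nn

class TinyCausalLM(nn.Module):
    """Toy stand-in for an LLM; names loosely mirror HF conventions."""
    def __init__(self, vocab=100, dim=32):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab, dim)      # usually kept FP16
        self.layers = nn.ModuleList([
            nn.ModuleDict({
                "q_proj": nn.Linear(dim, dim),            # candidate for quantization
                "input_layernorm": nn.LayerNorm(dim),     # usually kept FP16
            })
            for _ in range(2)
        ])
        self.lm_head = nn.Linear(dim, vocab)              # often kept FP16

def quantizable_modules(model: nn.Module):
    """Select only Linear layers inside the transformer blocks, skipping lm_head."""
    return {
        name: m
        for name, m in model.named_modules()
        if isinstance(m, nn.Linear) and name.startswith("layers.")
    }

model = TinyCausalLM()
print(sorted(quantizable_modules(model)))
# ['layers.0.q_proj', 'layers.1.q_proj']
# embed_tokens, the LayerNorms, and lm_head are excluded under this convention.
```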