Vahe1994 / AQLM

Official PyTorch repository for "Extreme Compression of Large Language Models via Additive Quantization" (https://arxiv.org/pdf/2401.06118.pdf) and "PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression" (https://arxiv.org/abs/2405.14852)
Apache License 2.0

[Feature Request] Gemma2 support & models #111

Open kristaller486 opened 2 months ago

kristaller486 commented 2 months ago

Gemma2 is an incredible model. With AQLM, it will fit into a 12GB GPU.
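
A rough back-of-the-envelope check of the 12GB claim, as a sketch rather than a measurement: weight memory is approximately parameter count times bits per parameter. The parameter counts below are nominal values assumed from the model names, the bit widths are illustrative, and `quantized_weight_gib` is a hypothetical helper, not part of AQLM. The estimate ignores codebooks, any layers left unquantized (such as embeddings), activations, and the KV cache, so real usage will be higher.

```python
def quantized_weight_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 2**30


# Assumption: nominal parameter counts read off the model names (9B, 27B).
models = [("Gemma2 9B", 9e9), ("Gemma2 27B", 27e9)]

# Assumption: 2 and 3 bits/param stand in for extreme quantization,
# 16 bits/param for an unquantized half-precision baseline.
for name, params in models:
    for bits in (2, 3, 16):
        size = quantized_weight_gib(params, bits)
        verdict = "under" if size < 12 else "over"
        print(f"{name} @ {bits} bits/param: {size:.1f} GiB ({verdict} 12 GiB)")
```

Under these assumptions the quantized weights come in well below 12 GiB, which leaves some headroom for the overheads the estimate leaves out.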

Vahe1994 commented 1 month ago

Hello! Sorry for the late answer. It should fit. I'm planning to get to this soon. I'll keep you updated in this thread, but it may take weeks.

kristaller486 commented 1 month ago

Thank you for the answer. I'll be waiting!

igvasilev commented 1 week ago

Will you add Gemma2 27B as well?