casper-hansen / AutoAWQ

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
https://casper-hansen.github.io/AutoAWQ/
MIT License
1.67k stars 202 forks

Add Cohere Support #403

Open localbarrage opened 6 months ago

localbarrage commented 6 months ago

https://huggingface.co/CohereForAI/c4ai-command-r-v01

The new Cohere model is #1 trending on Hugging Face right now. It excels at RAG and tool use (JSON generation), among other tasks. It is a 35B-parameter model, so AWQ quantization support would be nice.
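For a rough sense of why 4-bit support matters at this size, here is a back-of-the-envelope sketch of weight memory. It is only an estimate under stated assumptions: a group size of 128 with one FP16 scale and one 4-bit zero point per group (a common AWQ setup, not something confirmed in this thread), and every parameter treated as quantized, which overstates the savings because some layers, such as embeddings, typically stay in higher precision. The helper name `weight_memory_gb` is made up for illustration.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 35e9  # parameter count mentioned in the comment above

# Assumption: groups of 128 weights share one FP16 scale (16 bits)
# and one 4-bit zero point, adding a small per-weight overhead.
GROUP_SIZE = 128
overhead_bits = (16 + 4) / GROUP_SIZE

fp16_gb = weight_memory_gb(N_PARAMS, 16)
awq_gb = weight_memory_gb(N_PARAMS, 4 + overhead_bits)

print(f"FP16 weights:  ~{fp16_gb:.1f} GB")
print(f"4-bit weights: ~{awq_gb:.1f} GB")
```

Under these assumptions the weights shrink from about 70 GB to roughly 18 GB, before counting activations and the KV cache.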

casper-hansen commented 5 months ago

This model seems incredibly useful. Since this is a dense model, adding quantization support should be a bit easier. I will experiment soon with this model to see if we can quantize it.