I am guessing you have CUDA 11.8 installed and used `pip install autoawq`, which requires CUDA 12.1. You can instead install the CUDA 11.8 build of AutoAWQ (Python 3.10):
pip install git+https://github.com/casper-hansen/AutoAWQ/releases/download/v0.1.8/autoawq-0.1.8+cu118-cp310-cp310-linux_x86_64.whl
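
If you are not sure which CUDA toolkit your PyTorch build targets, a quick check like the following (a minimal sketch; assumes PyTorch is already installed) tells you whether you need the cu118 wheel above:

```python
# Check which CUDA toolkit the installed PyTorch build was compiled against,
# so you can pick the matching AutoAWQ wheel.
import torch

print(torch.version.cuda)        # e.g. "11.8" -> use the cu118 wheel above
print(torch.cuda.is_available()) # sanity check that the GPU is visible
```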
I used an RTX 4090 (24GB) to run Mixtral-8x7B-Instruct-v0.1 AWQ and got an out-of-memory error! @casper-hansen Does it need more VRAM? Qwen-72B-Chat ran fine.
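
For context, loading an AWQ Mixtral checkpoint for inference looks roughly like this (a minimal sketch; the Hugging Face repo id is an assumption, substitute whichever AWQ checkpoint you are actually using):

```python
# Sketch of running an AWQ-quantized Mixtral checkpoint with AutoAWQ.
# The repo id below is an assumption; use the checkpoint you actually have.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

quant_path = "TheBloke/Mixtral-8x7B-Instruct-v0.1-AWQ"  # assumed repo id

# fuse_layers=True enables AutoAWQ's fused modules for faster inference
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(quant_path)

inputs = tokenizer("Hello, my name is", return_tensors="pt").input_ids.cuda()
outputs = model.generate(inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```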
I have only been able to quantize Mixtral on 48GB VRAM.
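
For anyone attempting the quantization themselves, the flow looks roughly like this (a minimal sketch following the usual AutoAWQ pattern; the paths and quant_config values are typical 4-bit settings, not specific to this issue):

```python
# Sketch of quantizing Mixtral with AutoAWQ; expect to need ~48GB of VRAM.
# model_path / quant_path and the quant_config values are assumptions.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mixtral-8x7B-Instruct-v0.1"
quant_path = "mixtral-8x7b-instruct-awq"
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Calibration and weight quantization happen here; this is the VRAM-heavy step
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```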
While running it, I am getting this error: