theobjectivedad opened 2 weeks ago
The quant name in aphrodite is unfortunately a bit misleading; I intend to fix this with the next release. The `load_in_4bit` quant isn't actually bitsandbytes, it's SmoothQuant+. We don't allow loading bnb weights directly yet. This will also be addressed in the next release.
Note that SQ+ is faster and offers better quality than bnb 4-bit: bnb reduces throughput compared to fp16, while SQ+ increases it by close to 3x.
Got it, ty for looking at this and helping me understand. Do you want me to close this issue?
Might be a good idea to keep it open in case someone else has the same issue. I'll close it myself once we have real bitsandbytes support.
Your current environment
Additionally:
```
transformers==4.40.0
bitsandbytes==0.43.1
```
🐛 Describe the bug
When loading CommandR+ (bnb/4bit) with the following command:
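A representative invocation (assuming aphrodite's OpenAI-compatible API server entrypoint and the public 4-bit CommandR+ weights; not the verbatim command):

```
python -m aphrodite.endpoints.openai.api_server \
  --model CohereForAI/c4ai-command-r-plus-4bit \
  --quantization bnb
```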
I get this error:
The image was built from main on 6/11/2024.
As indicated in the error message, omitting `--quantization bnb` doesn't seem to have any effect; the error comes from reading the model's `config.json` file.

Some notes I took looking into this a little deeper...
Aphrodite expects `quantization_config.quant_method` in the model's `config.json` to be `bnb`... I'm sure this changed at some point in `transformers` (recent releases write `bitsandbytes` there instead).
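To make the mismatch concrete, here's a minimal sketch (the model path is a placeholder, and the check at the end paraphrases the shape of the failure rather than quoting aphrodite's actual code):

```python
import json

# Read the quantization block that transformers writes into config.json
with open("c4ai-command-r-plus-4bit/config.json") as f:  # placeholder path
    config = json.load(f)

quant_method = config["quantization_config"]["quant_method"]
print(quant_method)  # "bitsandbytes" for models quantized with recent transformers

# Aphrodite only recognizes the "bnb" spelling, so the lookup fails
SUPPORTED_QUANT_METHODS = {"bnb"}  # paraphrased, not aphrodite's actual constant
if quant_method not in SUPPORTED_QUANT_METHODS:
    raise ValueError(f"Unknown quantization method: {quant_method}")
```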
One option would be to keep `bnb` and add `bitsandbytes` as an accepted value, to maintain compatibility and reduce confusion (see the sketch at the end).

Lastly, I'm hesitant to fly solo and just submit a PR on this one, since the current bnb quant config is pretty tightly integrated into `aphrodite/quantization/bitsandbytes.py`; not sure on the best way to proceed.
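For illustration, one shape the compatibility change could take (a hypothetical alias map, not aphrodite's actual internals):

```python
# Hypothetical shim: accept both spellings and normalize to the internal
# "bnb" name, so existing configs and CLI flags keep working.
QUANT_METHOD_ALIASES = {
    "bnb": "bnb",
    "bitsandbytes": "bnb",  # spelling written by recent transformers releases
}

def resolve_quant_method(name: str) -> str:
    try:
        return QUANT_METHOD_ALIASES[name.lower()]
    except KeyError:
        raise ValueError(f"Unknown quantization method: {name}") from None
```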