BlairSadewitz opened this issue 1 month ago
Perhaps; I'll have to look into it. bnb hasn't been a priority.
Yeah, I hear you. I'm gonna file a better PR in a second, though, so ... ;-)
FYI, I'm working on new kernels that massively speed up bnb quants and add TP support for them. You might want to hold off for now, or help out with that upcoming PR if you're comfortable with CUDA.
🚀 The feature, motivation and pitch
I don't know if it's feasible or worthwhile to merge this, since the trees may be too divergent, but cherry-picking commits for projects I don't fully understand is somehow a pastime of mine, so ...
Alternatives
I could always use one of the other 8.4234234*10^23 quantization methods, but hey, variety is the spice of life, or something.
Additional context
It doesn't work for pre-quantized models. 🎉~