Closed AlpinDale closed 1 month ago
I think it would be much better to check `HAS_QUANTS` in each `QuantizationConfig` subclass' `__init__` rather than in `apply_weights`: it errors out earlier and reduces unnecessary changes to the code.
Good idea, I'll change it to that.
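A minimal sketch of the suggestion above, assuming `HAS_QUANTS` is a module-level boolean that is true only when the compiled quantization kernels are available. The module name `_hypothetical_quant_C`, the stand-in base class, and `ExampleQuantConfig` are placeholders for illustration, not the project's actual code.

```python
import importlib.util

# Assumption: the flag reflects whether a compiled kernel module can be found.
# "_hypothetical_quant_C" is a placeholder name, not the project's real module.
HAS_QUANTS = importlib.util.find_spec("_hypothetical_quant_C") is not None


class QuantizationConfig:
    """Stand-in base class, for illustration only."""


class ExampleQuantConfig(QuantizationConfig):
    """Hypothetical subclass that checks for the kernels at construction."""

    def __init__(self, weight_bits: int) -> None:
        # Fail here, when the config is built, instead of later in
        # apply_weights, so a missing kernel build is reported before
        # any weights are loaded.
        if not HAS_QUANTS:
            raise ImportError(
                "Quantization kernels are not installed; "
                "rebuild with the quant kernels enabled."
            )
        self.weight_bits = weight_bits
```

With the check in `__init__`, `apply_weights` needs no guard of its own, which is where the reduction in code changes comes from.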
Some light refactors to isolate the quantization code from the rest of the code. Also lets users disable compilation of the quant kernels by setting the `APHRODITE_INSTALL_QUANT_KERNELS=0` environment variable.
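As a rough illustration of how a build script could honor this variable, here is a sketch. The helper name `quant_kernels_enabled` is hypothetical, and the assumption that the kernels are built by default and only the value `0` disables them is inferred from the description, not confirmed by the PR.

```python
import os
from typing import Mapping, Optional


def quant_kernels_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return False only when APHRODITE_INSTALL_QUANT_KERNELS is set to "0".

    Assumption: an unset variable means the kernels are compiled.
    """
    env = os.environ if environ is None else environ
    return env.get("APHRODITE_INSTALL_QUANT_KERNELS", "1") != "0"


if __name__ == "__main__":
    # A build script could use the result to decide whether to add the
    # quantization extension to its list of modules to compile.
    print("build quant kernels:", quant_kernels_enabled())
```

Passing the environment in as a parameter keeps the helper easy to test without touching the real process environment.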