PygmalionAI / aphrodite-engine

PygmalionAI's large-scale inference engine
https://pygmalion.chat
GNU Affero General Public License v3.0
722 stars, 85 forks

Refactor: Quantization #454

Closed AlpinDale closed 1 month ago

AlpinDale commented 1 month ago

Some light refactors to isolate the quantization code from the rest of the engine. Also lets users skip compiling the quantization kernels by setting the APHRODITE_INSTALL_QUANT_KERNELS=0 environment variable.
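For illustration, a build script could read the flag roughly like this. This is a minimal sketch, not the PR's actual code: the helper name `should_build_quant_kernels` is hypothetical, and treating an unset variable as "build the kernels" is an assumption.

```python
import os


def should_build_quant_kernels(environ=None) -> bool:
    """Return False only when APHRODITE_INSTALL_QUANT_KERNELS is set to "0".

    Hypothetical helper: the real build script may parse the variable
    differently. Defaulting to True when it is unset is an assumption.
    """
    env = os.environ if environ is None else environ
    value = env.get("APHRODITE_INSTALL_QUANT_KERNELS", "1")
    return value.strip() != "0"


if __name__ == "__main__":
    if should_build_quant_kernels():
        print("quantization kernels: will be compiled")
    else:
        print("quantization kernels: skipped")
```

Under these assumptions, running the install with `APHRODITE_INSTALL_QUANT_KERNELS=0` set in the environment would take the "skipped" branch.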

AlpinDale commented 1 month ago

I think it would be much better to check HAS_QUANTS in each QuantizationConfig subclass's __init__ rather than in apply_weights: it fails earlier and avoids unnecessary code changes.

Good idea, I'll change it to that.
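The suggestion could look roughly like the sketch below. It is an illustration only: the kernel module name, the stand-in base class, `ExampleQuantConfig`, and its `weight_bits` parameter are all hypothetical, and how HAS_QUANTS is actually set in the codebase is an assumption.

```python
# Assumption: HAS_QUANTS is a module-level flag that records whether the
# compiled quantization kernels can be imported. The module name below is
# hypothetical and stands in for the real extension module.
try:
    import aphrodite_hypothetical_quant_kernels  # noqa: F401
    HAS_QUANTS = True
except ImportError:
    HAS_QUANTS = False


class QuantizationConfig:
    """Stand-in for the real base class, just enough for this sketch."""


class ExampleQuantConfig(QuantizationConfig):
    """Hypothetical subclass showing the check placed in __init__."""

    def __init__(self, weight_bits: int = 4) -> None:
        # Fail at config construction time, before any weights are loaded,
        # instead of waiting until apply_weights is first called.
        if not HAS_QUANTS:
            raise ImportError(
                "Quantization kernels are not installed; rebuild without "
                "APHRODITE_INSTALL_QUANT_KERNELS=0 to use this method."
            )
        self.weight_bits = weight_bits
```

With the check in `__init__`, a build without the kernels raises as soon as the config object is created, so `apply_weights` does not need its own guard.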