```python
# if operation is a QuantableOperation, we have to quant its inputs and outputs at first.
if isinstance(operation, QuantableOperation):
    input_configs = [_ for _ in operation.config.input_quantization_config]
    inputs = [self.quantize_function(input, config) for input, config in zip(inputs, input_configs)]
```
PPQ is an excellent framework: it considers the practical aspects of deploying model quantization very systematically and is well worth studying. I tried it and the quantization results are indeed good, but I have one question: does PPQ currently support quantization with different bit widths for weights and activations? For example a16w8, i.e. 16-bit activations with 8-bit weights. From a first look at the relevant code (ppq/executor/torch.py, L515), weights and activations currently seem to be handled together, without distinction: `if operation is a QuantableOperation, we have to quant its inputs and outputs at first.`
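For context, the mixed-precision idea the question asks about (a16w8) just means using a different bit width when quantizing activations versus weights. A minimal sketch in plain PyTorch, not PPQ's implementation (`fake_quantize` is a hypothetical helper using symmetric per-tensor fake quantization):

```python
import torch

def fake_quantize(tensor: torch.Tensor, num_bits: int) -> torch.Tensor:
    """Symmetric per-tensor fake quantization to num_bits (illustrative only)."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = tensor.abs().max() / qmax
    if scale == 0:
        return tensor
    # round to the integer grid, clamp to the signed range, then dequantize
    return torch.clamp(torch.round(tensor / scale), -qmax - 1, qmax) * scale

torch.manual_seed(0)
activation = torch.randn(4, 8)
weight = torch.randn(8, 8)

# a16w8: activations quantized to 16 bit, weights to 8 bit
q_act = fake_quantize(activation, num_bits=16)
q_wgt = fake_quantize(weight, num_bits=8)
output = q_act @ q_wgt
```

Supporting this in a framework would amount to attaching a per-tensor bit-width setting to each quantization config, so that input (activation) configs and parameter (weight) configs can carry different values.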