apple / coremltools

Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
https://coremltools.readme.io
BSD 3-Clause "New" or "Revised" License

Does Core ML support Post Training Quantization (PTQ) or Quantization Aware Training (QAT)? #1760

Closed jiuzhuanzhuan closed 1 year ago

jiuzhuanzhuan commented 1 year ago
  1. Does the ANE support 8-bit inference and acceleration?
  2. Does Core ML support freely setting the quantization parameters? PTQ/QAT achieve higher precision for my model.
TobyRoseman commented 1 year ago

1 - This is not the right forum for these types of questions. I suggest asking at: https://developer.apple.com/forums/

2 - I don't understand this question. Please clarify the question.

jiuzhuanzhuan commented 1 year ago
  1. To compress weights, coremltools supports quantizing parameters to 8 bits and converting them back to 16 bits for inference. But the existing methods of quantizing weights to 8 bits hurt my model's accuracy a lot:

    
    # quantize to 8 bit using linear mode
    model_8bit = quantize_weights(model_fp32, nbits=8)

    # quantize to 8 bit using LUT kmeans mode
    model_8bit = quantize_weights(model_fp32, nbits=8, quantization_mode="kmeans")

    # quantize to 8 bit using linear_symmetric mode
    model_8bit = quantize_weights(model_fp32, nbits=8, quantization_mode="linear_symmetric")


Thus I want to know whether the quantization parameters can be set from my own trained parameters.
aseemw commented 1 year ago

> Thus I want to know whether the quantization parameters can be set from my own trained parameters.

All the compression APIs available in coremltools (coremltools.models.neural_network.quantization_utils.quantize_weights and coremltools.compression_utils.*) operate on a model that has already been converted. So if the converted model is generated from a pre-trained model, it starts out with the trained parameters, and those parameters are then quantized/palettized/pruned etc., depending on the API that is used. coremltools does not support converting pre-quantized models generated from QAT, since activation quantization is not supported; only weight compression is.
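To make the weight-compression behavior above concrete, here is a minimal NumPy sketch of a linear 8-bit scheme in the spirit of quantize_weights (an illustrative stand-in, not the coremltools implementation): weights are mapped to 8-bit codes for storage and dequantized back to floats, which is what the model actually computes with at inference.

```python
import numpy as np

def linear_quantize_8bit(w):
    """Illustrative linear 8-bit weight quantization (not the coremltools code).

    Maps float weights onto integer codes 0..255 for storage, and
    dequantizes them back to float32, which is what runs at inference.
    """
    w_min = float(w.min())
    w_max = float(w.max())
    scale = (w_max - w_min) / 255.0                      # step size of the 8-bit grid
    q = np.round((w - w_min) / scale).astype(np.uint8)   # stored 8-bit codes
    dq = q.astype(np.float32) * scale + w_min            # values the model sees
    return q, dq, scale

# The round-trip error is bounded by half a quantization step, but a
# model's accuracy can still degrade if its weights are sensitive to
# that perturbation -- which is the motivation for wanting PTQ/QAT.
w = np.linspace(-1.0, 1.0, 11).astype(np.float32)
q, dq, scale = linear_quantize_8bit(w)
assert np.abs(w - dq).max() <= scale / 2 + 1e-6
```

This also illustrates why the APIs start from a converted model: the quantization parameters (here, scale and w_min) are derived from the already-trained weights, rather than supplied externally.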