Closed: jiuzhuanzhuan closed this issue 1 year ago
1 - This is not the right forum for these types of questions. I suggest asking at: https://developer.apple.com/forums/
2 - I don't understand this question. Please clarify the question.
For weight compression, Core ML supports quantizing parameters to 8 bits and dequantizing them back to 16 bits at inference time. However, the existing methods for quantizing weights to 8 bits hurt my model's accuracy significantly:
```python
# quantize to 8 bits using the default linear mode
model_8bit = quantize_weights(model_fp32, nbits=8)
# quantize to 8 bits using a k-means lookup table
model_8bit = quantize_weights(model_fp32, nbits=8, quantization_mode="kmeans")
# quantize to 8 bits using a symmetric linear grid
model_8bit = quantize_weights(model_fp32, nbits=8, quantization_mode="linear_symmetric")
```
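To make concrete why linear quantization can cost accuracy, here is a minimal sketch (plain Python, not the coremltools implementation) of the round trip that 8-bit linear-symmetric weight quantization performs: each weight is snapped to one of the evenly spaced levels of a signed integer grid, so every weight can be off by up to half a quantization step after dequantization.

```python
def quantize_linear_symmetric(weights, nbits=8):
    """Round-trip weights through nbits linear-symmetric quantization.

    Sketch only: the scale maps the largest-magnitude weight to the edge
    of the signed integer range, mirroring the idea behind
    quantization_mode="linear_symmetric".
    """
    qmax = 2 ** (nbits - 1) - 1                      # e.g. 127 for 8 bits
    scale = max(abs(w) for w in weights) / qmax
    quantized = [round(w / scale) for w in weights]  # integer codes in [-qmax, qmax]
    dequantized = [q * scale for q in quantized]     # values seen at inference
    return quantized, dequantized

weights = [0.011, -0.35, 0.5, -1.27]
codes, restored = quantize_linear_symmetric(weights)

# Round-trip error is bounded by half a quantization step (scale / 2);
# small weights like 0.011 lose most of their precision relative to their size.
step = max(abs(w) for w in weights) / 127
assert all(abs(w - r) <= step / 2 + 1e-12 for w, r in zip(weights, restored))
```

Because the grid is uniform, weight tensors with a few large outliers force a coarse step size for all the small weights, which is one common source of the accuracy drop described above.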
Thus I want to know whether the quantization parameters can be overridden with my own trained parameters.
All the compression APIs available in coremltools (coremltools.models.neural_network.quantization_utils.quantize_weights and coremltools.compression_utils.*) operate on a model that has already been converted. So if the converted model was generated from a pre-trained model, it starts with the trained parameters, and those parameters are then quantized, palettized, pruned, etc., depending on the API that is used.
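Among the modes shown in the question, "kmeans" differs from the linear modes in that it stores a small lookup table of centroids plus one index per weight, so it can track a cluster-heavy weight distribution more closely. A minimal sketch of that lookup-table idea (plain Python; the table values here are illustrative, not how coremltools computes them):

```python
def palettize(weights, lut):
    """Map each weight to the index of its nearest lookup-table entry.

    Sketch of LUT-style (k-means) weight compression: the model stores
    only the small `lut` plus one index per weight, and inference reads
    weight values back out through the table.
    """
    indices = [min(range(len(lut)), key=lambda i: abs(w - lut[i])) for w in weights]
    restored = [lut[i] for i in indices]
    return indices, restored

# Illustrative 2-bit table (4 entries) fit to a weight set with two clusters.
lut = [-1.0, -0.1, 0.1, 1.0]
weights = [-0.98, -0.12, 0.09, 1.02, -0.11]
indices, restored = palettize(weights, lut)
```

Unlike a uniform linear grid, the table entries can be placed wherever the weights actually concentrate, which is why switching modes is often the first thing to try when 8-bit linear quantization hurts accuracy.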
coremltools does not support converting pre-quantized models generated from QAT, since activation quantization is not supported; only weight compression is.