Closed — JSnobody closed this issue 5 years ago
All of the model compression methods require the model to be re-trained on some training data, so that the performance degradation is much smaller.
If you want to compress a model without any training operation, you may try to use:
$ python tools/conversion/export_quant_tflite_model.py --enbl_post_quant
to quantize a model with 32-bit floating-point weights into 8-bit fixed-point weights, using post-training quantization.
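For intuition, post-training quantization can be sketched as mapping each 32-bit float weight onto one of 256 uniformly spaced levels. The sketch below is a simplified illustration in plain Python, not PocketFlow's or TensorFlow's actual implementation; the function names are hypothetical.

```python
# Hedged sketch: affine (uniform) post-training quantization of float32
# weights to 8-bit integers, plus dequantization back to floats.
# Illustration only -- not PocketFlow's actual code.

def quantize_weights(weights, num_bits=8):
    """Map float weights onto integers in [0, 2**num_bits - 1]."""
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / (2 ** num_bits - 1) or 1.0  # avoid div-by-zero
    q = [round((w - w_min) / scale) for w in weights]
    return q, scale, w_min

def dequantize_weights(q, scale, w_min):
    """Recover approximate float weights from the quantized integers."""
    return [v * scale + w_min for v in q]

weights = [-0.50, -0.12, 0.03, 0.25, 0.50]
q, scale, zero = quantize_weights(weights)
restored = dequantize_weights(q, scale, zero)
```

The reconstruction error per weight is at most one quantization step, which is why accuracy usually drops only slightly without any re-training.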
OK, I got it. But another question: does PocketFlow's quantization method use TensorFlow's post-training quantization?
No, we are using quantization-aware training to produce quantized models, which takes longer but yields higher accuracy. If you don't want to use quantization-aware training, just use post-training quantization with the above command.
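The core trick in quantization-aware training is "fake quantization": the forward pass sees quantized weights, while gradients update the underlying full-precision weights (the straight-through estimator). A minimal sketch in plain Python, assuming a fixed clipping range of [-1, 1] and a toy loss; it is an illustration, not PocketFlow's actual implementation:

```python
# Hedged sketch of the fake-quantization step used in quantization-aware
# training. The forward pass rounds the weight to an 8-bit level, while the
# gradient is applied straight through to the float weight (STE).
# Illustration only -- not PocketFlow's actual code.

def fake_quantize(w, num_bits=8, w_min=-1.0, w_max=1.0):
    """Clamp w to [w_min, w_max] and round to the nearest quantized level."""
    levels = 2 ** num_bits - 1
    scale = (w_max - w_min) / levels
    w = max(w_min, min(w_max, w))              # clamp into range
    return round((w - w_min) / scale) * scale + w_min

# Training keeps a full-precision weight; only the forward pass uses its
# quantized version, so the network learns to tolerate quantization noise.
weight = 0.3141
lr = 0.1
for _ in range(3):
    w_q = fake_quantize(weight)                # forward with quantized weight
    grad = 2 * w_q                             # e.g. gradient of loss w_q**2
    weight -= lr * grad                        # STE: update the float weight
```

Because the network is trained with quantization noise already present, the final quantized model typically loses less accuracy than one quantized after the fact.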
I see. Thank you!
Hi, PocketFlow provides several optimization functions, such as channel pruning, weight sparsification, weight quantization, network distillation, multi-GPU training, and hyper-parameter optimization. Could you tell me which ones require the model to be re-trained when I use them? Thanks very much!