Closed: zye1996 closed this issue 2 years ago
@zye1996 ,
Thanks for your feedback.
"The attached mobilenetv2 model takes 60ms each run according to the benchmark but the model downloaded from tensorflow website has no issue."
Do you mean the official model gets better performance than your own MobileNetV2 model? Which platform did you use for the benchmark? And could you double-check whether the official model is per-tensor quantized or per-channel quantized?
Thanks
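One quick way to verify this is to inspect the quantization parameters through the TFLite Python interpreter; a minimal sketch is below (the `.tflite` filename is a placeholder):

```python
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

# Per-tensor quantization carries a single scale per tensor; per-channel
# quantization carries one scale per output channel of the weights.
for detail in interpreter.get_tensor_details():
    scales = detail["quantization_parameters"]["scales"]
    if len(scales) > 1:
        print(f"per-channel: {detail['name']} ({len(scales)} scales)")
```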
My own model is per-channel quantized.
Yes, the official model from TensorFlow runs faster. I am on the VIM3 (A311D) platform.
The official one is here, although it is a MobileNet v1. It took 2 ms on the NPU, so I believe the speed is reasonable. It should be per-tensor quantized.
Sorry, I found the reason: I did not disable per-channel quantization during model conversion. After disabling that flag, the performance makes sense.
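For anyone hitting the same issue, here is a minimal conversion sketch that forces per-tensor weight quantization. Note that `_experimental_disable_per_channel` is a private, experimental `TFLiteConverter` attribute and may change between TensorFlow versions; `representative_dataset` and the saved-model path stand in for your own calibration generator and model.

```python
import tensorflow as tf

# Full-integer post-training quantization with per-channel weight
# quantization disabled, so conv weights get a single per-tensor scale.
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset  # calibration data generator
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
# Private/experimental flag: forces per-tensor quantization of weights.
converter._experimental_disable_per_channel = True
tflite_model = converter.convert()

with open("model_per_tensor_quant.tflite", "wb") as f:
    f.write(tflite_model)
```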
Dear zye1996,
Can you share the versions of tflite-vx-delegate and TIM-VX you are running on the A311D?
Thanks very much.
Hi,
I am facing an inference performance issue when there are quantize/dequantize operations in the TFLite graph. The attached MobileNetV2 model takes 60 ms per run according to the benchmark, but the model downloaded from the TensorFlow website has no such issue. Am I doing anything wrong, or is there some setting to change?
mobilenet_v2_224_dm05_full_integer_quant.zip
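For reference, a minimal sketch of how such a benchmark can be run through the external-delegate interface is below; the `libvx_delegate.so` path, model filename, and iteration count are assumptions to adapt for your board.

```python
import time
import numpy as np
import tflite_runtime.interpreter as tflite

# Load the vx-delegate as an external delegate (path is board-specific).
delegate = tflite.load_delegate("/usr/lib/libvx_delegate.so")
interpreter = tflite.Interpreter(
    model_path="mobilenet_v2_224_dm05_full_integer_quant.tflite",
    experimental_delegates=[delegate],
)
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))

interpreter.invoke()  # warm-up: the first run compiles the graph for the NPU
start = time.time()
for _ in range(50):
    interpreter.invoke()
print(f"avg inference: {(time.time() - start) / 50 * 1000:.1f} ms")
```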