ICL-Developer closed this issue 4 years ago.
Maybe quantize the TF graph to further reduce the model size? Did you check that possibility? I deployed a pre-trained VGG model and the app initially crashed because of the huge model size (>500 MB). I then quantized the graph and the size came down to <130 MB. The app runs now, but the inference time is 4-5 seconds, which is still bad. With ImageNet models, the size could maybe be brought down to 14-15 MB, but I haven't tried that yet.
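For anyone who wants to try the same thing, here is a minimal sketch of post-training weight quantization on a frozen TensorFlow 1.x graph using the graph transform tool. The file names and the `input`/`output` node names are placeholders; substitute your own model's:

```python
# Minimal sketch: post-training weight quantization of a frozen TF 1.x graph.
# "frozen_graph.pb", "input" and "output" are placeholder names -- replace
# them with your model's actual file and node names.
import tensorflow as tf
from tensorflow.tools.graph_transforms import TransformGraph

with tf.gfile.GFile("frozen_graph.pb", "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

# quantize_weights stores weights in 8 bits instead of 32, which is the
# transform that shrinks a graph to roughly a quarter of its size.
quantized_def = TransformGraph(
    graph_def,
    ["input"],    # input node names
    ["output"],   # output node names
    ["quantize_weights", "strip_unused_nodes"],
)

with tf.gfile.GFile("quantized_graph.pb", "wb") as f:
    f.write(quantized_def.SerializeToString())
```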
Hey folks, the best performance I achieved was with MobileNet models, which makes sense given that they were designed specifically for mobile use and offer fast inference at the cost of somewhat lower accuracy. Beyond that, I'm not sure how to improve performance.
Andrey,
Yes, MobileNets are extremely fast, but they aren't very accurate, at least in my scenario. Quantization works very well: for an InceptionV3 model, which is ~95 MB, quantization reduces the size to ~24 MB without any loss of accuracy. You can read about quantization here: https://petewarden.com/2016/05/03/how-to-quantize-neural-networks-with-tensorflow/
You can also explore pruning, but I haven't tried it yet: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/model_pruning
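For the curious, the core idea behind magnitude pruning is just zeroing out the smallest-magnitude weights. Here is a toy sketch of that idea in plain NumPy (this is not the contrib API linked above, and the weight matrix and sparsity level are made-up examples):

```python
# Toy illustration of magnitude pruning: zero out the fraction of weights
# with the smallest absolute values. Not the tf.contrib.model_pruning API.
import numpy as np

def prune_weights(weights, sparsity=0.5):
    """Return a copy of `weights` with the smallest-magnitude entries
    zeroed so that at least `sparsity` fraction of them are zero."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

w = np.random.randn(4, 4)
print(prune_weights(w, sparsity=0.75))
```

Note that zeroed weights don't shrink the .pb file on their own; the savings come from compression or sparse storage, and the contrib tool prunes gradually during training to preserve accuracy.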
@AvishekhDas, Thanks for the links, I'll check them out!
My new example uses the new Unity Barracuda inference engine and gives better performance: https://github.com/Syn-McJ/TFClassify-Unity-Barracuda
Hi,
Any update on how to fix the performance issue?
Kindly waiting for your reply.
Thank you.