AIWintermuteAI / aXeleRate

Keras-based framework for AI on the Edge
MIT License
177 stars 71 forks

low fps #4

Closed GiordanoLucio closed 4 years ago

GiordanoLucio commented 4 years ago

Hi, I noticed that the models are converted using nncase 0.2, which produces a kmodel V4.

Considering that kmodel V4 is still not fully supported, wouldn't it be better to structure the network so that it can be converted using nncase 0.1.5, which produces a kmodel V3?

In particular, nncase 0.1.5 doesn't support the TensorFlow Reshape op. Is it possible to remove it?

I'm writing this because I noticed that the original 20classes_yolo offered as a demo runs at 19.5 FPS on a Maix Go, while a Tiny YOLO v2 net trained with your tool runs at about 13 FPS.

Anyway, you did a great job. Thank you.

AIWintermuteAI commented 4 years ago

Hello! Yes, currently the conversion is done using nncase 0.2. I think the slower inference was because kmodel V4's MATMUL was implemented on the CPU rather than the KPU (as referenced here: https://en.bbs.sipeed.com/t/topic/1790), and nncase 0.2.0 beta3 added support for KPU MATMUL. I will find time this week to check the speed of kmodel V3 and kmodel V4 converted using nncase 0.2.0 beta3. Simply comparing the original 20classes_yolo with the Keras implementation of Tiny YOLO v2 cannot give you exact details on speed, because the architecture implementations might differ slightly (I am not sure about the implementation of the original 20classes_yolo).

There is some good news though. First of all, when you use aXeleRate to convert a trained model to .kmodel, it automatically cuts off the Reshape layer, so no changes are needed here. In fact, if you have "converter" : { "type":["k210"] in the config, the .tflite file in the project folder is the network without the Reshape layer, and you can easily convert it using nncase 0.1.5.

If you think adding kmodel V3 conversion is a good feature, you can have a look at the convert.py script in the common_utils folder: it is fairly simple to add nncase 0.1.5 conversion by adding another function there. I will accept a PR if you would like to implement that feature.
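As a rough sketch of what such a function in convert.py could look like: a thin wrapper that shells out to the old ncc binary. The flag names below follow the nncase 0.1.x CLI as I remember it (`-i`/`-o` for input/output format, `--dataset` for the calibration images); the function name, parameters, and exact flags are my own assumptions, so double-check them against your ncc binary.

```python
import subprocess

def build_kmodel_v3_cmd(tflite_path, kmodel_path, dataset_path, ncc_bin="ncc"):
    """Build the nncase 0.1.5 command line for a kmodel V3 conversion.

    NOTE: hypothetical helper; the CLI flags are assumptions based on the
    nncase 0.1.x tool and should be verified before use.
    """
    return [
        ncc_bin,
        "-i", "tflite",             # input format: the .tflite aXeleRate leaves in the project folder
        "-o", "k210model",          # output format: kmodel V3 for the K210 KPU
        "--dataset", dataset_path,  # calibration images for quantization
        tflite_path,
        kmodel_path,
    ]

def convert_to_kmodel_v3(tflite_path, kmodel_path, dataset_path, ncc_bin="ncc"):
    # Run the conversion and raise if ncc exits with a non-zero status.
    subprocess.run(
        build_kmodel_v3_cmd(tflite_path, kmodel_path, dataset_path, ncc_bin),
        check=True,
    )
```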

GiordanoLucio commented 4 years ago

Thanks for the explanation regarding the removal of the Reshape layer, I didn't notice it. I will try to convert it with nncase 0.1.5, and if the performance is better I'll send you the PR. Thanks!

AIWintermuteAI commented 4 years ago

Okay, I finally found the time to do some performance testing. Here are the results:

- First model: MobileNet (alpha=0.75), converted with nncase 0.1.5: 13.44542 FPS, 74.68 ms
- Second model: MobileNet (alpha=0.75), converted with nncase 0.2.0beta2: 13.29397 FPS, 74.9 ms
- Third model: Tiny YOLO, converted with nncase 0.2.0beta2: 18.95818 FPS, 53.16 ms

We can conclude that this is not a converter issue, but a model architecture difference. One thing that does seem a bit strange is that Tiny YOLO actually has more parameters and multiply-adds; MobileNet uses depth-wise convolutions and is supposed to be faster because of that. People on the Kendryte or Sipeed team can tell you more about why MobileNet is slower than Tiny YOLO on the K210. All models were tested with the 5-0.31 firmware; find the test scripts below. v3_13.44542_FPS_74.68ms.zip v4beta2_13.29397_FPS_74.9ms.zip v4_18.95818_FPS_53.16ms.zip person_detector_v4fps.zip
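As a sanity check on numbers like these: FPS and per-frame latency are just reciprocals of each other, so the pairs in the results can be cross-checked with a one-liner. The reported figures are roughly (not exactly) consistent with this relation, since the on-device loop adds a little overhead beyond the raw forward pass.

```python
def fps_from_latency_ms(latency_ms):
    """Convert a per-frame inference time in milliseconds to frames per second."""
    return 1000.0 / latency_ms

# Cross-checking the reported latencies against the reported FPS:
print(round(fps_from_latency_ms(74.68), 2))  # -> 13.39, close to the reported 13.44542 FPS
print(round(fps_from_latency_ms(53.16), 2))  # -> 18.81, close to the reported 18.95818 FPS
```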

AIWintermuteAI commented 4 years ago

Seems the issue was resolved, closing now. Feel free to reopen if there are more questions.

GiordanoLucio commented 4 years ago

Thank you, and sorry for not answering earlier; I had a lot of stuff to do and forgot about it.

shanejohnpaul commented 4 years ago

I tried this myself on the Maix Bit. I took MobileNetV1 (alpha=0.75, three output classes). This is what I got (for the forward pass only):

- v0.1.0-rc5 --> 29 ms
- v0.2.0-beta2 --> 34 ms
- v0.2.0-beta4 --> 210 ms

PS - I'm not using aXeleRate; I compiled the trained model using nncase separately. I saw this post and Sipeed's post on V3 vs V4 kmodels, and just wanted to add this data point.
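For anyone who wants to reproduce forward-pass timings like these, the measurement is just an averaged timing loop. Below is a generic desktop-Python sketch of the idea; on the K210 itself you would time the KPU inference call with MaixPy's millisecond ticks instead, and the `forward` callable here is a placeholder for whatever inference call you are benchmarking.

```python
import time

def time_forward_ms(forward, n_warmup=3, n_runs=20):
    """Return the average wall-clock time of forward() in milliseconds.

    `forward` is a placeholder for the model's inference call. Warm-up
    iterations are run first and excluded from the average, since the
    first calls are often slower (caches, lazy initialization).
    """
    for _ in range(n_warmup):
        forward()
    start = time.perf_counter()
    for _ in range(n_runs):
        forward()
    return (time.perf_counter() - start) * 1000.0 / n_runs
```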