Hi Kevin,
thanks for the good report.
Your C++ compiler invocation looks OK for speed; the important -O3 -DNDEBUG flags are there.
Since you also have -march=native, I think you can remove all the other -m... flags, i.e., end up with:

g++ -O3 -DNDEBUG -march=native main.cpp
Can you give this a try?
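In case it helps with measuring, here is a minimal timing sketch (the file name fdeep_model.json and the 224x224x3 input shape are assumptions; adjust them to your converted model, and make sure frugally-deep and its header-only dependencies are on the include path):

```cpp
// Minimal sketch: load a converted frugally-deep model and time one forward pass.
// Assumes the converted model file is named "fdeep_model.json" and that the
// model expects a 224 x 224 x 3 input (adjust to your actual input shape).
#include <fdeep/fdeep.hpp>

#include <chrono>
#include <iostream>

int main()
{
    const auto model = fdeep::load_model("fdeep_model.json");

    // Dummy input filled with zeros, just to measure inference speed.
    const fdeep::tensor input(fdeep::tensor_shape(224, 224, 3), 0.0f);

    const auto start = std::chrono::steady_clock::now();
    const auto result = model.predict({input});
    const auto stop = std::chrono::steady_clock::now();

    const auto ms =
        std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
    std::cout << "Output tensors: " << result.size() << "\n";
    std::cout << "Forward pass took " << ms << " ms" << std::endl;
}
```

Compiled with the invocation above, this prints the wall-clock time of a single predict call.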
If it does not help, could you upload your model (the not-yet-converted version) for me to experiment with and find the bottleneck?
Thanks! I've tried that before, and the speed is the same. I've sent you an email containing a link to my model and some testing code. Let me know if you need anything else.
Thank you. With the model you sent me, I just reproduced the performance problem locally. It's actually even worse on my machine.
I'll investigate and get back to you here.
Profiling (with sysprof) showed that all the CPU time is burned exactly here.
In this MR, I accidentally introduced an unnecessarily large (very redundant) calculation. :grimacing:
I just fixed it with this commit and released a new version.
Now, a forward pass with your model in frugally-deep is fast (~ 0.075 s on my machine). :tada:
Thanks a lot for reporting this and providing such a good explanation (plus the example model)! :heart:
Thank you!
Hi Tobias,
I'm trying to implement a segmentation model with MobileNetV3 (TensorFlow mobilenetv3_large minimalistic) and an LR-ASPP segmentation head that I trained in Python. When converting my model, the test forward passes take < 1 s, but when I load it in C++, a forward pass takes 8 s. I am using WSL running Ubuntu 22.04. I'm pretty new to C++ development, so I've likely made some compilation mistakes, but I would love to get your feedback on why this speed discrepancy exists. I've posted the model conversion and loading outputs below. I can send you the model JSON as well.
Appreciate the work you've put into this library. Thanks! Kevin