Closed PhilipXue closed 5 years ago
@PhilipXue I tried to create it, so please try it. I have not verified the behavior yet. Please let me know if it worked. https://github.com/PINTO0309/TensorflowLite-bin.git
@PINTO0309 Wow that's amazing! I'll try it immediately!
@PINTO0309 Hi, I tied you the whl you provide today and here are some findings:
Interpretor
class doesn't have set_num_threads
attribute. Does the tflite-runtime package include the improvment you did to tensorflow build?
@PhilipXue I'm sorry. I noticed a mistake in my patch to the program. I have identified the cause and will fix it today. Please wait a moment.
@PhilipXue Fixed and recommitted. tflite_runtime-1.14.0-cp35-cp35m-linux_armv7l.whl tflite_runtime-1.14.0-cp37-cp37m-linux_armv7l.whl
@PINTO0309 Thanks for your swift response! I tried out the new whl file and it works like a charm (multithreaded performance improved over the full-sized TF runtime)! Thanks a lot for your effort! BTW, do you have any plan to build tflite-runtime whls for 64-bit Debian Buster (Python 3.7, aarch64)?
@PhilipXue It is done; the build took about 5 minutes. However, I don't have a device I can test on. tflite_runtime-1.14.0-cp35-cp35m-linux_aarch64.whl tflite_runtime-1.14.0-cp37-cp37m-linux_aarch64.whl
https://github.com/PINTO0309/TensorflowLite-bin/tree/master/1.14.0
NP! I'll test it and give you feedback!
Hi, I tested the aarch64 whl on the 64-bit Debian Buster preview image for the RPi 3, and this time it behaves similarly to the 32-bit version before the fix:

> @PINTO0309 Hi, I tried out the whl you provided today and here are some findings:
> - The `Interpreter` class doesn't have a `set_num_threads` attribute, but the runtime seems to be using 4 threads.
> - The speed of running the MobileNet SSDLite model is slower than the full-sized TensorFlow runtime with 4 threads (650 ms/img vs 600 ms/img), and adding threads with the full-sized TensorFlow runtime can improve the speed to 520 ms/img.
>
> Does the tflite-runtime package include the improvement you made to the tensorflow build?

Calling `set_num_threads` raises: `AttributeError: type object 'Interpreter' has no attribute 'set_num_threads'`
Where do you think the problem is?
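The failure above can be guarded against at runtime. A minimal sketch (the helper name `set_threads` is my own, not part of tflite_runtime) that only calls `set_num_threads` when the installed wheel actually exposes it:

```python
def set_threads(interpreter, num_threads):
    """Request num_threads on the interpreter if this build supports it.

    Returns True when set_num_threads exists and was called, False when
    the installed tflite_runtime wheel does not expose the method.
    """
    if hasattr(interpreter, "set_num_threads"):
        interpreter.set_num_threads(num_threads)
        return True
    return False
```

With a stock 1.14.0 wheel this presumably returns False; with the patched wheels from TensorflowLite-bin it should return True, so a benchmark script can report which case it hit instead of crashing with an AttributeError.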
@PhilipXue Ah ... sorry. Perhaps I have repeated the same mistake. I can't work until I get home, so please be patient.
@PINTO0309 No need to be sorry. I heard that 64-bit systems are better at floating-point calculation and I'm just curious to verify it. Please take your time.
@PhilipXue I checked the contents of the wheel file. As I suspected, I had made the same mistake, so I modified the wheel file. https://github.com/PINTO0309/TensorflowLite-bin.git tflite_runtime-1.14.0-cp35-cp35m-linux_aarch64.whl tflite_runtime-1.14.0-cp37-cp37m-linux_aarch64.whl
By the way, there is also a 64-bit kernel OS image that I created independently. https://github.com/PINTO0309/RaspberryPi-bin.git
The benchmark tool you are using is cool. Would it be possible for you to share it?
@PINTO0309 Hi, I will check when I go back to work on Monday!
The benchmark is very simple: it measures the processing time for each image. Though simple, the code was created for internal projects. You can use the code provided in this article to do the exact same thing. (If you are referring to the image that I put in the comment, that's htop.)
I can share the results measuring some publicly available models (mobilenet ssd/ssdlite), which should be enough to see the improvement made by your whl files.
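The per-image timing loop described above can be sketched roughly as follows (the function and parameter names are mine; `run_inference` stands in for whatever invokes the interpreter on one image):

```python
import time

def benchmark(run_inference, images, warmup=3):
    """Run a few untimed warm-up inferences, then time each image.

    Returns a list of per-image latencies in milliseconds, from which
    figures like the 650 ms/img vs 600 ms/img above can be averaged.
    """
    # Warm-up runs: first inferences are often slower (allocation, caches)
    for img in images[:warmup]:
        run_inference(img)
    latencies_ms = []
    for img in images:
        start = time.perf_counter()
        run_inference(img)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms
```

Using `time.perf_counter` rather than `time.time` matters here, since it is a monotonic high-resolution clock intended exactly for interval measurement.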
@PhilipXue Thank you! What I wanted to know was htop. It's so cool. :smiley:
I'm glad you enjoyed it! If you are interested, there is another, similar system monitoring tool called glances that you may want to try out.
Have a nice day!
@PINTO0309 I've tested the new aarch64 version of the tflite-runtime whl, and the result is not ideal: the same model is far slower than when running on the 32-bit OS or with the full-sized 64-bit TensorFlow runtime. I verified this on another armv8 device and got the same result. Still, this is an improvement over the previous version before the fix. I have no idea why this kind of difference occurs; the best guess I can make is that there are some optimizations TensorFlow doesn't include in the tflite runtime for aarch64.
It is a difficult problem. Multithreading may not be effective, but the official binaries below may be a little faster. https://dl.google.com/coral/python/tflite_runtime-1.14.0-cp37-cp37m-linux_aarch64.whl
@PINTO0309 Hi, thanks for replying. I will try it out. Since all your whls are compiled, I think we can close this issue. Thank you very much for all your swift responses. We may discuss the speed problem in a separate thread at the TFLite runtime repo.
@PhilipXue @PINTO0309 Can you make a complete tutorial, from start to finish, for benchmarking a TensorFlow Lite model on a Raspberry Pi? Honestly, I've been stuck here for a month because my knowledge of this is still limited. I would be very thankful and grateful if you were willing to help, or at least give advice, because I think I have tried every source I could find.
What I have:
Hi, I appreciate your effort a lot! There is a way to build a tflite-only Python package, which reduces the package size tremendously: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/tools/pip_package I tried it before and I don't think that runtime can utilize multiprocessing. Is there any chance that you could check this and build a tflite package that has multiprocessing ability?