PINTO0309 / Tensorflow-bin

Prebuilt binary with Tensorflow Lite enabled. For RaspberryPi / Jetson Nano. Support for custom operations in MediaPipe. XNNPACK, XNNPACK Multi-Threads, FlexDelegate.
https://qiita.com/PINTO
Apache License 2.0
500 stars, 113 forks

tflite only python package #15

Closed PhilipXue closed 5 years ago

PhilipXue commented 5 years ago

Hi, I really appreciate your effort! There is a way to build a tflite-only python package, which reduces the package size tremendously: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/tools/pip_package I tried it before, and I don't think this runtime can utilize multiprocessing. Is there any chance that you could check this and build a tflite package that has multiprocessing ability?

PINTO0309 commented 5 years ago

@PhilipXue I tried to create it, so please try it. I have not verified the behavior yet. Please let me know if it worked. https://github.com/PINTO0309/TensorflowLite-bin.git

PhilipXue commented 5 years ago

@PINTO0309 Wow that's amazing! I'll try it immediately!

PhilipXue commented 5 years ago

@PINTO0309 Hi, I tried the whl you provided today and here are some findings:

Does the tflite-runtime package include the improvements you made to the tensorflow build?

PINTO0309 commented 5 years ago

@PhilipXue I'm sorry. I noticed a mistake in my modification of the program, and I have identified the cause. I will fix it today. Please wait a moment.

PINTO0309 commented 5 years ago

@PhilipXue Fixed and recommitted.

  • tflite_runtime-1.14.0-cp35-cp35m-linux_armv7l.whl
  • tflite_runtime-1.14.0-cp37-cp37m-linux_armv7l.whl

https://github.com/PINTO0309/TensorflowLite-bin.git

PhilipXue commented 5 years ago

@PINTO0309 Thanks for your swift response! I tried out the new whl file and it works like a charm (multithread performance is improved over the full-sized tf runtime)! Thanks a lot for your effort! BTW, do you have any plans to build tflr whls for 64-bit debian buster (python3.7 aarch64)?

PINTO0309 commented 5 years ago

@PhilipXue It is complete. The working time was about 5 minutes. However, I don't have a device to test it on.

  • tflite_runtime-1.14.0-cp35-cp35m-linux_aarch64.whl
  • tflite_runtime-1.14.0-cp37-cp37m-linux_aarch64.whl

https://github.com/PINTO0309/TensorflowLite-bin/tree/master/1.14.0

PhilipXue commented 5 years ago

NP! I'll test it and give you feedback!

PhilipXue commented 5 years ago

Hi, I tested the aarch64 whl on the 64-bit debian buster preview image for the rpi3, and this time it behaves like the 32-bit version did before the fix:

@PINTO0309 Hi, I tried the whl you provided today and here are some findings:

  • The Interpreter class doesn't have a set_num_threads attribute.
  • However, the runtime seems to be using 4 threads (screenshot: pi_tflr).
  • Running the mobilenet ssdlite model is slower than with the full-sized tensorflow runtime using 4 threads (650 ms/img vs 600 ms/img), and adding threads with the full-sized tensorflow runtime can improve the speed to 520 ms/img.

Does the tflite-runtime package include the improvements you made to the tensorflow build?

Where do you think the problem is?
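A minimal sketch of how a script could guard against the missing method reported above. The helper name `apply_num_threads` and the two stand-in classes are hypothetical; the only thing taken from this thread is that a patched build exposes `set_num_threads` on the interpreter object, while an unpatched one does not.

```python
def apply_num_threads(interpreter, num_threads):
    """Call interpreter.set_num_threads(num_threads) if the build provides it.

    Returns True when the method exists and was called, False otherwise.
    """
    setter = getattr(interpreter, "set_num_threads", None)
    if callable(setter):
        setter(num_threads)
        return True
    return False


# Stand-ins for a patched and an unpatched interpreter (illustration only,
# not the real tflite_runtime classes).
class PatchedInterpreter:
    def __init__(self):
        self.threads = 1

    def set_num_threads(self, n):
        self.threads = n


class UnpatchedInterpreter:
    pass


patched = PatchedInterpreter()
print(apply_num_threads(patched, 4))                 # True
print(apply_num_threads(UnpatchedInterpreter(), 4))  # False
```

With a real wheel, the same helper could be passed the interpreter instance after construction; a `False` return would indicate that the installed build lacks the multithreading patch.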

PINTO0309 commented 5 years ago

@PhilipXue Ah ... sorry. Perhaps I have repeated the same mistake. I can't work until I get home, so please be patient.

PhilipXue commented 5 years ago

@PINTO0309 No need to be sorry. I heard that 64-bit systems are better at floating-point calculation and was just curious to verify it. Please take your time.

PINTO0309 commented 5 years ago

@PhilipXue I checked the contents of the wheel file. As I suspected, I had made the same mistake, so I fixed the wheel files. https://github.com/PINTO0309/TensorflowLite-bin.git tflite_runtime-1.14.0-cp35-cp35m-linux_aarch64.whl tflite_runtime-1.14.0-cp37-cp37m-linux_aarch64.whl

By the way, there is also a 64-bit kernel OS image that I created independently. https://github.com/PINTO0309/RaspberryPi-bin.git

The benchmark tool you are using is cool. Would it be possible for you to share it?

PhilipXue commented 5 years ago

@PINTO0309 Hi, I will check when I go back to work on Monday!

The benchmark is very simple: it measures the processing time for each image. Though simple, the code was created for internal projects. You can use the code provided in this article to do the exact same thing. (If you are referring to the image that I put in the comment, it's htop.) I can share the results of measuring some publicly available models (mobilenet ssd/ssdlite), which should be enough to see the improvement made by your whl files.
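A per-image timing benchmark of the kind described here can be sketched in a few lines. This is not the internal code mentioned above; the function names and the `fake_inference` stand-in are assumptions for illustration, and a real run would pass a function that feeds one image through the interpreter.

```python
import statistics
import time


def time_per_item(process, items, warmup=1):
    """Return per-item latencies in milliseconds for process(item)."""
    # Warm-up calls are not timed, so one-off setup cost is excluded.
    for item in items[:warmup]:
        process(item)

    latencies = []
    for item in items:
        start = time.perf_counter()
        process(item)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def fake_inference(image):
    """Hypothetical stand-in for running a model on one image."""
    time.sleep(0.002)
    return len(image)


images = [b"\x00" * 16 for _ in range(5)]
latencies = time_per_item(fake_inference, images)
print(f"mean {statistics.mean(latencies):.1f} ms/img over {len(latencies)} images")
```

Reporting the mean in ms/img makes the numbers directly comparable with the figures quoted earlier in this thread.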

PINTO0309 commented 5 years ago

@PhilipXue Thank you! What I wanted to know was htop. It's so cool. :smiley: (Screenshot 2019-09-09 00:21:36)

PhilipXue commented 5 years ago

I'm glad you enjoyed it! If you are interested, there is another similar system monitoring tool called glances that you may want to try out.

Have a nice day!

PhilipXue commented 5 years ago

@PINTO0309 I've tested the new aarch64 version of the tflr whl, and the result is not ideal: the same model runs far slower than with the 32-bit OS or with the full-sized 64-bit tensorflow runtime. I verified this on another armv8 device and got the same result. Still, it is an improvement over the previous version before the fix. I have no idea why this kind of difference occurs; my best guess is that there are some optimizations that tensorflow doesn't include in the tflr runtime for aarch64.

PINTO0309 commented 5 years ago

It is a difficult problem. Multithreading may not be enabled, but the official binary below may be a little faster. https://dl.google.com/coral/python/tflite_runtime-1.14.0-cp37-cp37m-linux_aarch64.whl

PhilipXue commented 5 years ago

@PINTO0309 Hi, thanks for replying. I will try it out. Since all your whls are compiled, I think we can close this issue. Thank you very much for all your swift responses. We may discuss the speed problem in a separate thread at the TFLite runtime repo.

krisnadn11 commented 5 months ago

@PhilipXue @PINTO0309 Could you make a complete tutorial, from start to finish, for benchmarking a tensorflow lite model on a raspberry pi? Honestly, I've been stuck here for a month because my knowledge of this is still limited. I would be very thankful if you were willing to help or at least give advice, because I think I have tried all the sources I could find.

What I have:

  1. detect.tflite
  2. labelmap.txt
  3. Raspberry PI 4B (aarch64, python 3.9)
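As a rough starting point for the setup listed above, the sketch below loads a label file and times `invoke()` on random input. It is not a reply from the maintainers. It assumes that `numpy` and a `tflite_runtime` wheel matching the Python version are installed, and that the installed release accepts the `num_threads` constructor argument (newer releases do; the 1.14.0 wheels discussed in this thread use `set_num_threads` instead). The function names are hypothetical.

```python
import os
import time


def load_labels(path):
    """Read one label per line, skipping blank lines."""
    with open(path, "r", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def benchmark_tflite(model_path, runs=20, num_threads=4):
    """Time interpreter.invoke() on random input; return mean ms per run."""
    # Imported here so load_labels stays usable without these packages.
    import numpy as np
    from tflite_runtime.interpreter import Interpreter

    # Assumption: this release supports num_threads in the constructor.
    interpreter = Interpreter(model_path=model_path, num_threads=num_threads)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    shape = tuple(inp["shape"])

    # Random data with the shape and dtype the model expects.
    if np.issubdtype(inp["dtype"], np.integer):
        data = np.random.randint(0, 256, size=shape).astype(inp["dtype"])
    else:
        data = np.random.random_sample(shape).astype(inp["dtype"])
    interpreter.set_tensor(inp["index"], data)

    interpreter.invoke()  # warm-up run, not timed
    start = time.perf_counter()
    for _ in range(runs):
        interpreter.invoke()
    return (time.perf_counter() - start) * 1000.0 / runs


if __name__ == "__main__":
    # Runs only when the files from the list above are in the working directory.
    if os.path.exists("detect.tflite") and os.path.exists("labelmap.txt"):
        print(f"{len(load_labels('labelmap.txt'))} labels")
        print(f"{benchmark_tflite('detect.tflite'):.1f} ms per inference")
```

Random input only measures raw inference time; timing with real images would also include decoding and resizing, which a full benchmark may want to report separately.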