google-coral / edgetpu

Coral issue tracker (and legacy Edge TPU API source)
https://coral.ai
Apache License 2.0

DeeplabV3+ speed in Windows10 is 100ms #113

Closed laibilly closed 3 years ago

laibilly commented 4 years ago

System information

Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10 (1809)
Mobile device (e.g., Pixel 4, Samsung Galaxy 10) if the issue happens on mobile device: Coral USB Accelerator on USB 3.0
TensorFlow installed from (source or binary): pip install https://dl.google.com/coral/python/tflite_runtime-2.1.0.post1-cp36-cp36m-win_amd64.whl
TensorFlow version (use command below): tflite 2.1
Python version: 3.6
Bazel version (if compiling from source):
GCC/Compiler version (if compiling from source):
CUDA/cuDNN version: N/A
GPU model and memory: N/A
Please provide the entire URL of the model you are using: https://github.com/google-coral/edgetpu/raw/master/test_data/deeplabv3_mnv2_dm05_pascal_quant_edgetpu.tflite

Describe the current behavior

Good evening,

I just got the Coral USB Accelerator. I am trying to run it on Windows 10 with an 8850H CPU. I had set the Coral frequency to max, as I got the same speed in the classify_image.py sample. I repeated the following call 200 times in the semantic_segmentation.py sample:

engine.run_inference(img)

and noticed it is now CPU bound: a single core rises to 100%, and the execution time is around 100 ms.

Is there anything I can do to improve the speed? I cannot switch to Linux, as our application is tied to Windows for now.

Can I use the TensorFlow C++ API on Windows 10? Will it lower the Python overhead?

Is this overhead expected? (50 ms vs 100 ms)

Is there an image segmentation model that is faster on the Edge TPU?

Can I run two tflite.Interpreter instances on a single Coral USB Accelerator?

Can I run multiple USB Accelerators?

Describe the expected behavior
The speed should be around 50 ms.

Code to reproduce the issue
Follow the Python example code: https://github.com/google-coral/edgetpu/blob/master/scripts/run_python_examples.sh

-Thanks Billy

Other info / logs

See above

Namburger commented 4 years ago

@laibilly Hello - I don't think there should be much difference between C++ and Python, since our Python API is just a wrapper around the C++ code (the only overhead is the SWIG translation layer). Could you try running the basic_engine_benchmarks.py script instead of timing manually? To add the deeplab models to the benchmarks, just add these two lines to the basic_engine_reference_x86_64.csv file:

+ deeplabv3_mnv2_dm05_pascal_quant_edgetpu.tflite 50
+ deeplabv3_mnv2_pascal_quant_edgetpu.tflite 50

Could you also share how you are timing your inference? Python's time library can yield different results on different platforms (I suggest using time.perf_counter() on Windows).
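For illustration, here is a minimal timing sketch built around time.perf_counter(); the time_calls helper and the stand-in workload are hypothetical, and in the real script the callable would be engine.run_inference(input_tensor):

```python
import time

def time_calls(fn, iterations=200):
    """Return the mean wall-clock latency of fn() in milliseconds.

    time.perf_counter() is monotonic and high-resolution on every
    platform, including Windows, where time.time() has coarse
    resolution and time.clock() behaves differently than on Linux.
    """
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    end = time.perf_counter()
    return (end - start) * 1000.0 / iterations

# Stand-in workload; replace the lambda with
# lambda: engine.run_inference(input_tensor) in the real script.
mean_ms = time_calls(lambda: sum(range(1000)), iterations=50)
print('mean latency: %.2f ms' % mean_ms)
```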

FYI:

$ PYTHONPATH=$(pwd) python3 benchmarks/basic_engine_benchmarks.py
-------------- Model 1 / 8 ---------------
Benchmark for [deeplabv3_mnv2_dm05_pascal_quant_edgetpu.tflite]
model path = /usr/local/google/home/vunam/workspace/edgetpu.git/test_data/deeplabv3_mnv2_dm05_pascal_quant_edgetpu.tflite
input tensor shape = [  1 513 513   3]
36.59 ms (iterations = 200)
-------------- Model 2 / 8 ---------------
Benchmark for [deeplabv3_mnv2_pascal_quant_edgetpu.tflite]
model path = /usr/local/google/home/vunam/workspace/edgetpu.git/test_data/deeplabv3_mnv2_pascal_quant_edgetpu.tflite
input tensor shape = [  1 513 513   3]
42.35 ms (iterations = 200)
laibilly commented 4 years ago

Thanks for your reply. I did the following steps to set up the test environment:

conda create --name edgetpu python=3.6
git clone https://github.com/google-coral/edgetpu.git
pip install https://dl.google.com/coral/python/tflite_runtime-2.1.0.post1-cp36-cp36m-win_amd64.whl
pip install Pillow
pip install https://dl.google.com/coral/edgetpu_api/edgetpu-2.14.0-cp36-cp36m-win_amd64.whl

I don't know why my machine_info() returns AMD64. I added the two lines

deeplabv3_mnv2_dm05_pascal_quant_edgetpu.tflite 50
deeplabv3_mnv2_pascal_quant_edgetpu.tflite 50

to basic_engine_reference_x86_64.csv and made a new copy as basic_engine_reference_AMD64.csv. After that I could run python benchmarks/basic_engine_benchmarks.py without any problem; before that, it reported that basic_engine_reference_AMD64.csv was not found.
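For context, Python's platform.machine() reports AMD64 on 64-bit Windows but x86_64 on Linux, which would explain the AMD64-named reference file; this is a hedged guess at what machine_info() wraps, with the filename pattern mirroring the CSV names above:

```python
import platform

# On 64-bit Windows this typically returns 'AMD64'; on Linux, 'x86_64'.
arch = platform.machine()

# The benchmark script appears to look up a per-architecture reference CSV,
# so on Windows it ends up searching for an AMD64-suffixed file.
reference_csv = 'basic_engine_reference_%s.csv' % arch
print(reference_csv)
```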

I have attached a screen recording for your reference: https://www.youtube.com/watch?v=ItRMRgf-WUs

Here is the test result

(edgetpu) C:\workspace\edgetpu>python benchmarks/basic_engine_benchmarks.py
-------------- Model 1 / 8 ---------------
Benchmark for [deeplabv3_mnv2_dm05_pascal_quant_edgetpu.tflite]
model path = C:\workspace\edgetpu\test_data\deeplabv3_mnv2_dm05_pascal_quant_edgetpu.tflite
input tensor shape = [  1 513 513   3]
100.37 ms (iterations = 200)
-------------- Model 2 / 8 ---------------
Benchmark for [deeplabv3_mnv2_pascal_quant_edgetpu.tflite]
model path = C:\workspace\edgetpu\test_data\deeplabv3_mnv2_pascal_quant_edgetpu.tflite
input tensor shape = [  1 513 513   3]
104.89 ms (iterations = 200)
-------------- Model 3 / 8 ---------------
Benchmark for [inception_v1_224_quant_edgetpu.tflite]
model path = C:\workspace\edgetpu\test_data\inception_v1_224_quant_edgetpu.tflite
input tensor shape = [  1 224 224   3]
3.17 ms (iterations = 200)
-------------- Model 4 / 8 ---------------
Benchmark for [inception_v4_299_quant_edgetpu.tflite]
model path = C:\workspace\edgetpu\test_data\inception_v4_299_quant_edgetpu.tflite
input tensor shape = [  1 299 299   3]
86.31 ms (iterations = 200)
-------------- Model 5 / 8 ---------------
Benchmark for [mobilenet_v1_1.0_224_quant_edgetpu.tflite]
model path = C:\workspace\edgetpu\test_data\mobilenet_v1_1.0_224_quant_edgetpu.tflite
input tensor shape = [  1 224 224   3]
2.32 ms (iterations = 200)
-------------- Model 6 / 8 ---------------
Benchmark for [mobilenet_v2_1.0_224_quant_edgetpu.tflite]
model path = C:\workspace\edgetpu\test_data\mobilenet_v2_1.0_224_quant_edgetpu.tflite
input tensor shape = [  1 224 224   3]
2.54 ms (iterations = 200)
-------------- Model 7 / 8 ---------------
Benchmark for [ssd_mobilenet_v1_coco_quant_postprocess_edgetpu.tflite]
model path = C:\workspace\edgetpu\test_data\ssd_mobilenet_v1_coco_quant_postprocess_edgetpu.tflite
input tensor shape = [  1 300 300   3]
12.49 ms (iterations = 200)
-------------- Model 8 / 8 ---------------
Benchmark for [ssd_mobilenet_v2_face_quant_postprocess_edgetpu.tflite]
model path = C:\workspace\edgetpu\test_data\ssd_mobilenet_v2_face_quant_postprocess_edgetpu.tflite
input tensor shape = [  1 320 320   3]
5.72 ms (iterations = 200)
basic_engine_benchmarks_AMD64_20200515-185951.csv  saved!
******************** Check results *********************
 * Unexpected high latency! [deeplabv3_mnv2_dm05_pascal_quant_edgetpu.tflite]
   Inference time: 100.366293 ms  Reference time: 50.0 ms
 * Unexpected high latency! [deeplabv3_mnv2_pascal_quant_edgetpu.tflite]
   Inference time: 104.89381299999997 ms  Reference time: 50.0 ms
******************** Check finished! *******************

(edgetpu) C:\workspace\edgetpu>

Previously I noticed that basic_engine.py has a latency reading at line 137, so I modified semantic_segmentation.py at line 119 from

_, raw_result = engine.run_inference(input_tensor)

into

for _ in range(500):
    latency, raw_result = engine.run_inference(input_tensor)
    print('latency %.1fms' % latency)
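Per-call prints like the loop above are noisy; collecting the latencies and summarizing them gives a steadier number. This is a sketch with stand-in sample values; in the real script the list would be filled with the latency values returned by engine.run_inference(input_tensor):

```python
import statistics

# Stand-in latency samples in ms; in the real script, append the
# `latency` value from each engine.run_inference() call instead.
latencies = [100.4, 99.8, 101.2, 100.9, 100.1]

# Mean, median, and spread summarize the run better than raw prints.
print('mean   %.1f ms' % statistics.mean(latencies))
print('median %.1f ms' % statistics.median(latencies))
print('stdev  %.2f ms' % statistics.stdev(latencies))
```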
Namburger commented 4 years ago

@laibilly Thanks for the details! I have an idea, but I may need your assistance, since I'm not great with Windows. Basically, I need the MD5 hash of your libedgetpu.dll file. It should be something like this:

Get-FileHash -Algorithm MD5 c:\windows\system32\edgetpu.dll | Format-List
laibilly commented 4 years ago

Hi,

I don't seem to have a libedgetpu.dll, but I do have an edgetpu.dll. I created the MD5 hash using an online service at http://onlinemd5.com/; it shows 069C864B0975C613C229A5979B5DDC3D

For your reference I uploaded my edgetpu.dll to https://drive.google.com/drive/folders/1lFIPMnghkNypoKbHcJoNnE-5Ey63z8yZ

-Thanks Billy

Namburger commented 4 years ago

@laibilly Thank you! You are definitely running at the max frequencies for our latest runtime release. I'll have to check with my team whether this is a known platform limitation; I will keep you updated.

laibilly commented 4 years ago

Will the EdgeTPU-DeepLab-slim model in the TensorFlow DeepLab Model Zoo yield a better result?

Do you have a precompiled EdgeTPU-DeepLab-slim model?

I have tried some EdgeTPU-DeepLab-slim models precompiled by others, but they seem to perform worse on the Edge TPU than the default model.

Which model would you suggest to achieve 30 fps on Windows?

Namburger commented 4 years ago

@laibilly Apologies for this issue. I have a colleague who can try to reproduce it on his Windows machine; I will let you know what we find. It seems there is an issue with the deeplab slim model that is being worked on right now, but could you try our recently released unet model?

Namburger commented 4 years ago

@laibilly By way of an update: I've spoken with our team and realized that we haven't fully started benchmarking on the Windows platform, since library support for Windows/Mac is quite new. We are now actively working on this, and we won't be able to tell whether this latency is expected until we get some results and can work on optimizations. One way to confirm whether edgetpu.dll is the cause is to try running in a Linux Docker container. This method is a little sketchy, because I don't know how the Linux Docker image interacts with Windows, and we don't officially support Docker, but in theory it should work. If you are willing to try:

1) Create a Dockerfile with this content:

FROM tensorflow/tensorflow:1.15.0-py3

WORKDIR /home
ENV HOME /home
VOLUME /data
EXPOSE 8888
RUN apt-get update
# curl is needed below for the apt key; usbutils provides lsusb.
RUN apt-get install -y git nano python-pip python-dev pkg-config wget curl usbutils

RUN echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" \
| tee /etc/apt/sources.list.d/coral-edgetpu.list
RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
RUN apt-get update
RUN apt-get install -y libedgetpu1-std python3-edgetpu edgetpu-examples

# A `RUN cd` does not persist between steps; use WORKDIR instead.
WORKDIR /home/coral
RUN git clone https://github.com/google-coral/edgetpu.git

2) Build:

docker build -t "coral-edgetpu" .

3) Run:

docker run -it --privileged -v /dev/bus/usb:/dev/bus/usb coral-edgetpu /bin/bash

4) Benchmarks:

cd edgetpu
echo "deeplabv3_mnv2_dm05_pascal_quant_edgetpu.tflite 50" > benchmarks/reference/basic_engine_reference_x86_64.csv
echo "deeplabv3_mnv2_pascal_quant_edgetpu.tflite 50" >> benchmarks/reference/basic_engine_reference_x86_64.csv
python3 benchmarks/basic_engine_benchmarks.py

My reasoning for this attempt is that you would be using our well-tested libedgetpu.so for the Linux platform instead of the Windows .dll. If you still get the same results, then something else in Windows must be causing this latency; otherwise, the latency is coming from our library.

In either case, we are also working on some of these numbers for Windows; I can give you updates once we have them.

laibilly commented 4 years ago

Thanks for your update; I will test this method next week. Our application is a virtual background for a wedding photo booth. We will run some demos this weekend at a Hong Kong wedding fair. Here is a demo video for your reference 😁 https://youtu.be/yc1L_Woba5s

Namburger commented 4 years ago

@laibilly Wow, that's pretty cool to see our product being used in the wedding industry! Unfortunately I can't open that link, as it is private!

laibilly commented 4 years ago

Oops, sorry about the settings; it should be okay now: https://youtu.be/yc1L_Woba5s

laibilly commented 4 years ago

Hi, I tried your method. The point where I got stuck is at docker run -it --privileged -v /dev/bus/usb:/dev/bus/usb coral-edgetpu /bin/bash. On Windows, USB pass-through to a Linux container seems to be an unresolved issue (ref1, ref2, ref3). I tried both VirtualBox and VMware; VMware showed some progress, however the USB device still did not appear in the Linux container. I will try the unet model. -Thanks Billy

hjonnala commented 3 years ago

@laibilly are you still having issues?

hjonnala commented 3 years ago

Closing due to lack of activity. Feel free to reopen if issue still exists.

google-coral-bot[bot] commented 3 years ago

Are you satisfied with the resolution of your issue?