google-coral / edgetpu

Coral issue tracker (and legacy Edge TPU API source)
https://coral.ai
Apache License 2.0

Inference time #451

Closed AkkiSony closed 2 years ago

AkkiSony commented 3 years ago

Description

I am able to run the model on the Edge TPU; however, I just have a small question with respect to the inference time.

----INFERENCE TIME----
Note: The first inference is slow because it includes loading the model into Edge TPU memory.
154.32 ms
24.17 ms
23.97 ms
21.98 ms
27.74 ms

The first inference includes loading the model, and the rest of the inferences are faster because the model is already loaded into the Edge TPU, right?

I came across the snippet that calculates the inference time:

for _ in range(args.count):
    start = time.perf_counter()
    interpreter.invoke()
    inference_time = time.perf_counter() - start
    objs = detect.get_objects(interpreter, args.threshold, scale)
    print('%.2f ms' % (inference_time * 1000))

My question is: how does the code avoid loading the model again from the second loop iteration onwards?

Just a small simple query I wanted to get clarified. Thanks in advance! :)

manoj7410 commented 3 years ago

@AkkiSony Can you share the link of this snippet ?

AkkiSony commented 3 years ago

@manoj7410 https://github.com/google-coral/pycoral/blob/master/examples/detect_image.py#L84

manoj7410 commented 3 years ago

@AkkiSony The model has already been loaded at https://github.com/google-coral/pycoral/blob/master/examples/detect_image.py#L73, which is done only once.

AkkiSony commented 3 years ago

But how is it possible that the first inference time is greater when the code at https://github.com/google-coral/pycoral/blob/master/examples/detect_image.py#L84 is executed, even though the loading of the model has already taken place before that?

manoj7410 commented 3 years ago

When you compile models individually, the compiler gives each model a unique "caching token" (a 64-bit number). Then when you execute a model, the Edge TPU runtime compares that caching token to the token of the data that's currently cached. If the tokens match, the runtime uses that cached data. If they don't match, it wipes the cache and writes the new model's data instead. (When models are compiled individually, only one model at a time can cache its data.) So, as long as the same cached data is being used, the inference speed is faster.

You can read more about it at https://coral.ai/docs/edgetpu/compiler/#parameter-data-caching
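For illustration, here is a minimal timing sketch (the model path is a placeholder, and it assumes the pycoral API) that makes this visible: the first invoke() after loading a new model is slower because its parameter data still has to be written to the Edge TPU cache, while the later iterations reuse the cached data.

import time
from pycoral.utils.edgetpu import make_interpreter

# Hypothetical path to an Edge TPU compiled model.
interpreter = make_interpreter('model_edgetpu.tflite')
interpreter.allocate_tensors()

for i in range(5):
    start = time.perf_counter()
    interpreter.invoke()  # iteration 0 includes writing parameter data to the cache
    print('invoke %d: %.2f ms' % (i, (time.perf_counter() - start) * 1000))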

AkkiSony commented 3 years ago

@manoj7410 That was a little complicated for me to understand. However, thank you for your support! :)

But at this marked line (https://github.com/guichristmann/edge-tpu-tiny-yolo/blob/master/inference.py#L60), I get an inference time of 1600 ms. This sounds strange, because the inference time cannot be 1600 ms for a YOLOv3 model with the Coral TPU attached.

I would like to measure the inference time in the same way as in 'https://github.com/google-coral/pycoral/blob/master/examples/detect_image.py#L84', but I don't understand where to make the modification.

AkkiSony commented 3 years ago

154.32 ms
24.17 ms
23.97 ms
21.98 ms
27.74 ms

The inference time is printed 5 times here; does that mean inference is run on that image 5 times? If so, why are we running it 5 times?

manoj7410 commented 3 years ago

@AkkiSony If you look at the value of --count at https://github.com/google-coral/pycoral/blob/master/examples/detect_image.py#L68, the default value is 5. That is why the inference is executed 5 times.

AkkiSony commented 3 years ago

Yes, I understood the default value is 5. But why is it executed 5 times? Is there any reason?

Do you mean to say that 'Interpreter.invoke' in the for loop takes more time only for the first iteration, and from then on it doesn't take much time? I am confused by this behaviour!

manoj7410 commented 3 years ago

Do you mean to say that 'Interpreter.invoke' in the for loop takes more time only for the first iteration, and from then on it doesn't take much time? I am confused by this behaviour!

Yes. This is the correct behavior. During the first inference, parameter caching takes place as explained at https://github.com/google-coral/edgetpu/issues/451#issuecomment-905294024

AkkiSony commented 3 years ago

If just running a for loop 5 times with 'Interpreter.invoke' changes the inference time in the detect.py script, then in my code I have written a for loop just like in the detect.py example, invoking the interpreter 5 times.

But I get the following output; there is no major drop in the inference time at all. Is there any reason for this?

from time import time

for _ in range(5):
    start1 = time()
    # Run the model's forward pass.
    interpreter.invoke()
    inf_time1 = time() - start1
    print(f"Net forward-pass time: {inf_time1 * 1000} ms.")

----------- OUTPUT ---------------
Net forward-pass time: 1672.5153923034668 ms.
Net forward-pass time: 1490.8123016357422 ms.
Net forward-pass time: 1488.2874488830566 ms.
Net forward-pass time: 1492.96236038208 ms.
Net forward-pass time: 1503.197193145752 ms.

manoj7410 commented 3 years ago

@AkkiSony Is your model completely mapped to the TPU?

AkkiSony commented 3 years ago

@manoj7410 (screenshot: coral-TPU-error-29)

Yes, it is all mapped! :)

manoj7410 commented 3 years ago

@AkkiSony Okay. Can you share your uncompiled model here?

AkkiSony commented 3 years ago

@manoj7410 https://drive.google.com/drive/folders/181npG1SDJnMBBQOm_gXbL0XguSA4gGsy?usp=sharing I converted from YOLOv3 -> Keras -> TensorFlow Lite -> Edge TPU compiled.

You can find all the models in the shared link. I think you were asking me for "quant_model.tflite"

AkkiSony commented 3 years ago

@hjonnala Hello :) Can you please help me solve this issue with inference time? Thanks in advance.

hjonnala commented 3 years ago

Inference time also depends on the model size and the operations involved. In your case, due to the large model size (62.5 MB >>> 8 MB), you might not see much difference in inference time.

On-chip memory used for caching model parameters: 7.06MiB
Off-chip memory used for streaming uncached model parameters: 51.93MiB

AkkiSony commented 3 years ago

Hi. But I think it is really not plausible that I am getting an inference time of 1600 ms, as shown below, for a single input image.

----------- OUTPUT ---------------
Net forward-pass time: 1672.5153923034668 ms.

The original Darknet model is even bigger in size, and there I get an inference time of 300 ms without the Coral USB Accelerator.

I would like to measure the inference time for the image after the model has been loaded into the Coral USB Accelerator's memory. I am actually doing a benchmark against the VIM3 board, which has an NPU (there the inference time was 80 ms for an image).

So it is very important for me to have comparable values between the two hardware accelerators.

I want to measure the inference time in this script. (https://github.com/guichristmann/edge-tpu-tiny-yolo/blob/master/inference.py#L60)

AkkiSony commented 3 years ago

On the other hand, I want to visualize the output layer of the attached model. I tried to use the Netron app, but I got confused when looking at the output layer of this MobileNetV1 SSD model.

https://drive.google.com/drive/folders/1pq8DTzc_b4CWIf65Y26CIzxQxg_WoL2c?usp=sharing

hjonnala commented 3 years ago

I have tested the inference time with detect_image.py. Here are the results for the Edge TPU vs CPU models.

If you want to try this, please comment out this line: https://github.com/google-coral/pycoral/blob/master/examples/detect_image.py#L87

 python3 examples/detect_image.py   --model /home/hemanth_scripts/models/quant_model_edgetpu.tflite   --labels test_data/coco_labels.txt   --input /home/Downloads/holes-test-input.jpg   --count 5
----INFERENCE TIME----
Note: The first inference is slow because it includes loading the model into Edge TPU memory.
154.76 ms
146.21 ms
143.52 ms
142.68 ms
150.04 ms
 python3 examples/detect_image.py   --model /home/hemanth_scripts/models/quant_model.tflite   --labels test_data/coco_labels.txt   --input /home/Downloads/holes-test-input.jpg   --count 5
----INFERENCE TIME----
Note: The first inference is slow because it includes loading the model into Edge TPU memory.
1409.11 ms
1316.50 ms
1335.17 ms
1333.39 ms
1336.28 ms
AkkiSony commented 3 years ago

Okay, so by doing the above steps we are basically not doing any object detection, but just getting to know the time required for the inference, right? But since we comment out line 87, where exactly is the inference happening, since we do not get any object detections on the input image?

AkkiSony commented 3 years ago

154.76 ms
146.21 ms
143.52 ms
142.68 ms
150.04 ms

I am really not sure if this is right, because the inference time has to drop from the second count onwards, as the model is already loaded into the Coral memory, right?

AkkiSony commented 3 years ago

@hjonnala When I did it with the 'quant_model_edgetpu.tflite' model, I was getting an inference time of:

----INFERENCE TIME----
Note: The first inference is slow because it includes loading the model into Edge TPU memory.
1691.73 ms
1490.07 ms
1492.52 ms
1486.94 ms
1500.40 ms

hjonnala commented 3 years ago

Are you using detect_image.py to run the inference? detect_image.py and this script (https://github.com/guichristmann/edge-tpu-tiny-yolo/blob/master/inference.py#L60) use different tflite runtime packages.

AkkiSony commented 3 years ago

are you using detect_image.py to run the inference?

Yes, I used detect_image.py to run the script. Yet the inference time for me is higher. (screenshot: coral-TPU-error-30)

use different tflite runtime packages..

What do you mean by this? I am sorry, I don't understand, because I am using the same tflite_runtime version for both. Can you please explain?

hjonnala commented 3 years ago

How much time is it taking for quant_model.tflite? Is the 300 ms inference coming from the same machine? Can you share some screenshots of it? Please try running the model with the USB Accelerator connected to a USB 3.0 port.

AkkiSony commented 3 years ago

The accelerator is connected to USB 3.0. Everything worked fine when I had trained a model with MobileNet in TF; I had an inference time of 18 ms.

I am having problems only with this model. Please find the attached screenshot. (Please note that, in order to be sure, I downloaded the model from the Google Drive link that I shared, and hence I have renamed the file with '_gdrive' appended. I just wanted to make clear that it is the same model which you had run on your PC.)

My inference time with the uncompiled version is '173103.72 ms'. I was shocked to see this value.

(screenshot: coral-TPU-error-31)

hjonnala commented 3 years ago

Hi,

AkkiSony commented 3 years ago

Hi, @hjonnala

Thank you for the clarification. I would just like to know about one thing which seems suspicious to me.

python3 examples/detect_image.py --model /home/hemanth_scripts/models/quant_model_edgetpu.tflite --labels test_data/coco_labels.txt --input /home/Downloads/holes-test-input.jpg --count 5
----INFERENCE TIME----
Note: The first inference is slow because it includes loading the model into Edge TPU memory.
154.76 ms
146.21 ms
143.52 ms
142.68 ms
150.04 ms

How did you get such a low inference time, when I am also using the same model attached to a USB 3.0 port? (refer above)

----INFERENCE TIME----
Note: The first inference is slow because it includes loading the model into Edge TPU memory.
1691.73 ms
1490.07 ms
1492.52 ms
1486.94 ms
1500.40 ms

My output can be seen above. I am just curious about this difference.

hjonnala commented 3 years ago

It's because I am using a Linux machine with an Intel Core i7, which is different from your host machine.

AkkiSony commented 3 years ago

@hjonnala This is the system configuration I am using (Windows 10):

Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz 2.90 GHz

I think this is quite a good processor too. Just Windows 10 vs. Linux should not cause this much variation in inference time, I hope.

hemanthreddyjonnala commented 3 years ago

Hi, I don't remember whether I chose 'y' or 'n' when installing the Edge TPU runtime (https://coral.ai/docs/accelerator/get-started#runtime-on-windows). That would also impact the inference time. I have tested the model on my Windows machine with a dummy input array. It has an Intel Core i5, and the results are:

(pycoral_venv) C:\Users\heman\pycoral_venv\Scripts>py C:\\Users\\heman\\Downloads\\run_inference.py
quant_model.tflite
362435.30 ms
309964.27 ms
407157.52 ms
404263.39 ms
302495.33 ms
C:\Users\heman\Downloads\quant_model.tflite 1
quant_model_edgetpu.tflite
326.26 ms
189.49 ms
197.24 ms
192.72 ms
211.24 ms
C:\Users\heman\Downloads\quant_model_edgetpu.tflite 1
hjonnala commented 3 years ago

I would suggest focusing on testing the accuracy of the model with the USB Accelerator, and the inference time on the devices where it is going to be deployed.

AkkiSony commented 3 years ago

@hemanthreddyjonnala Thank you for taking it home and investigating. During your investigation at home, did you install the Edge TPU runtime with the high-frequency option?

I would suggest to focus on testing the accuracy of the model with USB accelerators

How do I test the accuracy of the model on the USB accelerator? I am sure there is some loss during the conversion, as the score value for the same image is different. There is a drop in the score value when I do inference using the USB accelerator.

AkkiSony commented 3 years ago

C:\Users\heman\Downloads\quant_model.tflite 1
quant_model_edgetpu.tflite
326.26 ms
189.49 ms
197.24 ms
192.72 ms
211.24 ms

I do not know why I am not getting this inference time with quant_model_edgetpu.tflite. My inference time is way higher, as I said before. :/

AkkiSony commented 3 years ago

inference time on the devices where it is going to be deployed.

As I said earlier, I am just doing an investigation of different hardware accelerators. Hence, as of now, there is no deployment in scope. I need to get results for the different hardware accelerators, so inference time is one parameter I am measuring. I do not know how to measure the accuracy of the model after conversion. Do you have any suggestions?
Also, what other parameters can be measured to compare the accelerators?

AkkiSony commented 3 years ago

@hjonnala I wanted to measure the inference time taken by the model, but I am confused about where to actually calculate it. I have attached the main() function of my program (screenshot: Inference-time-1). Can you please let me know where I have to calculate the inference time? Thanks in advance. :) With my current measurement, I am getting an output of 2500 ms. I am sure this is wrong. Any suggestions, please?

hjonnala commented 3 years ago

I am not sure where interpreter->Invoke() is happening in your code. Try to measure the time taken by interpreter->Invoke() alone.
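As a rough sketch (assuming interpreter is an already-allocated TFLite interpreter whose input tensor has been set, as in the linked inference.py), time only the forward pass and nothing else:

import time

start = time.perf_counter()
interpreter.invoke()  # forward pass only; pre- and post-processing excluded
print('invoke(): %.2f ms' % ((time.perf_counter() - start) * 1000))

Anything else inside the timed region (resizing, sigmoid/NMS post-processing, drawing boxes) will inflate the number.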

AkkiSony commented 3 years ago

@hemanthreddyjonnala Thanks for answering. Training accuracy: 98%. Validation accuracy: 85%.

These are the values that I got during the training of the model.

How can I measure the accuracy on the Coral TPU? I just want to check what the accuracy of the model is when it is running with the Coral USB Accelerator.

hjonnala commented 3 years ago

On a single image, you can use https://github.com/google-coral/pycoral/blob/master/examples/detect_image.py to check the accuracy of the detected objects. I don't think we have any examples for checking accuracy on bulk data. Also, we currently don't have any example scripts to run inference on YOLOv3 models.

AkkiSony commented 3 years ago

to check the accuracy of detected objects

So do you mean just comparing the score values? @hjonnala Please let me know your answer.

hjonnala commented 3 years ago

You can compare the number of detections (CPU tflite vs Edge TPU tflite) as well as the individual object scores. How are you calculating the training accuracy of 98%?
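For example, here is a minimal sketch of that comparison (the model and image paths are placeholders, the 0.4 threshold is an assumption, and it assumes an SSD-style detection model that works with pycoral's detect adapter):

from PIL import Image
import tflite_runtime.interpreter as tflite
from pycoral.adapters import common, detect
from pycoral.utils.edgetpu import make_interpreter

def run(interpreter, image_path, threshold=0.4):
    # Resize the image to the input tensor shape, run inference, return detections.
    interpreter.allocate_tensors()
    image = Image.open(image_path)
    _, scale = common.set_resized_input(
        interpreter, image.size, lambda size: image.resize(size, Image.LANCZOS))
    interpreter.invoke()
    return detect.get_objects(interpreter, threshold, scale)

cpu_objs = run(tflite.Interpreter('quant_model.tflite'), 'test.jpg')
tpu_objs = run(make_interpreter('quant_model_edgetpu.tflite'), 'test.jpg')
print('CPU:', len(cpu_objs), [round(o.score, 3) for o in cpu_objs])
print('TPU:', len(tpu_objs), [round(o.score, 3) for o in tpu_objs])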

AkkiSony commented 3 years ago

Training and Validation accuracy:

acc = history.history['acc']
val_acc = history.history['val_acc']

loss = history.history['loss']
val_loss = history.history['val_loss']

plt.figure(figsize=(8, 8))
plt.subplot(2, 1, 1)
plt.plot(acc, label='Training Accuracy')
plt.plot(val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.ylabel('Accuracy')
plt.ylim([min(plt.ylim()), 1])
plt.title('Training and Validation Accuracy')

hjonnala commented 2 years ago

Feel free to reopen if you still have any questions.

google-coral-bot[bot] commented 2 years ago

Are you satisfied with the resolution of your issue?

AkkiSony commented 2 years ago

@hjonnala I wanted to know how I can measure the power consumption of the TPU. I just want to compare it with different accelerators.

hjonnala commented 2 years ago

The Edge TPU is capable of performing 4 trillion operations (tera-operations) per second (TOPS), using 0.5 watts for each TOPS (2 TOPS per watt).
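At the full 4 TOPS rating, that works out to roughly 4 TOPS × 0.5 W/TOPS = 2 W for the Edge TPU itself; the actual draw of the USB Accelerator as a whole would have to be measured externally, e.g. with an inline USB power meter.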

Please check this paper for additional reference: https://workshops.inf.ed.ac.uk/accml/papers/2020/AccML_2020_paper_4.pdf

Thanks

google-coral-bot[bot] commented 2 years ago

Are you satisfied with the resolution of your issue?

AkkiSony commented 2 years ago

@hjonnala Is there a way to generate a graph which shows the CPU usage at high frequency and at normal frequency?

manoj7410 commented 2 years ago

@AkkiSony We haven't created such a tool to monitor the CPU usage; however, you can collect the data using a tool like 'pidstat' and then plot that data on a graph.
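As one possible sketch (using psutil instead of pidstat so that the sampling and the plotting stay in a single Python script; the PID and the 30-second window are assumptions):

import matplotlib.pyplot as plt
import psutil

proc = psutil.Process(12345)  # hypothetical PID of the inference process
usage = [proc.cpu_percent(interval=1.0) for _ in range(30)]  # 1 sample per second

plt.plot(usage)
plt.xlabel('Time (s)')
plt.ylabel('CPU usage (%)')
plt.title('CPU usage during inference')
plt.savefig('cpu_usage.png')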