google-coral / edgetpu

Coral issue tracker (and legacy Edge TPU API source)
https://coral.ai
Apache License 2.0

The output is always zero when I use my retrained model, which works well in TensorFlow Lite #136

Closed A7med01 closed 4 years ago

A7med01 commented 4 years ago

The output is always zero when I use my retrained model, which works well in TensorFlow Lite.

For example, mobilenet_v2_1.0_224_quant.tflite works in TensorFlow Lite, but the same model file gives zeros on the Coral TPU, and it also gives zeros after compiling it for the TPU.

Note: I used this method to retrain the model:

https://colab.research.google.com/github/google-coral/tutorials/blob/master/retrain_classification_ptq_tf2.ipynb

I just ran the code provided by Google in this notebook.

python3 terrain_classification.py

----Labels loaded----
----INFERENCE TIME----
Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
[Class(id=0, score=0.0), Class(id=1, score=0.0), Class(id=2, score=0.0), Class(id=3, score=0.0), Class(id=4, score=0.0)]
1145.2ms
-------RESULTS--------
daisy: 0.00000
dandelion: 0.00000
roses: 0.00000
sunflowers: 0.00000
tulips: 0.00000

Namburger commented 4 years ago

model file gives zeros on the Coral TPU and also gives zeros after compiling it for the TPU

Apologies, I'm just really confused by that. There are also so many models; it would be nice if you could give a link to the model and some examples to reproduce the problem. Can I see the exact code that you used to pre-process the inputs before feeding them to the model? Can you also check the model with our tflite classification example first?

A7med01 commented 4 years ago

I just ran the code here, changing nothing, not even the dataset: https://colab.research.google.com/github/google-coral/tutorials/blob/master/retrain_classification_ptq_tf2.ipynb

Then I downloaded the model and ran it using the Coral Edge USB.

Your tflite classification example works well, but if I just put my retrained model in instead, I get zeros.

A7med01 commented 4 years ago

If I am not clear, I can explain in more detail. Can I upload my downloaded model here for you to run on the Coral?

Namburger commented 4 years ago

@A7med01 Hi again. Just to clarify, so that we are on the same page: you are retraining a model using this colab and evaluating it using this demo? My confusion is that you mentioned python3 terrain_classification.py, so I wanted to be sure before continuing, or else I may cause you more confusion :)

Anyhow, here is what I believe happens: With tf2.x, these parameters are deprecated:

# These set the input and output tensors to uint8
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

So the I/O tensors are still of type float. This means that if you are using our example from that tflite repo (which assumes that models have I/O tensors of type uint8), you don't need to dequantize the outputs with the scale and zero point; just return the outputs as they are at that point.
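
The all-zero scores reported earlier are consistent with this: as far as I know, a float tensor reports its quantization as (0.0, 0), so scale * (output - zero_point) multiplies every score by zero. A minimal sketch of a helper that handles both cases follows. The name dequantize_output is hypothetical, plain Python lists stand in for NumPy arrays, and the 1/256 scale is an illustrative value, not one read from the model.

```python
def dequantize_output(raw_scores, quantization):
    """Return real-valued scores from a raw output tensor.

    raw_scores: flat list of numbers read from the output tensor.
    quantization: the (scale, zero_point) pair, as found in
        output_details['quantization'].
    """
    scale, zero_point = quantization
    if scale == 0:
        # Assumption: a float tensor reports (0.0, 0), meaning "not quantized",
        # so the raw values are already the scores.
        return list(raw_scores)
    return [scale * (q - zero_point) for q in raw_scores]


# Float output with quantization (0.0, 0): the naive formula zeroes everything.
float_scores = [0.01, 0.02, 0.05, 0.11, 0.81]
naive = [0.0 * (s - 0) for s in float_scores]
print(naive)
print(dequantize_output(float_scores, (0.0, 0)))

# Quantized output with a hypothetical scale of 1/256 and zero point 0.
print(dequantize_output([207], (1 / 256, 0)))
```

With a helper like this the same script could serve both float-I/O and uint8-I/O models, instead of commenting the dequantization in and out.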

A7med01 commented 4 years ago

Yes, I used this notebook and this demo. I just changed the name and added a print function to illustrate what happened.

I will try these, thank you :)

converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

Namburger commented 4 years ago

@A7med01 Wait, that's not what I meant, lol. That colab uses tf2.x and those parameters are deprecated, so that won't work either. If you want to ensure that the I/O tensors are of type uint8, you can use this colab instead.

Otherwise, you should be able to use the same float IO model and make the change I suggested above.

Namburger commented 4 years ago

@A7med01 FYI, here is the exact change that you need to make for your model to work:

+++ b/python/examples/classification/classify.py
@@ -37,8 +37,9 @@ def output_tensor(interpreter):
   """Returns dequantized output tensor."""
   output_details = interpreter.get_output_details()[0]
   output_data = np.squeeze(interpreter.tensor(output_details['index'])())
-  scale, zero_point = output_details['quantization']
-  return scale * (output_data - zero_point)
+  return output_data
+  # scale, zero_point = output_details['quantization']
+  # return scale * (output_data - zero_point)

I packaged everything here, including the change and the model from the original colab you posted: retrainptq2.tar.gz

~/retrainptq2 » python3 classify_image.py \
  --model ./mobilenet_v2_1.0_224_quant_edgetpu.tflite \
  --labels ./flower_labels.txt \
  --input ./tulip.jpg
----INFERENCE TIME----
Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
13.4ms
6.9ms
6.9ms
6.1ms
6.5ms
-------RESULTS--------
tulips: 0.80859

A7med01 commented 4 years ago

Thank you, that really works, and I have now retrained my model with my custom dataset.

But now the model works very well in the Colab notebook and gives the right predictions on test images, whereas the same model run on the Coral TPU works but gives all wrong predictions.

What do you think is happening here? How can the same model give wrong results on the Coral? Is it about the Coral and 8-bit quantization, so that the images lose their information, or does the code you added solve the problem but waste image data?

I don't understand. Can you help me? I don't know what to do; I just followed your steps.

A7med01 commented 4 years ago

@Namburger

Actually, the model gives the same results for most images:

$ python3 terrain_classification.py
----Labels loaded----
----INFERENCE TIME----
Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
1138.3ms
-------RESULTS--------
parkay: 0.99609
Ceramic_floor1: 0.00391
ahmed-zaitoon@Ahmed-Zaitoon:~/coral/tflite/python/examples/classification$ python3 terrain_classification.py
----Labels loaded----
----INFERENCE TIME----
Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
1129.6ms
-------RESULTS--------
parkay: 0.99609
Ceramic_floor1: 0.00391

My input test images have a resolution of 1536 x 864. When I cropped the images manually I got different results, but still wrong predictions. What am I doing wrong here?

I think resizing from 1536 x 864 to 224 x 224 makes the images lose information on the Coral, but the same resizing did not affect classification accuracy when I tested in Colab.

Here is the code I use to test the model in Colab:


import numpy as np
import cv2
import tensorflow as tf

import matplotlib.pyplot as plt
import matplotlib.image as mpimg

%matplotlib inline

# Load the TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="/content/mobilenet_v2_1.0_224_quant.tflite")
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Print the input shape the model expects.
input_shape = input_details[0]['shape']
print(input_shape)

def test_image(image_path):

  #input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)

  data = cv2.imread(image_path)
  data = cv2.resize(data, (224, 224))
  plt.imshow((mpimg.imread(image_path)))

  info = np.iinfo(data.dtype)  # value range of the incoming image type
  data = data.astype(np.float64) / info.max  # normalize the data to 0 - 1
  # The original `input_data = 255 * data` result was overwritten by the next
  # line, so the model receives float32 values in [0, 1].
  input_data = data.astype(np.float32)

  print(input_data.shape)

  input_data = input_data.reshape(1,224,224,3)
  print(input_data.shape)

  interpreter.set_tensor(input_details[0]['index'], input_data)

  interpreter.invoke()

  # The function `get_tensor()` returns a copy of the tensor data.
  # Use `tensor()` in order to get a pointer to the tensor.
  output_data = interpreter.get_tensor(output_details[0]['index'])

  print(output_data)
  # Index of the highest score in the single batch row.
  max_index = int(np.argmax(output_data[0]))
  print(max_index)

  labels = ["Ceramic_floor1" , "Ceramic_floor2","Sidewalk" , "asphalt" , "garden","parkay","sand"]
  print(labels[max_index])

Namburger commented 4 years ago

@A7med01 A couple of things I want you to be aware of:

1) You are not running this on the Edge TPU. By default you are using the tflite API, which runs everything on the CPU. Please check this doc again; the proper usage is:

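import tflite_runtime.interpreter as tflite

# CPU only: no delegate is passed.
interpreter = tflite.Interpreter(model_path)

# Edge TPU: pass the Edge TPU delegate when creating the interpreter.
interpreter = tflite.Interpreter(model_path,
  experimental_delegates=[tflite.load_delegate('libedgetpu.so.1')])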

2) This line:

  data = data.astype(np.float64) / info.max # normalize the data to 0 - 1

You are expanding the image to type float64, while the model is expecting float32. The tflite model takes its inputs as a contiguous array, and it seems to me that only half of your input is being processed on the Edge TPU, some of it as garbage data, while the other half is not processed at all. This is a user error. Since you know the code on colab works, why don't you try to understand it first?
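
To illustrate the size mismatch described above in general terms, here is a small stdlib-only sketch (no TFLite involved; the three sample values are made up) of what happens when a buffer of float64 values is read as if it held float32 values. Note that the posted snippet does cast back to np.float32 two lines later, and as far as I know interpreter.set_tensor checks the dtype, so this shows the hazard itself rather than diagnosing that snippet.

```python
import struct

# Three made-up pixel values normalized to [0, 1].
values = [0.0, 0.5, 1.0]

# Pack them as little-endian float64: 8 bytes per value.
raw64 = struct.pack("<3d", *values)

# A consumer that expects float32 (4 bytes per value) and reads the same
# bytes sees twice as many elements, and they do not match the originals.
as_float32 = struct.unpack("<6f", raw64)

print(len(raw64), "bytes hold", len(values), "float64 values")
print("read as float32:", as_float32)

# A float32 tensor with three elements would only consume the first 12 bytes,
# which cover just the first one and a half float64 values.
first_three = as_float32[:3]
print("first three float32 slots:", first_three)
```

Casting with astype(np.float32) before handing the array to the interpreter, as the posted snippet does, avoids this mismatch.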

If this is still not clear to you, please attach the model, I'll give it a quick spin :)

A7med01 commented 4 years ago

I used the code above to test in Colab only; I didn't use it with the Coral TPU.

To test on the Coral TPU, I extracted the retrainptq2.tar.gz you sent yesterday, replaced your model and labels file with mine, and tested with your code. The results are wrong, as shown here: images 3 and 4 give the same wrong result.

ahmed-zaitoon@Ahmed-Zaitoon:~/Downloads/retrainptq2$ python3 classify_image.py --model mobilenet_v2_1.0_224_quant_edgetpu.tflite --labels flower_labels.txt --input 3.jpg
----INFERENCE TIME----
Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
85.5ms
17.2ms
14.0ms
14.8ms
14.2ms
-------RESULTS--------
parkay: 0.99609
ahmed-zaitoon@Ahmed-Zaitoon:~/Downloads/retrainptq2$ python3 classify_image.py --model mobilenet_v2_1.0_224_quant_edgetpu.tflite --labels flower_labels.txt --input 4.jpg
----INFERENCE TIME----
Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
85.8ms
17.2ms
13.1ms
17.2ms
12.3ms
-------RESULTS--------
parkay: 0.99609

So I am sorry, I don't understand what you said above, because I didn't change any of the code used for the Coral TPU, and the Colab code works, so my problem is with the Coral code.

Here is the model, with images to use for testing:

terrain.zip

Please test it on the Coral TPU with the images provided, and also test it in Colab using the code below. You will find that it predicts correctly in Colab and wrongly on the Coral. Please tell me what I should do to make it predict correctly on the Coral TPU.

import numpy as np
import cv2
import tensorflow as tf

import matplotlib.pyplot as plt
import matplotlib.image as mpimg

%matplotlib inline

# Load the TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="/content/mobilenet_v2_1.0_224_quant.tflite")
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Print the input shape the model expects.
input_shape = input_details[0]['shape']
print(input_shape)

def test_image(image_path):

  #input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)

  data = cv2.imread(image_path)
  data = cv2.resize(data, (224, 224))
  plt.imshow((mpimg.imread(image_path)))

  info = np.iinfo(data.dtype)  # value range of the incoming image type
  data = data.astype(np.float64) / info.max  # normalize the data to 0 - 1
  # The original `input_data = 255 * data` result was overwritten by the next
  # line, so the model receives float32 values in [0, 1].
  input_data = data.astype(np.float32)

  print(input_data.shape)

  input_data = input_data.reshape(1,224,224,3)
  print(input_data.shape)

  interpreter.set_tensor(input_details[0]['index'], input_data)

  interpreter.invoke()

  # The function `get_tensor()` returns a copy of the tensor data.
  # Use `tensor()` in order to get a pointer to the tensor.
  output_data = interpreter.get_tensor(output_details[0]['index'])

  print(output_data)
  # Index of the highest score in the single batch row.
  max_index = int(np.argmax(output_data[0]))
  print(max_index)

  labels = ["Ceramic_floor1" , "Ceramic_floor2","Sidewalk" , "asphalt" , "garden","parkay","sand"]
  print(labels[max_index])

test_image('/content/1.jpg')

Waiting for your response, and sorry if I can't understand well.

Namburger commented 4 years ago

@A7med01 Could you also include your CPU tflite model? I'll take a look at both.

A7med01 commented 4 years ago

Oh sorry, I forgot it. Sure, here are the two model files: terrain.zip

Namburger commented 4 years ago

@A7med01 I'm really sorry, but I just don't get what you mean by not working on Coral but working on Colab. When I ran both of your models, I got the same result for both the CPU and the Edge TPU model, so I suspect the model just gives that output. FYI, we just updated the colab to tf2.3, which will have inputs at uint8 now, if you want to try it again. You should be able to use this exact code to run it with the new colab.
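
One possible explanation for the differing results, offered as an assumption rather than something confirmed in this thread: the Colab snippet feeds the float model pixel values normalized to [0, 1], so a script that writes raw 0-255 pixel values into the same float input would present very different data, whereas a uint8 input carries its own scale and zero point. A minimal sketch of input preparation that handles both cases follows. The helper name prepare_pixels is hypothetical, plain lists stand in for NumPy arrays, and the [0, 1] training range is assumed.

```python
def prepare_pixels(pixels, input_quantization):
    """Convert raw 0-255 pixel values to what the model input expects.

    pixels: flat list of integers in [0, 255].
    input_quantization: the (scale, zero_point) pair, as found in
        input_details['quantization'].
    Assumption: the model was trained on pixel values rescaled to [0, 1].
    """
    scale, zero_point = input_quantization
    real_values = [p / 255 for p in pixels]
    if scale == 0:
        # Assumed float input (quantization reported as (0.0, 0)):
        # feed the normalized values directly.
        return real_values
    # Quantized input: map each real value to its integer code in [0, 255].
    return [max(0, min(255, round(v / scale) + zero_point)) for v in real_values]


sample = [0, 128, 255]
print(prepare_pixels(sample, (0.0, 0)))      # float input: values in [0, 1]
print(prepare_pixels(sample, (1 / 255, 0)))  # uint8 input: integer codes
```

If the quantized input happens to use a scale of 1/255 and a zero point of 0, the integer codes equal the raw pixel values, which would explain why a script that copies raw pixels works unchanged with a uint8-input model.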

A7med01 commented 4 years ago

How do I use this exact code in Colab? I thought it was used only on the Coral TPU.

You got the same results for the CPU and the Edge TPU.
For me, I get good results on the CPU, but the Coral TPU gives different, wrong predictions.

Test image 44.jpg with the Coral TPU; this is the wrong prediction I got: Screenshot from 2020-06-10 11-19-53

And when I tested in Colab with the code (written in my last comment):

Screenshot from 2020-06-10 11-20-59

So, can you please run the model I sent on the Coral TPU, test it with the image 44.jpg I sent, and show me the results you got? image - 44.zip

And also test it on the CPU with the Colab code (written in my last comment) and show me the results.

I feel that I understand the problem here: when the inputs are uint8 they lose information, and that's why I got bad predictions with the Coral TPU and good ones on the CPU or in Colab, as I didn't put them in uint8.

And when you test on the CPU you also put the inputs in uint8, so you see the same bad results as on the Coral TPU.

I am sorry for this long conversation. I really have a good model, but I can't use it with the Coral TPU.
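
The hypothesis above, that uint8 inputs lose information, can be checked with a little arithmetic. The sketch below round-trips values in [0, 1] through an 8-bit quantization and measures the worst error. The scale of 1/255 and zero point of 0 are hypothetical values chosen for illustration, not read from the model, and the function names are my own.

```python
def quantize(x, scale, zero_point):
    """Map a real value to the nearest integer code, clamped to [0, 255]."""
    q = round(x / scale) + zero_point
    return max(0, min(255, q))


def dequantize(q, scale, zero_point):
    """Map an integer code back to the real value it represents."""
    return scale * (q - zero_point)


# Hypothetical input quantization for pixel values normalized to [0, 1].
scale, zero_point = 1 / 255, 0

worst = 0.0
for i in range(1001):
    x = i / 1000  # sample the [0, 1] range in steps of 0.001
    restored = dequantize(quantize(x, scale, zero_point), scale, zero_point)
    worst = max(worst, abs(x - restored))

print(f"worst round-trip error: {worst:.5f}")
print(f"half a quantization step: {scale / 2:.5f}")
```

The worst error stays within half a quantization step, and a typical JPEG already stores 8 bits per channel, so under these assumptions uint8 input by itself seems unlikely to explain predictions that are consistently wrong.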

A7med01 commented 4 years ago

I used the new Colab with TF 2.3 now and it solves my problem, thank you. I think the model now gives the right predictions.

Thank you for your effort with me.

Namburger commented 4 years ago

@A7med01 Awesome, glad to help!