PINTO0309 / PINTO_model_zoo

A repository for storing models that have been inter-converted between various frameworks. Supported frameworks are TensorFlow, PyTorch, ONNX, OpenVINO, TFJS, TFTRT, TensorFlowLite (Float32/16/INT8), EdgeTPU, CoreML.
https://qiita.com/PINTO
MIT License

Using your tflite model instead of mediapipe #333

Closed: smartheit closed this issue 1 year ago

smartheit commented 1 year ago

Issue Type

Support

OS

Ubuntu

OS architecture

aarch64

Programming Language

Python

Framework

TensorFlowLite

Model name and Weights/Checkpoints URL

033_Hand_Detection_and_Tracking/21_new_full_lite/model_float32.tflite

Description

Hi Katsuya, I want to move from the MediaPipe example (which works great), https://github.com/Kazuhito00/hand-gesture-recognition-using-mediapipe, to a TensorFlow (Lite) example.

Reason: the demo should run on an FPGA CNN accelerator. I wanted to use your models because you already provide quantized versions, which should ease the migration. When I inspect the float model with https://netron.app/, I see that its input and output layers differ from those of the Google tflite model: the input is float32[1,192,192,3], while the MediaPipe one is float32[1,224,224,3]. The output layer is different as well.
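For reference, the shapes can also be checked programmatically instead of in Netron. This is a minimal sketch; the two file names are placeholders for whichever pair of models is being compared:

import tensorflow as tf

def describe(model_path):
    # Print the input and output tensor shapes and dtypes of a tflite model
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    for d in interpreter.get_input_details():
        print(model_path, "input :", d['shape'], d['dtype'])
    for d in interpreter.get_output_details():
        print(model_path, "output:", d['shape'], d['dtype'])

describe("model_float32.tflite")   # PINTO_model_zoo model
describe("hand_landmark.tflite")   # MediaPipe model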

Did I download the correct model?

Thanks, Marco

Relevant Log Output

No response

URL or source code for simple inference testing code

import cv2
import numpy as np
from PIL import Image as PIL_Image
from IPython.display import display  # display() below assumes a Jupyter/IPython environment
import tensorflow as tf

# Loading of the model
interpreter = tf.lite.Interpreter(model_path="model/hand_landmark.tflite")
interpreter.allocate_tensors()

# model details
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
#print(output_details)
height, width = input_details[0]['shape'][1:3]
print("Model Image Size: " + str(height) + "x" + str(width)

# Loading test picture
image = cv2.imread("hand.jpg")
display(PIL_Image.fromarray(cv2.cvtColor(image, cv2.COLOR_BGR2RGB)))
image_height, image_width, _ = image.shape
print("Image Size: " + str(image_height) + "x" + str(image_width))

# Scaling picture to match input of the model
image_resized = cv2.resize(image, (width, height))
input_data = np.expand_dims(image_resized.astype(np.float32), axis=0)

# Applying model
interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()
landmarks = interpreter.get_tensor(output_details[0]['index'])     # 21 landmarks x (x, y, z), reshaped below
hand_right = interpreter.get_tensor(output_details[1]['index'])    # handedness score
hand_present = interpreter.get_tensor(output_details[2]['index'])  # hand presence score
print("Hand present: " + str(hand_present) + "\n")
print("Left Right: " + str(hand_right) + "\n")
landmarks = np.reshape(landmarks, (-1, 3))  # flat output -> (num_points, 3)

# Output: draw the 21 detected landmarks on the resized image
r = 3
for point in landmarks[0:21]:
    # Scale factor of 1 assumes the coordinates are already in input-image pixels
    x = int(point[0] * 1)
    y = int(point[1] * 1)
    print("point: " + str(x) + "," + str(y))
    debug_image = cv2.circle(image_resized, (x, y), radius=r, color=(0, 0, 255), thickness=-1)
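One thing that may also need adjusting (an assumption on my part, not verified for this particular conversion): MediaPipe landmark models typically expect RGB input normalized to [0, 1], while the snippet above feeds raw BGR pixels in [0, 255]. A hedged variant of the preprocessing step:

# Assumption: the model takes RGB input scaled to [0, 1]
image_rgb = cv2.cvtColor(image_resized, cv2.COLOR_BGR2RGB)
input_data = np.expand_dims(image_rgb.astype(np.float32) / 255.0, axis=0)
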
PINTO0309 commented 1 year ago

> Did I download the correct model?

MediaPipe updates its models frequently, so I don't know whether the model you are looking at is the latest. What is certain is that I always convert the latest models.