PINTO0309 / PINTO_model_zoo

A repository for storing models that have been inter-converted between various frameworks. Supported frameworks are TensorFlow, PyTorch, ONNX, OpenVINO, TFJS, TFTRT, TensorFlowLite (Float32/16/INT8), EdgeTPU, CoreML.
https://qiita.com/PINTO
MIT License

282_face_landmark_with_attention Results Are Offset and Scaled Incorrectly #432

Open BharathBillawa opened 4 hours ago

BharathBillawa commented 4 hours ago

Issue Type

Support

OS

Ubuntu

OS architecture

x86_64

Programming Language

Python

Framework

ONNX

Model name and Weights/Checkpoints URL

282_face_landmark_with_attention, https://github.com/PINTO0309/PINTO_model_zoo/files/8357832/model_float32.onnx.zip

Referred from: https://github.com/PINTO0309/PINTO_model_zoo/issues/143

Description

Hello, thank you for providing this great model collection. I'm using the face landmark detection model and have encountered an issue that might be of interest to others. Despite following the documentation and trying different preprocessing methods, the facial landmarks appear offset and scaled incorrectly on the original image.

The output I get looks like this: out_2

I've reviewed the preprocessing steps and tried multiple approaches, but couldn't resolve the issue. If this is something others might face, or if I missed a step, I'd appreciate any clarification or insights.

Best regards, Bharath

Relevant Log Output

No response

URL or source code for simple inference testing code

Here’s the code I’m using:

```python
import cv2
import numpy as np
import onnxruntime

# !wget -q -O image.png https://storage.googleapis.com/mediapipe-assets/business-person.png

img = cv2.imread("image.png")
resized_img = cv2.resize(img, (192, 192))

facemesh_onnx_path = "face_landmark_with_attention.onnx"
ort_session = onnxruntime.InferenceSession(facemesh_onnx_path)

input_name = ort_session.get_inputs()[0].name
output_names = [o.name for o in ort_session.get_outputs()]

# Normalize to [0, 1] and convert NHWC -> NCHW.
input_batch = np.expand_dims(resized_img / 255.0, axis=0).astype(np.float32)
input_batch = input_batch.transpose(0, 3, 1, 2)

result = ort_session.run(output_names, {input_name: input_batch})
result_dict = {name: output for name, output in zip(output_names, result)}

# The mesh output is a flat vector of (x, y, z) triplets.
face = np.squeeze(result_dict['output_mesh_identity']).reshape(-1, 3)
image_copy = resized_img.copy()

for landmark in face[:, :2]:
    cv2.circle(image_copy, (int(landmark[0]), int(landmark[1])), 2, (0, 255, 0), -1)

cv2.imshow("landmarks", image_copy)  # imshow requires a window name
cv2.waitKey(0)
```
PINTO0309 commented 3 hours ago

Try the source code and model that I refactored as a hobby. The problem you're having is simply that your object detection model performs too poorly. This model requires the head area to be cropped out before use. Please note that a certain margin is required around the cropped image of the head.

https://github.com/PINTO0309/facemesh_onnx_tensorrt

https://github.com/user-attachments/assets/b32a02e7-6538-4b35-9341-98598184b5bf