Open isra60 opened 3 years ago
Hi, I rely on the models by PINTO, who is an expert in the task of converting models from one framework to another. We already tried to make MoveNet Single Pose V4 work on depthai a few weeks ago. The conversion method, a mix of PINTO's tools and manual modifications, is described here: https://github.com/PINTO0309/PINTO_model_zoo/blob/main/115_MoveNet/convert_script.txt Unfortunately, the converted models (IR files or blob file) do not work on MYRIAD. That's why the current repo still uses the V3 models. MoveNet Multipose is WIP, but we have similar problems; PINTO is currently debugging the conversion. For now, only the OpenVINO FP32 IR model works, and only on CPU: https://github.com/geaxgx/openvino_movenet_multipose
The repositories will be updated if progress is made.
Hi, Thank you for your response.
Is there a GitHub issue about the process of updating the models, or is it just an internal conversation?
I suppose this example could be made in C++, right? I'm trying to move to a more production-oriented environment, and there we use C++.
For MoveNet V3 (this repo), there is this GitHub issue: https://github.com/PINTO0309/PINTO_model_zoo/issues/100 For MoveNet V4 and Multipose, it is just internal conversation. I don't know why, but pose models are almost never easily converted; they often need manual intervention and special tricks :-)
Sure, you can use C++, except for the small Python part for the scripting node running on the device (in Edge mode).
Hi, have some of the blockers related to multi-pose been resolved? It looked like a few issues from the model zoo were resolved.
geaxgx's sample code implementation helped me a great deal, so I tuned the model to show my appreciation.
Wow, that's amazing! Thanks a lot @PINTO0309! Actually, I saw yesterday that you were working on MoveNet but had no time to understand precisely what you had done. And here you come! You even managed to enhance the model with the detection of more people. Have you tried to run it on the MyriadX or on the Intel GPU via OpenVINO (the original converted model worked only on the CPU)? I will find time this weekend to try your new models. Thanks again!
PS: out of curiosity, your PINTO_Special directory is in 115_MoveNet instead of 137_MoveNet_Multipose. Is there a reason for that ?
Have you tried to run it on the MyriadX or on the Intel GPU via OpenVINO (the original converted model worked only on the CPU)?
I see, I hadn't checked with OpenVINO + Intel GPU yet. I will check it later.
PS: out of curiosity, your PINTO_Special directory is in 115_MoveNet instead of 137_MoveNet_Multipose. Is there a reason for that ?
After confirming that it worked, I was too happy and mistook the folder I should have committed to. :sweat_smile: I immediately realized my mistake and have committed shell scripts to both folders to download the same resource.
Hmmm. I tested it with GPU specified as the device, but it seems oddly slow. In my environment, I doubt the GPU is really being used.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import copy
import time
import argparse
import cv2 as cv
import numpy as np
import logging as log
from openvino.inference_engine import IECore
def pad_image(
image: np.ndarray,
resize_width: int,
resize_height: int,
) -> np.ndarray:
"""Padding the perimeter of the image to the specified bounding rectangle size.
Parameters
----------
image: np.ndarray
        Image to be resized and padded.
resize_width: int
Width of outer rectangle.
    resize_height: int
        Height of the outer rectangle.
Returns
-------
padded_image: np.ndarray
Image after padding.
"""
image_height = image.shape[0]
image_width = image.shape[1]
if resize_width < image_width:
resize_width = image_width
if resize_height < image_height:
resize_height = image_height
padded_image = np.zeros(
(resize_height, resize_width, 3),
np.uint8
)
start_h = int(resize_height / 2 - image_height / 2)
end_h = int(resize_height / 2 + image_height / 2)
start_w = int(resize_width / 2 - image_width / 2)
end_w = int(resize_width / 2 + image_width / 2)
padded_image[start_h:end_h, start_w:end_w, :] = image
return padded_image
def get_args():
parser = argparse.ArgumentParser()
parser.add_argument("--device", type=int, default=0)
parser.add_argument("--file", type=str, default=None)
parser.add_argument("--width", help='cap width', type=int, default=640)
parser.add_argument("--height", help='cap height', type=int, default=480)
parser.add_argument('--mirror', action='store_true')
parser.add_argument("--keypoint_score", type=float, default=0.20)
parser.add_argument("--bbox_score", type=float, default=0.20)
parser.add_argument("--palm_square_crop", action='store_true')
parser.add_argument("--cpu_gpu", type=str, default='GPU')
args = parser.parse_args()
return args
def run_inference(
exec_net,
input_height,
input_width,
input_name0,
input_name1,
input_name2,
image,
):
image_width, image_height = np.asarray(image.shape[1], dtype=np.int64), np.asarray(image.shape[0], dtype=np.int64)
input_image = cv.resize(image, dsize=(input_width, input_height))
input_image = input_image[..., ::-1]
input_image = input_image.transpose(2,0,1)[np.newaxis, ...].astype(np.float32)
keypoints_with_scores = exec_net.infer(
{
input_name0: input_image,
input_name1: image_height,
input_name2: image_width,
}
)['batch_persons_kpxkpykpscore_x17_bx1by1bx2by2bscore']
keypoints_with_scores = np.squeeze(keypoints_with_scores)
return keypoints_with_scores
def main():
    # Argument parsing #################################################################
args = get_args()
cap_device = args.device
if args.file is not None:
cap_device = args.file
cap = cv.VideoCapture(cap_device)
if args.file is not None:
cap_width = int(cap.get(cv.CAP_PROP_FRAME_WIDTH))
cap_height = int(cap.get(cv.CAP_PROP_FRAME_HEIGHT))
else:
cap_width = args.width
cap_height = args.height
cap.set(cv.CAP_PROP_FRAME_WIDTH, cap_width)
cap.set(cv.CAP_PROP_FRAME_HEIGHT, cap_height)
mirror = args.mirror
keypoint_score_th = args.keypoint_score
bbox_score_th = args.bbox_score
palm_square_crop = args.palm_square_crop
cpu_gpu = args.cpu_gpu
    # Video writer preparation ###############################################################
cap_fps = cap.get(cv.CAP_PROP_FPS)
fourcc = cv.VideoWriter_fourcc('m','p','4','v')
video_writer = cv.VideoWriter(
filename='output.mp4',
fourcc=fourcc,
fps=cap_fps,
frameSize=(cap_width, cap_height),
)
    # Load model #############################################################
model_path = 'openvino/FP16/movenet_multipose_lightning_384x640_p10_myriad.xml'
ie = IECore()
net = ie.read_network(
model=model_path
)
# Get model info
input_blob = [key for key in net.input_info.keys()]
input_shape = net.input_info[input_blob[0]].input_data.shape
    channels = input_shape[1]
input_height = input_shape[2]
input_width = input_shape[3]
exec_net = ie.load_network(network=net, device_name=cpu_gpu)
while True:
        # Camera capture #####################################################
ret, frame = cap.read()
if not ret:
break
if mirror:
            frame = cv.flip(frame, 1)  # mirror display
debug_image = copy.deepcopy(frame)
        # Run inference ##############################################################
start_time = time.time()
keypoints_with_scores = run_inference(
exec_net,
input_height,
input_width,
input_blob[0],
input_blob[1],
input_blob[2],
frame,
)
elapsed_time = time.time() - start_time
        # Debug drawing
debug_image = draw_debug(
debug_image,
elapsed_time,
keypoint_score_th,
bbox_score_th,
keypoints_with_scores,
mirror,
palm_square_crop,
)
        # Key handling (ESC: quit) ##################################################
key = cv.waitKey(1)
if key == 27: # ESC
break
        # Update the display #############################################################
cv.imshow('MoveNet(multipose) Demo', debug_image)
video_writer.write(debug_image)
if video_writer:
video_writer.release()
if cap:
cap.release()
cv.destroyAllWindows()
lines = [
[0,1],
[0,2],
[1,3],
[2,4],
[0,5],
[0,6],
[5,6],
[5,7],
[7,9],
[6,8],
[8,10],
[11,12],
[5,11],
[11,13],
[13,15],
[6,12],
[12,14],
[14,16],
]
def draw_debug(
image,
elapsed_time,
keypoint_score_th,
bbox_score_th,
keypoints_with_scores,
mirror,
palm_square_crop,
):
debug_image = copy.deepcopy(image)
    """
    Keypoints:
    0:nose 1:left eye 2:right eye 3:left ear 4:right ear 5:left shoulder 6:right shoulder 7:left elbow 8:right elbow 9:left wrist
    10:right wrist 11:left hip 12:right hip 13:left knee 14:right knee 15:left ankle 16:right ankle
    [persons, kpxkpykpscore_x17_bx1by1bx2by2bscore] [10,56]
    0:keypoint0_x
    1:keypoint0_y
    2:keypoint0_score
    :
    48:keypoint16_x
    49:keypoint16_y
    50:keypoint16_score
    51:bbox_x1
    52:bbox_y1
    53:bbox_x2
    54:bbox_y2
    55:bbox_score
    """
for idx, keypoints_with_score in enumerate(keypoints_with_scores):
if keypoints_with_score[55] > bbox_score_th:
# Line: bone
_ = [
cv.line(
debug_image,
(int(keypoints_with_score[line_idxs[0]*3+0]), int(keypoints_with_score[line_idxs[0]*3+1])),
(int(keypoints_with_score[line_idxs[1]*3+0]), int(keypoints_with_score[line_idxs[1]*3+1])),
(0, 255, 0),
2
) for line_idxs in lines \
if keypoints_with_score[line_idxs[0]*3+2] > keypoint_score_th and keypoints_with_score[line_idxs[1]*3+2] > keypoint_score_th
]
            # Circle: each keypoint
_ = [
cv.circle(
debug_image,
(int(keypoints_with_score[keypoint_idx*3+0]), int(keypoints_with_score[keypoint_idx*3+1])),
3,
(0, 0, 255),
-1
) for keypoint_idx in range(17) if keypoints_with_score[keypoint_idx*3+2] > keypoint_score_th
]
bbox_x1 = int(keypoints_with_score[51])
bbox_y1 = int(keypoints_with_score[52])
bbox_x2 = int(keypoints_with_score[53])
bbox_y2 = int(keypoints_with_score[54])
bbox_h = bbox_y2 - bbox_y1
if palm_square_crop:
                # Palm crop
image_width = image.shape[1]
image_height = image.shape[0]
x1 = int(keypoints_with_score[51])
y1 = int(keypoints_with_score[52])
x2 = int(keypoints_with_score[53])
y2 = int(keypoints_with_score[54])
                x2 = x2 + 1 if x1 == x2 else x2  # avoid aborting if the width happens to be zero
                y2 = y2 + 1 if y1 == y2 else y2  # avoid aborting if the height happens to be zero
"""
0:nose,
1:left eye,
2:right eye,
3:left ear,
4:right ear,
5:left shoulder,
6:right shoulder,
7:left elbow,
8:right elbow,
9:left wrist,
10:right wrist,
11:left hip,
12:right hip,
13:left knee,
14:right knee,
15:left ankle,
16:right ankle
"""
                # Swap the left- and right-hand keypoints when the input image is mirrored
if not mirror:
                    elbow_left_x = int(keypoints_with_score[21])   # left elbow X
                    elbow_left_y = int(keypoints_with_score[22])   # left elbow Y
                    elbow_right_x = int(keypoints_with_score[24])  # right elbow X
                    elbow_right_y = int(keypoints_with_score[25])  # right elbow Y
                    wrist_left_x = int(keypoints_with_score[27])   # left wrist X
                    wrist_left_y = int(keypoints_with_score[28])   # left wrist Y
                    wrist_right_x = int(keypoints_with_score[30])  # right wrist X
                    wrist_right_y = int(keypoints_with_score[31])  # right wrist Y
                else:
                    elbow_left_x = int(keypoints_with_score[24])   # left elbow X (mirrored)
                    elbow_left_y = int(keypoints_with_score[25])   # left elbow Y (mirrored)
                    elbow_right_x = int(keypoints_with_score[21])  # right elbow X (mirrored)
                    elbow_right_y = int(keypoints_with_score[22])  # right elbow Y (mirrored)
                    wrist_left_x = int(keypoints_with_score[30])   # left wrist X (mirrored)
                    wrist_left_y = int(keypoints_with_score[31])   # left wrist Y (mirrored)
                    wrist_right_x = int(keypoints_with_score[27])  # right wrist X (mirrored)
                    wrist_right_y = int(keypoints_with_score[28])  # right wrist Y (mirrored)
                """
                - Slightly shift the horizontal crop position based on the X coordinates
                  of the left elbow and left wrist:
                    elbow X > wrist X: shift the crop region slightly to the left
                    elbow X = wrist X: no shift
                    elbow X < wrist X: shift the crop region slightly to the right
                - Slightly shift the vertical crop position based on the Y coordinates
                  of the left elbow and left wrist:
                    elbow Y > wrist Y: shift the crop region slightly upward
                    elbow Y = wrist Y: no shift
                    elbow Y < wrist Y: shift the crop region slightly downward
                """
                distx_left_elbow_to_left_wrist = elbow_left_x - wrist_left_x      # +: elbow right of wrist, -: elbow left of wrist
                disty_left_elbow_to_left_wrist = elbow_left_y - wrist_left_y      # +: elbow below wrist, -: elbow above wrist
                distx_right_elbow_to_right_wrist = elbow_right_x - wrist_right_x  # +: elbow right of wrist, -: elbow left of wrist
                disty_right_elbow_to_right_wrist = elbow_right_y - wrist_right_y  # +: elbow below wrist, -: elbow above wrist
adjust_ratio = 2
                ############################################################## Left hand
                # X relation between left elbow and left wrist
left_wrist_x_adjust_pixel = 0
inversion = -1 if mirror else 1
if distx_left_elbow_to_left_wrist > 0:
left_wrist_x_adjust_pixel = (distx_left_elbow_to_left_wrist // adjust_ratio) * inversion
elif distx_left_elbow_to_left_wrist == 0:
left_wrist_x_adjust_pixel = 0
elif distx_left_elbow_to_left_wrist < 0:
left_wrist_x_adjust_pixel = (distx_left_elbow_to_left_wrist // adjust_ratio) * inversion
                # Y relation between left elbow and left wrist
left_wrist_y_adjust_pixel = 0
if disty_left_elbow_to_left_wrist > 0:
left_wrist_y_adjust_pixel = (disty_left_elbow_to_left_wrist // adjust_ratio) * -1
elif disty_left_elbow_to_left_wrist == 0:
left_wrist_y_adjust_pixel = 0
elif disty_left_elbow_to_left_wrist < 0:
left_wrist_y_adjust_pixel = (disty_left_elbow_to_left_wrist // adjust_ratio) * -1
                # Adjust the crop center position
wrist_left_x = wrist_left_x + left_wrist_x_adjust_pixel
wrist_left_y = wrist_left_y + left_wrist_y_adjust_pixel
                # Expand the square crop region by crop_magnification
                crop_magnification = 1.0
                wrist_left_x1 = wrist_left_x - (bbox_h / 4 * crop_magnification)  # left edge: a quarter of the bbox height left of the wrist center
                wrist_left_y1 = wrist_left_y - (bbox_h / 4 * crop_magnification)  # top edge: a quarter of the bbox height above the wrist center
                wrist_left_x2 = wrist_left_x + (bbox_h / 4 * crop_magnification)  # right edge: a quarter of the bbox height right of the wrist center
                wrist_left_y2 = wrist_left_y + (bbox_h / 4 * crop_magnification)  # bottom edge: a quarter of the bbox height below the wrist center
                # Avoid referencing outside the frame
wrist_left_x1 = int(min(max(0, wrist_left_x1), image_width))
wrist_left_y1 = int(min(max(0, wrist_left_y1), image_height))
wrist_left_x2 = int(min(max(0, wrist_left_x2), image_width))
wrist_left_y2 = int(min(max(0, wrist_left_y2), image_height))
                # Pad the crop on all four sides to get a square image
square_crop_size = max(wrist_left_x2 - wrist_left_x1, wrist_left_y2 - wrist_left_y1)
left_padded_image = pad_image(
image=image[wrist_left_y1:wrist_left_y2, wrist_left_x1:wrist_left_x2, :],
resize_width=square_crop_size,
resize_height=square_crop_size,
)
if left_padded_image.shape[0] > 0 and left_padded_image.shape[1] > 0:
cv.imshow(f'left_bbox{idx}', left_padded_image)
                ############################################################## Right hand
                # X relation between right elbow and right wrist
right_wrist_x_adjust_pixel = 0
inversion = -1 if mirror else 1
if distx_right_elbow_to_right_wrist > 0:
right_wrist_x_adjust_pixel = (distx_right_elbow_to_right_wrist // adjust_ratio) * inversion
elif distx_right_elbow_to_right_wrist == 0:
right_wrist_x_adjust_pixel = 0
elif distx_right_elbow_to_right_wrist < 0:
right_wrist_x_adjust_pixel = (distx_right_elbow_to_right_wrist // adjust_ratio) * inversion
                # Y relation between right elbow and right wrist
right_wrist_y_adjust_pixel = 0
if disty_right_elbow_to_right_wrist > 0:
right_wrist_y_adjust_pixel = (disty_right_elbow_to_right_wrist // adjust_ratio) * -1
elif disty_right_elbow_to_right_wrist == 0:
right_wrist_y_adjust_pixel = 0
elif disty_right_elbow_to_right_wrist < 0:
right_wrist_y_adjust_pixel = (disty_right_elbow_to_right_wrist // adjust_ratio) * -1
                # Adjust the crop center position
wrist_right_x = wrist_right_x + right_wrist_x_adjust_pixel
wrist_right_y = wrist_right_y + right_wrist_y_adjust_pixel
                # Expand the square crop region by crop_magnification
                crop_magnification = 1.0
                wrist_right_x1 = wrist_right_x - (bbox_h / 4 * crop_magnification)  # left edge: a quarter of the bbox height left of the wrist center
                wrist_right_y1 = wrist_right_y - (bbox_h / 4 * crop_magnification)  # top edge: a quarter of the bbox height above the wrist center
                wrist_right_x2 = wrist_right_x + (bbox_h / 4 * crop_magnification)  # right edge: a quarter of the bbox height right of the wrist center
                wrist_right_y2 = wrist_right_y + (bbox_h / 4 * crop_magnification)  # bottom edge: a quarter of the bbox height below the wrist center
                # Avoid referencing outside the frame
wrist_right_x1 = int(min(max(0, wrist_right_x1), image_width))
wrist_right_y1 = int(min(max(0, wrist_right_y1), image_height))
wrist_right_x2 = int(min(max(0, wrist_right_x2), image_width))
wrist_right_y2 = int(min(max(0, wrist_right_y2), image_height))
                # Pad the crop on all four sides to get a square image
square_crop_size = max(wrist_right_x2 - wrist_right_x1, wrist_right_y2 - wrist_right_y1)
right_padded_image = pad_image(
image=image[wrist_right_y1:wrist_right_y2, wrist_right_x1:wrist_right_x2, :],
resize_width=square_crop_size,
resize_height=square_crop_size,
)
if right_padded_image.shape[0] > 0 and right_padded_image.shape[1] > 0:
cv.imshow(f'right_bbox{idx}', right_padded_image)
            # Bounding box
cv.rectangle(
debug_image,
(bbox_x1, bbox_y1),
(bbox_x2, bbox_y2),
(255, 255, 255),
4,
)
cv.rectangle(
debug_image,
(bbox_x1, bbox_y1),
(bbox_x2, bbox_y2),
(0, 0, 0),
2,
)
    # Processing time
txt = f"Elapsed Time : {elapsed_time * 1000:.1f} ms (inference + post-process)"
cv.putText(
debug_image,
txt,
(10, 30),
cv.FONT_HERSHEY_SIMPLEX,
0.7,
(255, 255, 255),
4,
cv.LINE_AA,
)
cv.putText(
debug_image,
txt,
(10, 30),
cv.FONT_HERSHEY_SIMPLEX,
0.7,
(0, 0, 0),
2,
cv.LINE_AA,
)
return debug_image
if __name__ == '__main__':
main()
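For reference, the 56-value-per-person output layout described in the docstring above (17 keypoints × (x, y, score), then bbox x1, y1, x2, y2, score) can be unpacked with plain NumPy. This is an illustrative sketch with dummy data, not part of the original script:

```python
import numpy as np

def decode_person(row):
    """Split one 56-element output row into keypoints and a bounding box."""
    keypoints = row[:51].reshape(17, 3)  # per keypoint: x, y, score
    bbox = row[51:55]                    # x1, y1, x2, y2
    bbox_score = float(row[55])
    return keypoints, bbox, bbox_score

# Dummy row standing in for one detected person
row = np.arange(56, dtype=np.float32)
keypoints, bbox, bbox_score = decode_person(row)
print(keypoints.shape)  # (17, 3)
print(bbox)             # [51. 52. 53. 54.]
print(bbox_score)       # 55.0
```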
I reinstalled OpenCL and it worked on GPU. :+1:
Thanks @PINTO0309. I have just tried your script on the GPU. It works too, but much slower than yours (~150 ms). I have just installed OpenVINO 2022.1 and ran install_NEO_OCL_driver.sh, so I guess OpenCL is OK.
I also tried on the MYRIAD (./script.sh --cpu_gpu MYRIAD). Unfortunately it does not work: no error message but no landmarks drawn. I will do more tests tomorrow.
Perhaps the normalization at the model's input may have influenced it. I will try some patterns too.
# -1 to 1 norm
x = x / 127.5 - 1.0
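As a quick sanity check (a standalone NumPy snippet, not part of the script above), this formula maps uint8 pixel values from [0, 255] into [-1, 1]:

```python
import numpy as np

# Sample pixel values; 127.5 is half of the 8-bit maximum (255)
pixels = np.array([0, 64, 255], dtype=np.float32)

# Same normalization as above: 0 maps to -1.0 and 255 maps to 1.0
normalized = pixels / 127.5 - 1.0
print(normalized)
```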
When MYRIAD was specified, nothing was detected. There appears to be a problem with the internal logic of the NCS2 (MYRIAD).
Model 256x320 - 10 persons - Enable Normalization openvino256x320.zip
Model 256x320 - 10 persons - Disable Normalization openvino_256x320.zip
No detect
No detect
Thank you for your investigation @PINTO0309. IMHO it is not worth investigating further. I am not sure Intel is putting much effort into the Myriad. I hope Keembay will work better :-)
I agree. Because of that past history, I have not conducted Myriad tests for several years now. :smile_cat:
Hi. Reading the readme, it seems that you only provide the V3 versions of these models, but there are now V4 versions and a multi-pose model.
Could you provide support or a guide on how to convert the models from PINTO to work with the DepthAI camera?
Thanks!