Closed geaxgx closed 3 years ago
You are the 100th issue contributor to be celebrated. :sweat_smile:
I was aware of the problem you pointed out while I was converting the model while eating lunch. In fact, I have also identified a way to solve that part of the problem. However, after this problem is solved, another major problem arises that cannot be helped.
TensorFlow's `FloorDiv` operation cannot be handled correctly by OpenVINO.
This is a known issue that only I see as a problem.
I tried to reconvert lightning using a special trick.
I feel very honoured and proud to be the 100th issue contributor ! A lot of hard work to get there :-))
Thank you for the "special trick" models. I can load them successfully. Now I have to read the documentation a bit to learn how to decode the outputs.
Hmm, that's strange, I get negative numbers in the model output ("Identity"):
kps [[ 0.33809385 -0.3008902 0.86417985]
[ 0.35239205 -0.28053764 0.7577548 ]
[ 0.34184605 -0.28763872 0.68992126]
[ 0.41102067 -0.2875655 0.8167058 ]
[ 0.38617188 -0.29239017 0.7693529 ]
[ 0.49646014 -0.34640452 0.8592416 ]
[ 0.37429118 -0.39201987 0.8351665 ]
[ 0.62626785 -0.34436777 0.5662274 ]
[ 0.26715785 -0.29554045 0.6619134 ]
[ 0.6390584 -0.3938337 0.5435367 ]
[ 0.19667952 -0.21501386 0.6472453 ]
[ 0.63254285 -0.49191946 0.91860545]
[ 0.58395 0.41719258 0.93589425]
[ 0.69131416 -0.29929492 0.94660014]
[ 0.751432 0.30486852 0.939826 ]
[ 0.8990828 -0.29772794 0.9320587 ]
[ 0.8820982 0.2047543 0.9341732 ]]
From the model card:
Inputs
A frame of video or an image, represented as an int32 tensor of shape: 192x192x3. Channels order: RGB with values in [0, 255].
Outputs
A float32 tensor of shape [1, 1, 17, 3].
The first two channels of the last dimension represents the yx coordinates (normalized to image frame, i.e. range in [0.0, 1.0]) of the 17 keypoints (in the order of: [nose, left eye, right eye, left ear, right ear, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left hip, right hip, left knee, right knee, left ankle, right ankle]).
The third channel of the last dimension represents the prediction confidence scores of each keypoint, also in the range [0.0, 1.0].
I need to check with you whether your converted model expects float32 input in [0, 1] or in [0, 255].
I do not specify any normalization values in the parameters during the conversion.
Hmmm. I did add a customization to change the INPUT type from INT32 to Float32. But other than that, I didn't modify the OP this time.
Ok thanks. So the model is expecting inputs between [0,255]. And the numbers I got in my previous post correspond to such input. Note that the first number of each triplet corresponds to the y coordinate and seems coherent with the input image.
For instance, the highest keypoint in the image is the right wrist and corresponds to the lowest y 0.19667952. But I don't really know how to interpret the 2nd channel with these negative numbers.
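That reading of the first channel can be checked mechanically. A minimal sketch, using the y values from the output quoted above and the keypoint order from the model card (`KEYPOINT_NAMES` and the other variable names are illustrative, not part of any library):

```python
import numpy as np

# Keypoint order as listed on the model card.
KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

# First channel (y) of the 17 keypoints from the "Identity" output above.
y = np.array([
    0.33809385, 0.35239205, 0.34184605, 0.41102067, 0.38617188,
    0.49646014, 0.37429118, 0.62626785, 0.26715785, 0.6390584,
    0.19667952, 0.63254285, 0.58395, 0.69131416, 0.751432,
    0.8990828, 0.8820982,
])

# The smallest normalized y is the keypoint closest to the top of the image.
top = int(np.argmin(y))
print(KEYPOINT_NAMES[top], y[top])
```

The smallest y falls on index 10, which the model card order maps to the right wrist.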
The sample program doesn't seem to be doing any special processing. The only thing that bothers me is the statement that the aspect ratio needs to be maintained. https://www.tensorflow.org/hub/tutorials/movenet
```python
#@title Helper functions for visualization

# Dictionary that maps from joint names to keypoint indices.
KEYPOINT_DICT = {
    'nose': 0,
    'left_eye': 1,
    'right_eye': 2,
    'left_ear': 3,
    'right_ear': 4,
    'left_shoulder': 5,
    'right_shoulder': 6,
    'left_elbow': 7,
    'right_elbow': 8,
    'left_wrist': 9,
    'right_wrist': 10,
    'left_hip': 11,
    'right_hip': 12,
    'left_knee': 13,
    'right_knee': 14,
    'left_ankle': 15,
    'right_ankle': 16
}

# Maps bones to a matplotlib color name.
KEYPOINT_EDGE_INDS_TO_COLOR = {
    (0, 1): 'm',
    (0, 2): 'c',
    (1, 3): 'm',
    (2, 4): 'c',
    (0, 5): 'm',
    (0, 6): 'c',
    (5, 7): 'm',
    (7, 9): 'm',
    (6, 8): 'c',
    (8, 10): 'c',
    (5, 6): 'y',
    (5, 11): 'm',
    (6, 12): 'c',
    (11, 12): 'y',
    (11, 13): 'm',
    (13, 15): 'm',
    (12, 14): 'c',
    (14, 16): 'c'
}

def _keypoints_and_edges_for_display(keypoints_with_scores,
                                     height,
                                     width,
                                     keypoint_threshold=0.11):
  """Returns high confidence keypoints and edges for visualization.

  Args:
    keypoints_with_scores: A numpy array with shape [1, 1, 17, 3] representing
      the keypoint coordinates and scores returned from the MoveNet model.
    height: height of the image in pixels.
    width: width of the image in pixels.
    keypoint_threshold: minimum confidence score for a keypoint to be
      visualized.

  Returns:
    A (keypoints_xy, edges_xy, edge_colors) containing:
      * the coordinates of all keypoints of all detected entities;
      * the coordinates of all skeleton edges of all detected entities;
      * the colors in which the edges should be plotted.
  """
  keypoints_all = []
  keypoint_edges_all = []
  edge_colors = []
  num_instances, _, _, _ = keypoints_with_scores.shape
  for idx in range(num_instances):
    kpts_x = keypoints_with_scores[0, idx, :, 1]
    kpts_y = keypoints_with_scores[0, idx, :, 0]
    kpts_scores = keypoints_with_scores[0, idx, :, 2]
    kpts_absolute_xy = np.stack(
        [width * np.array(kpts_x), height * np.array(kpts_y)], axis=-1)
    kpts_above_thresh_absolute = kpts_absolute_xy[
        kpts_scores > keypoint_threshold, :]
    keypoints_all.append(kpts_above_thresh_absolute)

    for edge_pair, color in KEYPOINT_EDGE_INDS_TO_COLOR.items():
      if (kpts_scores[edge_pair[0]] > keypoint_threshold and
          kpts_scores[edge_pair[1]] > keypoint_threshold):
        x_start = kpts_absolute_xy[edge_pair[0], 0]
        y_start = kpts_absolute_xy[edge_pair[0], 1]
        x_end = kpts_absolute_xy[edge_pair[1], 0]
        y_end = kpts_absolute_xy[edge_pair[1], 1]
        line_seg = np.array([[x_start, y_start], [x_end, y_end]])
        keypoint_edges_all.append(line_seg)
        edge_colors.append(color)
  if keypoints_all:
    keypoints_xy = np.concatenate(keypoints_all, axis=0)
  else:
    keypoints_xy = np.zeros((0, 17, 2))

  if keypoint_edges_all:
    edges_xy = np.stack(keypoint_edges_all, axis=0)
  else:
    edges_xy = np.zeros((0, 2, 2))
  return keypoints_xy, edges_xy, edge_colors


def draw_prediction_on_image(
    image, keypoints_with_scores, crop_region=None, close_figure=False,
    output_image_height=None):
  """Draws the keypoint predictions on image.

  Args:
    image: A numpy array with shape [height, width, channel] representing the
      pixel values of the input image.
    keypoints_with_scores: A numpy array with shape [1, 1, 17, 3] representing
      the keypoint coordinates and scores returned from the MoveNet model.
    crop_region: A dictionary that defines the coordinates of the bounding box
      of the crop region in normalized coordinates (see the init_crop_region
      function below for more detail). If provided, this function will also
      draw the bounding box on the image.
    output_image_height: An integer indicating the height of the output image.
      Note that the image aspect ratio will be the same as the input image.

  Returns:
    A numpy array with shape [out_height, out_width, channel] representing the
    image overlaid with keypoint predictions.
  """
  height, width, channel = image.shape
  aspect_ratio = float(width) / height
  fig, ax = plt.subplots(figsize=(12 * aspect_ratio, 12))
  # To remove the huge white borders
  fig.tight_layout(pad=0)
  ax.margins(0)
  ax.set_yticklabels([])
  ax.set_xticklabels([])
  plt.axis('off')

  im = ax.imshow(image)
  line_segments = LineCollection([], linewidths=(4), linestyle='solid')
  ax.add_collection(line_segments)
  # Turn off tick labels
  scat = ax.scatter([], [], s=60, color='#FF1493', zorder=3)

  (keypoint_locs, keypoint_edges,
   edge_colors) = _keypoints_and_edges_for_display(
       keypoints_with_scores, height, width)

  line_segments.set_segments(keypoint_edges)
  line_segments.set_color(edge_colors)
  if keypoint_edges.shape[0]:
    line_segments.set_segments(keypoint_edges)
    line_segments.set_color(edge_colors)
  if keypoint_locs.shape[0]:
    scat.set_offsets(keypoint_locs)

  if crop_region is not None:
    xmin = max(crop_region['x_min'] * width, 0.0)
    ymin = max(crop_region['y_min'] * height, 0.0)
    rec_width = min(crop_region['x_max'], 0.99) * width - xmin
    rec_height = min(crop_region['y_max'], 0.99) * height - ymin
    rect = patches.Rectangle(
        (xmin,ymin),rec_width,rec_height,
        linewidth=1,edgecolor='b',facecolor='none')
    ax.add_patch(rect)

  fig.canvas.draw()
  image_from_plot = np.frombuffer(fig.canvas.tostring_rgb(), dtype=np.uint8)
  image_from_plot = image_from_plot.reshape(
      fig.canvas.get_width_height()[::-1] + (3,))
  plt.close(fig)
  if output_image_height is not None:
    output_image_width = int(output_image_height / height * width)
    image_from_plot = cv2.resize(
        image_from_plot, dsize=(output_image_width, output_image_height),
         interpolation=cv2.INTER_CUBIC)
  return image_from_plot


def to_gif(images, fps):
  """Converts image sequence (4D numpy array) to gif."""
  imageio.mimsave('./animation.gif', images, fps=fps)
  return embed.embed_file('./animation.gif')


def progress(value, max=100):
  return HTML("""
      <progress
          value='{value}'
          max='{max}',
          style='width: 100%'
      >
          {value}
      </progress>
  """.format(value=value, max=max))
```
```python
# Resize and pad the image to keep the aspect ratio and fit the expected size.
input_image = tf.expand_dims(image, axis=0)
input_image = tf.cast(tf.image.resize_with_pad(
    input_image, input_size, input_size), dtype=tf.int32)

# Run model inference.
outputs = movenet(input_image)
# Output is a [1, 1, 17, 3] tensor.
keypoint_with_scores = outputs['output_0']

# Visualize the predictions with image.
display_image = tf.expand_dims(image, axis=0)
display_image = tf.cast(tf.image.resize_with_pad(
    display_image, 1280, 1280), dtype=tf.int32)
output_overlay = draw_prediction_on_image(
    np.squeeze(display_image.numpy(), axis=0), keypoint_with_scores)

plt.figure(figsize=(5, 5))
plt.imshow(output_overlay)
_ = plt.axis('off')
```
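The aspect-ratio remark matters when keypoints are mapped back onto the original frame: `tf.image.resize_with_pad` letterboxes the image, so the normalized coordinates refer to the padded square, not to the original image. A minimal NumPy sketch of the inverse mapping, assuming centered padding and ignoring integer rounding of the resized size (`unpad_keypoints` is a hypothetical helper, not part of the tutorial):

```python
import numpy as np

def unpad_keypoints(keypoints_with_scores, height, width):
    """Maps normalized (y, x) from a padded square back to the original image.

    Assumes the image was scaled so its longer side fills the square and the
    shorter side was padded equally on both ends. Scores are left unchanged.
    """
    kps = np.array(keypoints_with_scores, dtype=np.float32, copy=True)
    side = max(height, width)
    frac_h = height / side  # fraction of the square covered vertically
    frac_w = width / side   # fraction of the square covered horizontally
    kps[..., 0] = (kps[..., 0] - (1.0 - frac_h) / 2.0) / frac_h
    kps[..., 1] = (kps[..., 1] - (1.0 - frac_w) / 2.0) / frac_w
    return kps

# Example: a 100x200 (height x width) image padded to a square.
# y = 0.25 and y = 0.75 are the top and bottom edges of the image content.
sample = np.array([[[[0.25, 0.5, 0.9], [0.75, 1.0, 0.8]]]], dtype=np.float32)
restored = unpad_keypoints(sample, height=100, width=200)
print(restored)
```

For a square input image both fractions are 1 and the function returns the coordinates unchanged.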
I need to do some research to see if the problem came up when I converted to ONNX.
This is what I get on their colab demo with my sample image:
tf.Tensor(
[[[[0.32252783 0.6974002 0.87823004]
[0.3368386 0.7183701 0.73135835]
[0.32875112 0.7149483 0.53043044]
[0.39574257 0.7109653 0.83786 ]
[0.373867 0.7076759 0.7852564 ]
[0.4851149 0.65141076 0.8171427 ]
[0.3588924 0.60673237 0.8145416 ]
[0.60513955 0.65146786 0.500903 ]
[0.25296256 0.70648587 0.5550392 ]
[0.6227146 0.60486215 0.51756746]
[0.18121433 0.7854103 0.62142277]
[0.6179341 0.5054677 0.8370107 ]
[0.58052903 0.41632038 0.7957091 ]
[0.67267233 0.70280856 0.90661496]
[0.75072277 0.3047437 0.84694517]
[0.879561 0.7038015 0.9044996 ]
[0.8851525 0.2067561 0.9440259 ]]]], shape=(1, 1, 17, 3), dtype=float32)
So for y, we have similar values. For x, if I compare my_x with their_x, it looks like:
- if their_x < 0.5, then my_x = their_x
- if their_x > 0.5, then my_x = their_x - 1
This is the result of inference with a tflite model in which I only changed the input type to Float32 before converting to ONNX.
array([[[[0.32448542, 0.68699664, 0.64204913],
[0.33508152, 0.7094843 , 0.55379134],
[0.32651505, 0.7030307 , 0.437988 ],
[0.3908338 , 0.7020813 , 0.6667765 ],
[0.36634612, 0.703608 , 0.5485987 ],
[0.4885859 , 0.6513269 , 0.5774252 ],
[0.36186764, 0.6106034 , 0.53571534],
[0.62881696, 0.6714854 , 0.29908276],
[0.23611161, 0.7139294 , 0.47521845],
[0.62566906, 0.60845155, 0.45314837],
[0.17534202, 0.78672737, 0.822638 ],
[0.603142 , 0.5127338 , 0.67058015],
[0.5701561 , 0.41531724, 0.82032025],
[0.6790442 , 0.6913149 , 0.9012103 ],
[0.7507467 , 0.30132815, 0.8377824 ],
[0.87941265, 0.70109427, 0.84762025],
[0.8852422 , 0.19999856, 0.74576056]]]], dtype=float32)
Looks great !
So it looks like something is going on here via ONNX or OpenVINO optimizer.
Hi! Maybe it would be better to open a new issue, but I'm trying to get MoveNet running on MyriadX (OAK device) without any success:
[14442C10A119CBD200] [232.129] [NeuralNetwork(2)] [warning] Input image (192x192) does not match NN (3x192) - skipping inference
[14442C10A119CBD200] [232.129] [NeuralNetwork(2)] [error] Input tensor 'input:0' (0) exceeds available data range. Data size (110592B), tensor offset (0), size (221184B) - skipping inference
Maybe it is expecting 3x192x192 instead of 192x192x3? I'm just following the info on the model card.
I know @geaxgx has experience with OAK devices... Could you point me in the right direction? Any advice?
I tried to convert the tflite model myself, but after generating the IR model I'm getting "Argument element types are inconsistent" for 'Concat_1876' (?)
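Whether the network really wants planar data is only a guess at this point. If it does, the layout change itself is a single transpose. A small NumPy sketch with a zero-filled placeholder frame (the variable names are illustrative):

```python
import numpy as np

# Placeholder frame in interleaved HWC layout (192x192x3), as on the model card.
frame_hwc = np.zeros((192, 192, 3), dtype=np.uint8)

# Planar CHW layout (3x192x192): move the channel axis to the front.
# ascontiguousarray makes the bytes follow the new order in memory.
frame_chw = np.ascontiguousarray(frame_hwc.transpose(2, 0, 1))

print(frame_hwc.shape, "->", frame_chw.shape)
```

The transpose changes only the ordering of the values, not the number of bytes, so on its own it would not explain the size mismatch in the error line.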
Hi @gespona, @PINTO0309 has done the conversion of the models but there is still a small problem with the output values. PINTO is working on it. Just be patient.
Inference results for the ONNX file just before converting to OpenVINO. Almost OK.
[array([[[[0.32448548, 0.6869966 , 0.6420484 ],
[0.33508155, 0.7094843 , 0.5537915 ],
[0.32651508, 0.7030306 , 0.4379869 ],
[0.39083382, 0.7020813 , 0.6667766 ],
[0.36634612, 0.703608 , 0.5485983 ],
[0.4885859 , 0.65132695, 0.577426 ],
[0.3618676 , 0.6106034 , 0.53571665],
[0.6288169 , 0.67148536, 0.29908305],
[0.23611161, 0.7139294 , 0.47521886],
[0.6256691 , 0.6084515 , 0.45314914],
[0.17534202, 0.78672737, 0.82263756],
[0.603142 , 0.5127338 , 0.670581 ],
[0.57015616, 0.41531718, 0.8203208 ],
[0.6790442 , 0.6913149 , 0.90121067],
[0.7507467 , 0.30132815, 0.83778256],
[0.87941265, 0.70109427, 0.84762037],
[0.8852422 , 0.19999854, 0.7457607 ]]]], dtype=float32)]
OpenVINO FP32
array([[[[0.32448542, 0.6869966 , 0.6420481 ],
[0.33508152, 0.7094842 , 0.55379057],
[0.32651508, 0.7030306 , 0.43798667],
[0.39083382, 0.7020813 , 0.6667762 ],
[0.36634612, 0.703608 , 0.5485983 ],
[0.48858595, 0.651327 , 0.5774265 ],
[0.3618676 , 0.6106034 , 0.53571504],
[0.62881696, 0.6714853 , 0.29908296],
[0.23611164, 0.7139294 , 0.47521782],
[0.6256691 , 0.6084515 , 0.45314965],
[0.17534205, 0.7867273 , 0.82263744],
[0.60314196, 0.5127338 , 0.6705806 ],
[0.57015604, 0.41531724, 0.8203208 ],
[0.6790442 , 0.6913149 , 0.90121037],
[0.7507467 , 0.30132818, 0.8377828 ],
[0.87941265, 0.70109427, 0.8476199 ],
[0.88524216, 0.19999859, 0.7457614 ]]]], dtype=float32)
OpenVINO FP16
array([[[[0.3242579 , 0.6867883 , 0.63874394],
[0.334914 , 0.709219 , 0.55096024],
[0.326399 , 0.7028601 , 0.4379415 ],
[0.3907204 , 0.7018955 , 0.66515774],
[0.3662796 , 0.70349765, 0.54623103],
[0.48843265, 0.65106046, 0.577307 ],
[0.3616504 , 0.61051846, 0.5368742 ],
[0.62867224, 0.6713548 , 0.30035877],
[0.23601574, 0.7138525 , 0.47763693],
[0.6255478 , 0.6083473 , 0.4535727 ],
[0.17515807, 0.7865279 , 0.8231634 ],
[0.6029311 , 0.512741 , 0.67055583],
[0.57002366, 0.41531637, 0.8211969 ],
[0.6787125 , 0.69121534, 0.90212846],
[0.75059325, 0.3013085 , 0.83856463],
[0.8791874 , 0.70098466, 0.8479991 ],
[0.88505656, 0.19993334, 0.74572027]]]], dtype=float32)
I'm getting tired... :crying_cat_face:
What time is it in Japan ? 0:30 ? Time to go to sleep Katsuya ! You fully deserve it ! We are not in hurry.
Anyway, myself, I won't have time to test this evening :-)
Done.
Thanks for worrying about me! My shoulders are just stiff from working from home and being on the computer all day. It's 00:30 in Japan. But I'm not sleepy yet, so I'll commit the modified ONNX, OpenVINO IR and Blob, then smoke a cigarette and go to bed.
Sorry if I sounded impatient .... Ofc we're not in hurry. Thanks a lot for all this amazing work :)
Ha ha enjoy the cigarette ! Thanks again. I don't know how we would do if you did not exist :-)
ONNX and OpenVINO IR and Myriad Inference Engine Blobs have been updated. I leave the rest of the verification to @geaxgx and @gespona.
When converting from tflite to ONNX, the values seem to shift slightly. I'm going to bed. Good night.
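The size of such shifts can be put into numbers. A minimal sketch comparing the first three keypoints of the OpenVINO FP32 and FP16 outputs quoted above (only three rows are copied here to keep the example short):

```python
import numpy as np

# First three rows (nose, left eye, right eye) of the outputs quoted above.
fp32 = np.array([[0.32448542, 0.6869966, 0.6420481],
                 [0.33508152, 0.7094842, 0.55379057],
                 [0.32651508, 0.7030306, 0.43798667]])
fp16 = np.array([[0.3242579, 0.6867883, 0.63874394],
                 [0.334914, 0.709219, 0.55096024],
                 [0.326399, 0.7028601, 0.4379415]])

diff = np.abs(fp32 - fp16)
coord_diff = diff[:, :2].max()  # largest deviation in y/x
score_diff = diff[:, 2].max()   # largest deviation in confidence score

print(f"max coordinate difference: {coord_diff:.6f}")
print(f"max score difference:      {score_diff:.6f}")
```

For these rows the coordinates differ by well under 0.001 in normalized units, while the scores move a little more.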
Thanks a lot ! Going to try again with Myriad
Thanks Katsuya, I will try tomorrow.
That's strange. I have just done a quick test after using 115_MoveNet/download_thunder.sh to download the models. I still get the negative numbers in the output. Am I using the latest models?
Similar on my side... I'm getting the same errors after downloading again, although I'm not sure it's related to the same issue.
@gespona I think your problem is a separate issue. I remember that the message `[NeuralNetwork(2)] [warning] Input image (192x192) does not match NN (3x192) - skipping inference` was a bug in an older version which has been fixed. But at that time, it was an error, whereas here it is a warning. I don't have time to investigate tonight but will do so tomorrow.
On my side, I first want to make it work on OpenVINO. My problem of negative values is not really blocking, because the negative values can be fixed by adding 1 to them.
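A minimal sketch of that workaround, assuming the only defect is that x values above 0.5 come out shifted by -1 (`unwrap_x` is a hypothetical helper name; the two sample rows are the nose and right-hip rows from the output quoted earlier in the thread):

```python
import numpy as np

def unwrap_x(kps):
    """Adds 1 to negative x values (column 1); y and score are left as-is."""
    fixed = np.array(kps, dtype=np.float32, copy=True)
    negative = fixed[:, 1] < 0
    fixed[negative, 1] += 1.0
    return fixed

# Rows are (y, x, score): nose (negative x) and right hip (already valid x).
mine = np.array([[0.33809385, -0.3008902, 0.86417985],
                 [0.58395, 0.41719258, 0.93589425]], dtype=np.float32)

fixed = unwrap_x(mine)
print(fixed[:, 1])
```

The corrected nose x lands close to the 0.6974002 reported by the Colab demo, and the right-hip x is untouched.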
@geaxgx Yes, I agree. In fact the warning happens when feeding the NN node directly from the camera. I tried feeding it through an XLinkIn node and the warning does not appear (?) :)
But error is always there: [14442C10A119CBD200] [232.129] [NeuralNetwork(2)] [error] Input tensor 'input:0' (0) exceeds available data range. Data size (110592B), tensor offset (0), size (221184B) - skipping inference
But I'm totally lost... 110592 matches 192 * 192 * 3, but I have no clue about the other size... so I'd really appreciate your feedback when you have time to check. Thanks!
221184 = 2 * 110592 Maybe the input data you are sending is not of the correct type ?
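The arithmetic behind that hint can be checked directly. A short sketch that prints the byte size of a 192x192x3 frame for a few candidate element types (the zero-filled frame is only a placeholder):

```python
import numpy as np

# Placeholder frame with the lightning input size from the model card.
frame = np.zeros((192, 192, 3), dtype=np.uint8)

# Byte size of the same frame for different element types.
sizes = {
    name: frame.astype(dtype).nbytes
    for name, dtype in [("uint8", np.uint8),
                        ("float16", np.float16),
                        ("float32", np.float32)]
}
for name, nbytes in sizes.items():
    print(f"{name}: {nbytes} bytes")
```

Of these, only float16 gives the 221184 B the error message asks for, and uint8 gives the 110592 B that was actually sent, which is consistent with a one-byte-per-value frame being fed to a network that expects two bytes per value.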
That's a good point. Checking again ... the model card says input is int32
No luck for now... but I found a similar issue (input tensor) reported here https://github.com/luxonis/depthai/issues/366 ... it is not clear yet... maybe the issue is more related to the OAK device then (?)
@gespona
Earlier in this issue, I mentioned that I changed the type of the input from INT32 to Float32 in order to quantize the model. Use Netron, a website that lets you visualize the structure of a model, to check what the input is expected to look like. I am not very familiar with what the error message means, but one caveat is that I am converting the Float16 (`FP16`) model to Blob. That is my simple guess, because the tensor size in the error message is double the data size. It is a Float16-precision model tagged with `model_float32`. I have customized the model to be optimized for Float32, Float16 and INT8, so please ignore the description on the model card. INT32 is unwieldy and unfriendly for many users of the model.
https://netron.app/
openvino/FP16/movenet_singlepose_lightning_3.xml
openvino/FP16/movenet_singlepose_thunder_3.xml
The conversion command I used is below. I specified FP16 for the conversion, but will the result be the same if I reconvert using FP32? If there is a problem, it is a problem beyond my control.
${INTEL_OPENVINO_DIR}/deployment_tools/inference_engine/lib/intel64/myriad_compile \
-m openvino/FP16/movenet_singlepose_lightning_3.xml \
-VPU_NUMBER_OF_SHAVES 4 \
-VPU_NUMBER_OF_CMX_SLICES 4 \
-o openvino/myriad/movenet_singlepose_lightning_3.blob
@geaxgx I downloaded the latest Google Drive file again this morning, which I thought I had uploaded last night, and tested it again in a separate working folder to make sure there were no mistakes.
#### lightning
$ sudo gdown --id 1Fkh3N5fhyvrkWBF-9X7FhsN6YveFFu_O
$ tar -zxvf resources.tar.gz
$ mv resources resources1
#### thunder
$ sudo gdown --id 1RDF35KcL7kWRb4dgRf0OudH6l0EtZ3qw
$ tar -zxvf resources.tar.gz
$ mv resources resources2
test.png
- test_onnx.py
```python
import onnx
import onnxruntime
import numpy as np
import cv2
import pprint

model_path = "model_float32.onnx"

def main():
    # Load the model and validate its structure.
    model = onnx.load(model_path)
    onnx.checker.check_model(model)

    sess = onnxruntime.InferenceSession(model_path)

    # Resize to 256x256, convert BGR to RGB, add a batch axis, cast to float32.
    image = cv2.imread('test.png')
    frame = cv2.resize(image, (256, 256))
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    frame = np.expand_dims(frame, axis=0)
    frame = frame.astype('float32')

    inputs = {sess.get_inputs()[0].name: frame}
    outputs = sess.run(None, inputs)
    pprint.pprint(outputs)

if __name__ == "__main__":
    main()
```
[array([[[[0.32433048, 0.69931823, 0.8797674 ],
[0.3375441 , 0.71945375, 0.7601149 ],
[0.32876045, 0.7161203 , 0.5944811 ],
[0.39609817, 0.7126299 , 0.82234097],
[0.37236294, 0.70911103, 0.8054341 ],
[0.48610994, 0.6517349 , 0.85130155],
[0.35770214, 0.60963666, 0.8522695 ],
[0.60724556, 0.6535827 , 0.4368128 ],
[0.2557739 , 0.7068589 , 0.61313367],
[0.6248696 , 0.6063015 , 0.55172634],
[0.18469065, 0.78301877, 0.52691185],
[0.61794114, 0.5086788 , 0.9210715 ],
[0.58055407, 0.41669005, 0.8049598 ],
[0.67506 , 0.7038542 , 0.91628826],
[0.7518556 , 0.30507305, 0.834178 ],
[0.88219225, 0.7037958 , 0.90368426],
[0.88579935, 0.20647278, 0.9427309 ]]]], dtype=float32)]
- test_openvino.py
```python
from openvino.inference_engine import IECore
import numpy as np
import cv2
import pprint
XML_PATH = "openvino/FP16/movenet_singlepose_thunder_3.xml"
BIN_PATH = "openvino/FP16/movenet_singlepose_thunder_3.bin"
ie = IECore()
net = ie.read_network(model=XML_PATH, weights=BIN_PATH)
input_blob = next(iter(net.input_info))
exec_net = ie.load_network(net, device_name='CPU', num_requests=1)
inference_request = exec_net.requests[0]
img = cv2.imread('test.png')
img = cv2.resize(img, (256, 256))
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = np.asarray(img)
img = img.astype(np.float32)
img = img[np.newaxis,:,:,:]
exec_net.infer(inputs={input_blob: img})
pprint.pprint(inference_request.output_blobs)
output = inference_request.output_blobs['Identity'].buffer
pprint.pprint(output)
```
{'7022.0': <openvino.inference_engine.ie_api.Blob object at 0x7f4f6bebce80>,
'7026.0': <openvino.inference_engine.ie_api.Blob object at 0x7f4f737caa80>,
'Identity': <openvino.inference_engine.ie_api.Blob object at 0x7f4f6bebce40>}
array([[[[0.32441464, 0.69939005, 0.879445 ],
[0.33760712, 0.71948093, 0.7585466 ],
[0.32884556, 0.7161406 , 0.59659934],
[0.3961212 , 0.7126963 , 0.8225935 ],
[0.37242952, 0.7091924 , 0.80385953],
[0.48617375, 0.65181863, 0.8533465 ],
[0.35786432, 0.6097264 , 0.85393965],
[0.6126379 , 0.65634876, 0.38123205],
[0.25588638, 0.7068643 , 0.6161008 ],
[0.62492216, 0.6062361 , 0.54878634],
[0.18464863, 0.7830283 , 0.52940834],
[0.6179626 , 0.50863874, 0.9211578 ],
[0.5806241 , 0.41676882, 0.80573714],
[0.67509186, 0.7039062 , 0.91644865],
[0.75190806, 0.30513576, 0.8353069 ],
[0.8821908 , 0.7038179 , 0.90322673],
[0.88581234, 0.20652623, 0.9427294 ]]]], dtype=float32)
```python
import numpy as np
import time
import tensorflow.lite as tflite
import cv2
import sys

interpreter = tflite.Interpreter(model_path='model_float32.tflite', num_threads=4)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

image = cv2.cvtColor(cv2.imread('test.png'), cv2.COLOR_BGR2RGB)
image = cv2.resize(image, (256, 256))
image = np.expand_dims(image, axis=0)
image = image.astype('float32')

interpreter.set_tensor(input_details[0]['index'], image)
start_time = time.time()
interpreter.invoke()
stop_time = time.time()
print("time: ", stop_time - start_time)

scores = interpreter.get_tensor(output_details[0]['index'])

import pprint
pprint.pprint(scores)
```
time: 0.03680682182312012
array([[[[0.32433048, 0.69931823, 0.8797673 ],
[0.3375441 , 0.71945375, 0.7601152 ],
[0.32876042, 0.7161203 , 0.59448147],
[0.3960982 , 0.7126299 , 0.8223407 ],
[0.3723629 , 0.7091111 , 0.805434 ],
[0.48610994, 0.6517349 , 0.8513019 ],
[0.3577021 , 0.60963666, 0.8522699 ],
[0.6072456 , 0.6535827 , 0.43681395],
[0.25577393, 0.7068589 , 0.6131334 ],
[0.62486964, 0.6063015 , 0.55172515],
[0.18469064, 0.78301877, 0.5269122 ],
[0.61794114, 0.50867885, 0.92107165],
[0.58055407, 0.41669008, 0.8049605 ],
[0.67506003, 0.7038542 , 0.91628814],
[0.7518556 , 0.30507302, 0.8341775 ],
[0.88219225, 0.70379585, 0.9036844 ],
[0.8857994 , 0.20647278, 0.9427308 ]]]], dtype=float32)
Thanks for the updated links and sorry for the trouble. I just understood where the negative values are coming from. My bad. It is because the OpenVINO runtime I used for the tests was 2021.2 instead of 2021.3 :-(
Thunder with openvino on CPU :
Thunder with openvino on Myriad: The skeleton is shifted.
Lightning with OpenVINO on CPU:
Lightning with OpenVINO on Myriad: Same problem of shifting. Is this the same float16 precision problem we had with BlazePose heavy?
@geaxgx What happens when you use this? FP16 lightning / thunder .xml The bad result is different from that of BlazePose. This time it looks like it is simply shifted to the upper left.
movenet_singlepose_lightning_3.zip movenet_singlepose_thunder_3.zip
Ha ha much better !
That's what I expected.
And for thunder on myriad :
You are a genius Katsuya ! align_corners again ? It looks like there is just face landmarks that are shifted a bit.
About the problem of @gespona, what arguments are you using in the compile_tool command to generate the blob ? -ip U8 ? -ip FP 16 ?
> align_corners again ?

Yes. That's right.

> It looks like there is just face landmarks that are shifted a bit.

Originally, when I looked at the FP16 values, I found that there was a slight error with the FP32 values, perhaps due to the loss of important facial features when scaling down to the 192x192 or 256x256 sizes.

> -ip U8 ? -ip FP 16 ?

I have not specified anything.
${INTEL_OPENVINO_DIR}/deployment_tools/inference_engine/lib/intel64/myriad_compile \
-m openvino/FP16/movenet_singlepose_lightning_3.xml \
-VPU_NUMBER_OF_SHAVES 4 \
-VPU_NUMBER_OF_CMX_SLICES 4 \
-o openvino/myriad/movenet_singlepose_lightning_3.blob
I personally think that face landmarks are not very important when doing body pose recognition. But movenet seems pretty good for the skeleton. I have to compare with blazepose on challenging poses.
If you don't specify '-ip' argument, I guess it will use FP16. In that case, @gespona should probably use cam.setFp16(True) when using the internal camera on depthai. I will check later on.
1. OS Ubuntu 18.04
2. OS Architecture x86_64
3. Version of OpenVINO 2021.3.394
9. Movenet from your model zoo
Ha ha it's me again ;-) I saw you have already converted Movenet ! Naturally I wanted to give it a try. I get this error message when loading the 'lightning' (or 'thunder') model: