PINTO0309 / PINTO_model_zoo

A repository for storing models that have been inter-converted between various frameworks. Supported frameworks are TensorFlow, PyTorch, ONNX, OpenVINO, TFJS, TFTRT, TensorFlowLite (Float32/16/INT8), EdgeTPU, CoreML.
https://qiita.com/PINTO
MIT License

Movenet: error on loading model with Openvino #100

Closed: geaxgx closed this issue 3 years ago.

geaxgx commented 3 years ago

1. OS: Ubuntu 18.04
2. OS Architecture: x86_64
3. Version of OpenVINO: 2021.3.394
4. Model: MoveNet from your model zoo

Ha ha it's me again ;-) I saw you have already converted Movenet ! Naturally I wanted to give it a try. I get this error message when loading the 'lightning' (or 'thunder') model:

```
openvino@ubuntu:/workdir$ python3 MovenetOpenvino.py -m lightning
Video FPS: 30
Loading Inference Engine
Device info:
        CPU
        MKLDNNPlugin version ......... 2.1
        Build ........... 2021.3.0-2787-60059f2c755-releases/2021/3
Pose Detection model - Reading network files:
        /workdir/models/movenet_lightning_FP32.xml
        /workdir/models/movenet_lightning_FP32.bin
Traceback (most recent call last):
  File "MovenetOpenvino.py", line 569, in <module>
    output=args.output)
  File "MovenetOpenvino.py", line 99, in __init__
    self.load_model(xml, device)
  File "MovenetOpenvino.py", line 131, in load_model
    self.pd_net = self.ie.read_network(model=xml_path, weights=bin_path)
  File "ie_api.pyx", line 293, in openvino.inference_engine.ie_api.IECore.read_network
  File "ie_api.pyx", line 315, in openvino.inference_engine.ie_api.IECore.read_network
RuntimeError: Check 'element::Type::merge(inputs_et, inputs_et, get_input_element_type(i))' failed at core/src/op/concat.cpp:62:
While validating node 'v0::Concat Concat_1866 (stack_2_StatefulPartitionedCall/stack_2_1/Unsqueeze/Output_0/Data__const[0]:i32{1,1}, stack_2_StatefulPartitionedCall/stack_2_1/Unsqueeze503[0]:i64{1,1}, stack_2_StatefulPartitionedCall/stack_2_1/Unsqueeze505[0]:i64{1,1}) -> ()' with friendly_name 'Concat_1866':
Argument element types are inconsistent.
```

PINTO0309 commented 3 years ago

You are the 100th issue contributor to be celebrated. :sweat_smile:

I noticed the problem you pointed out while I was converting the model over lunch. In fact, I have already identified a way to solve that part of the problem. However, once that problem is solved, another major problem arises that I cannot work around.

TensorFlow's FloorDiv operation cannot be handled correctly by OpenVINO.

[Screenshot 2021-05-18 18:57:03]

This is a known issue, though so far I seem to be the only one who treats it as a problem.
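For context, FloorDiv(x, y) is just floor(x / y). A common conversion workaround (not necessarily the trick used for this model) is to decompose it into a plain Div followed by a Floor, both of which OpenVINO supports:

```python
import tensorflow as tf

x = tf.constant([7.0, 9.0])
y = tf.constant(4.0)

print(tf.math.floordiv(x, y))     # [1. 2.] -- the op that trips up OpenVINO
print(tf.floor(tf.divide(x, y)))  # [1. 2.] -- equivalent Div + Floor decomposition
```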

PINTO0309 commented 3 years ago

I tried to reconvert lightning using a special trick.

PINTO0309 commented 3 years ago

[attached the reconverted "special trick" models]
geaxgx commented 3 years ago

I feel very honoured and proud to be the 100th issue contributor ! A lot of hard work to get there :-))

Thank you for the "special trick" models. I can load them successfully. Now I have to read a bit the documentation to know how to decode the outputs.

geaxgx commented 3 years ago

Hmm that's strange, I get negative numbers in the model output ("Identity"):

```
kps [[ 0.33809385 -0.3008902   0.86417985]
 [ 0.35239205 -0.28053764  0.7577548 ]
 [ 0.34184605 -0.28763872  0.68992126]
 [ 0.41102067 -0.2875655   0.8167058 ]
 [ 0.38617188 -0.29239017  0.7693529 ]
 [ 0.49646014 -0.34640452  0.8592416 ]
 [ 0.37429118 -0.39201987  0.8351665 ]
 [ 0.62626785 -0.34436777  0.5662274 ]
 [ 0.26715785 -0.29554045  0.6619134 ]
 [ 0.6390584  -0.3938337   0.5435367 ]
 [ 0.19667952 -0.21501386  0.6472453 ]
 [ 0.63254285 -0.49191946  0.91860545]
 [ 0.58395     0.41719258  0.93589425]
 [ 0.69131416 -0.29929492  0.94660014]
 [ 0.751432    0.30486852  0.939826  ]
 [ 0.8990828  -0.29772794  0.9320587 ]
 [ 0.8820982   0.2047543   0.9341732 ]]
```

From the model card:

Inputs
A frame of video or an image, represented as an int32 tensor of shape: 192x192x3. Channels order: RGB with values in [0, 255].

Outputs
A float32 tensor of shape [1, 1, 17, 3].

The first two channels of the last dimension represent the yx coordinates (normalized to image frame, i.e. range in [0.0, 1.0]) of the 17 keypoints (in the order of: [nose, left eye, right eye, left ear, right ear, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left hip, right hip, left knee, right knee, left ankle, right ankle]).

The third channel of the last dimension represents the prediction confidence scores of each keypoint, also in the range [0.0, 1.0].
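For reference, decoding that output into pixel coordinates would look something like the sketch below (the helper name and variables are mine, not from the model card):

```python
import numpy as np

def decode_keypoints(keypoints_with_scores: np.ndarray, img_w: int, img_h: int):
    """Turn the [1, 1, 17, 3] MoveNet output into (x_px, y_px, score) triplets."""
    kps = keypoints_with_scores[0, 0]   # (17, 3) rows of (y, x, score)
    xs = kps[:, 1] * img_w              # x is the second channel
    ys = kps[:, 0] * img_h              # y is the first channel
    return list(zip(xs.astype(int), ys.astype(int), kps[:, 2]))
```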

I need to check with you: is your converted model expecting float32 input in [0,1] or in [0,255]?

PINTO0309 commented 3 years ago

I do not specify any normalization values in the parameters during the conversion.

PINTO0309 commented 3 years ago

Hmmm. I did add a customization to change the INPUT type from INT32 to Float32, but other than that, I didn't modify any OPs this time.

geaxgx commented 3 years ago

Ok thanks. So the model is expecting inputs between [0,255]. And the numbers I got in my previous post correspond to such input. Note that the first number of each triplet corresponds to the y coordinate and seems coherent with the input image.

[image]

For instance, the highest keypoint in the image is the right wrist and corresponds to the lowest y, 0.19667952. But I don't really know how to interpret the 2nd channel with these negative numbers.

PINTO0309 commented 3 years ago

The sample program doesn't seem to be doing any special processing. The only thing that bothers me is the statement that the aspect ratio needs to be maintained. https://www.tensorflow.org/hub/tutorials/movenet

```python
# Imports used by the tutorial code below (defined in earlier cells, not in this excerpt).
import numpy as np
import cv2
import imageio
import tensorflow as tf
from matplotlib import pyplot as plt
from matplotlib.collections import LineCollection
import matplotlib.patches as patches
from tensorflow_docs.vis import embed
from IPython.display import HTML

#@title Helper functions for visualization

# Dictionary that maps from joint names to keypoint indices.
KEYPOINT_DICT = {
    'nose': 0,
    'left_eye': 1,
    'right_eye': 2,
    'left_ear': 3,
    'right_ear': 4,
    'left_shoulder': 5,
    'right_shoulder': 6,
    'left_elbow': 7,
    'right_elbow': 8,
    'left_wrist': 9,
    'right_wrist': 10,
    'left_hip': 11,
    'right_hip': 12,
    'left_knee': 13,
    'right_knee': 14,
    'left_ankle': 15,
    'right_ankle': 16
}

# Maps bones to a matplotlib color name.
KEYPOINT_EDGE_INDS_TO_COLOR = {
    (0, 1): 'm',
    (0, 2): 'c',
    (1, 3): 'm',
    (2, 4): 'c',
    (0, 5): 'm',
    (0, 6): 'c',
    (5, 7): 'm',
    (7, 9): 'm',
    (6, 8): 'c',
    (8, 10): 'c',
    (5, 6): 'y',
    (5, 11): 'm',
    (6, 12): 'c',
    (11, 12): 'y',
    (11, 13): 'm',
    (13, 15): 'm',
    (12, 14): 'c',
    (14, 16): 'c'
}

def _keypoints_and_edges_for_display(keypoints_with_scores,
                                     height,
                                     width,
                                     keypoint_threshold=0.11):
  """Returns high confidence keypoints and edges for visualization.

  Args:
    keypoints_with_scores: A numpy array with shape [1, 1, 17, 3] representing
      the keypoint coordinates and scores returned from the MoveNet model.
    height: height of the image in pixels.
    width: width of the image in pixels.
    keypoint_threshold: minimum confidence score for a keypoint to be
      visualized.

  Returns:
    A (keypoints_xy, edges_xy, edge_colors) containing:
      * the coordinates of all keypoints of all detected entities;
      * the coordinates of all skeleton edges of all detected entities;
      * the colors in which the edges should be plotted.
  """
  keypoints_all = []
  keypoint_edges_all = []
  edge_colors = []
  num_instances, _, _, _ = keypoints_with_scores.shape
  for idx in range(num_instances):
    kpts_x = keypoints_with_scores[0, idx, :, 1]
    kpts_y = keypoints_with_scores[0, idx, :, 0]
    kpts_scores = keypoints_with_scores[0, idx, :, 2]
    kpts_absolute_xy = np.stack(
        [width * np.array(kpts_x), height * np.array(kpts_y)], axis=-1)
    kpts_above_thresh_absolute = kpts_absolute_xy[
        kpts_scores > keypoint_threshold, :]
    keypoints_all.append(kpts_above_thresh_absolute)

    for edge_pair, color in KEYPOINT_EDGE_INDS_TO_COLOR.items():
      if (kpts_scores[edge_pair[0]] > keypoint_threshold and
          kpts_scores[edge_pair[1]] > keypoint_threshold):
        x_start = kpts_absolute_xy[edge_pair[0], 0]
        y_start = kpts_absolute_xy[edge_pair[0], 1]
        x_end = kpts_absolute_xy[edge_pair[1], 0]
        y_end = kpts_absolute_xy[edge_pair[1], 1]
        line_seg = np.array([[x_start, y_start], [x_end, y_end]])
        keypoint_edges_all.append(line_seg)
        edge_colors.append(color)
  if keypoints_all:
    keypoints_xy = np.concatenate(keypoints_all, axis=0)
  else:
    keypoints_xy = np.zeros((0, 17, 2))

  if keypoint_edges_all:
    edges_xy = np.stack(keypoint_edges_all, axis=0)
  else:
    edges_xy = np.zeros((0, 2, 2))
  return keypoints_xy, edges_xy, edge_colors

def draw_prediction_on_image(
    image, keypoints_with_scores, crop_region=None, close_figure=False,
    output_image_height=None):
  """Draws the keypoint predictions on image.

  Args:
    image: A numpy array with shape [height, width, channel] representing the
      pixel values of the input image.
    keypoints_with_scores: A numpy array with shape [1, 1, 17, 3] representing
      the keypoint coordinates and scores returned from the MoveNet model.
    crop_region: A dictionary that defines the coordinates of the bounding box
      of the crop region in normalized coordinates (see the init_crop_region
      function below for more detail). If provided, this function will also
      draw the bounding box on the image.
    output_image_height: An integer indicating the height of the output image.
      Note that the image aspect ratio will be the same as the input image.

  Returns:
    A numpy array with shape [out_height, out_width, channel] representing the
    image overlaid with keypoint predictions.
  """
  height, width, channel = image.shape
  aspect_ratio = float(width) / height
  fig, ax = plt.subplots(figsize=(12 * aspect_ratio, 12))
  # To remove the huge white borders
  fig.tight_layout(pad=0)
  ax.margins(0)
  ax.set_yticklabels([])
  ax.set_xticklabels([])
  plt.axis('off')

  im = ax.imshow(image)
  line_segments = LineCollection([], linewidths=(4), linestyle='solid')
  ax.add_collection(line_segments)
  # Turn off tick labels
  scat = ax.scatter([], [], s=60, color='#FF1493', zorder=3)

  (keypoint_locs, keypoint_edges,
   edge_colors) = _keypoints_and_edges_for_display(
       keypoints_with_scores, height, width)

  line_segments.set_segments(keypoint_edges)
  line_segments.set_color(edge_colors)
  if keypoint_edges.shape[0]:
    line_segments.set_segments(keypoint_edges)
    line_segments.set_color(edge_colors)
  if keypoint_locs.shape[0]:
    scat.set_offsets(keypoint_locs)

  if crop_region is not None:
    xmin = max(crop_region['x_min'] * width, 0.0)
    ymin = max(crop_region['y_min'] * height, 0.0)
    rec_width = min(crop_region['x_max'], 0.99) * width - xmin
    rec_height = min(crop_region['y_max'], 0.99) * height - ymin
    rect = patches.Rectangle(
        (xmin,ymin),rec_width,rec_height,
        linewidth=1,edgecolor='b',facecolor='none')
    ax.add_patch(rect)

  fig.canvas.draw()
  image_from_plot = np.frombuffer(fig.canvas.tostring_rgb(), dtype=np.uint8)
  image_from_plot = image_from_plot.reshape(
      fig.canvas.get_width_height()[::-1] + (3,))
  plt.close(fig)
  if output_image_height is not None:
    output_image_width = int(output_image_height / height * width)
    image_from_plot = cv2.resize(
        image_from_plot, dsize=(output_image_width, output_image_height),
         interpolation=cv2.INTER_CUBIC)
  return image_from_plot

def to_gif(images, fps):
  """Converts image sequence (4D numpy array) to gif."""
  imageio.mimsave('./animation.gif', images, fps=fps)
  return embed.embed_file('./animation.gif')

def progress(value, max=100):
  return HTML("""
      <progress
          value='{value}'
          max='{max}',
          style='width: 100%'
      >
          {value}
      </progress>
  """.format(value=value, max=max))
# Resize and pad the image to keep the aspect ratio and fit the expected size.
input_image = tf.expand_dims(image, axis=0)
input_image = tf.cast(tf.image.resize_with_pad(
    input_image, input_size, input_size), dtype=tf.int32)

# Run model inference.
outputs = movenet(input_image)
# Output is a [1, 1, 17, 3] tensor.
keypoint_with_scores = outputs['output_0']

# Visualize the predictions with image.
display_image = tf.expand_dims(image, axis=0)
display_image = tf.cast(tf.image.resize_with_pad(
    display_image, 1280, 1280), dtype=tf.int32)
output_overlay = draw_prediction_on_image(
    np.squeeze(display_image.numpy(), axis=0), keypoint_with_scores)

plt.figure(figsize=(5, 5))
plt.imshow(output_overlay)
_ = plt.axis('off')
```

PINTO0309 commented 3 years ago

I need to do some research to see if the problem came up when I converted to ONNX.

[Screenshot 2021-05-18 22:09:43]

geaxgx commented 3 years ago

This is what I get on their colab demo with my sample image:

tf.Tensor(
[[[[0.32252783 0.6974002  0.87823004]
   [0.3368386  0.7183701  0.73135835]
   [0.32875112 0.7149483  0.53043044]
   [0.39574257 0.7109653  0.83786   ]
   [0.373867   0.7076759  0.7852564 ]
   [0.4851149  0.65141076 0.8171427 ]
   [0.3588924  0.60673237 0.8145416 ]
   [0.60513955 0.65146786 0.500903  ]
   [0.25296256 0.70648587 0.5550392 ]
   [0.6227146  0.60486215 0.51756746]
   [0.18121433 0.7854103  0.62142277]
   [0.6179341  0.5054677  0.8370107 ]
   [0.58052903 0.41632038 0.7957091 ]
   [0.67267233 0.70280856 0.90661496]
   [0.75072277 0.3047437  0.84694517]
   [0.879561   0.7038015  0.9044996 ]
   [0.8851525  0.2067561  0.9440259 ]]]], shape=(1, 1, 17, 3), dtype=float32)

So for y, we have similar values. For x, if I compare my_x with their_x, it looks like:

- if their_x < 0.5, then my_x = their_x
- if their_x > 0.5, then my_x = their_x - 1
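That pattern suggests the x values simply wrap below zero. A minimal sketch of the corresponding fix (a hypothetical helper, not code from this thread):

```python
import numpy as np

def unwrap_x(kps: np.ndarray) -> np.ndarray:
    """Undo the apparent wrap-around: add 1 to negative x values.

    kps: array of shape (17, 3) holding (y, x, score) rows.
    """
    fixed = kps.copy()
    fixed[:, 1] = np.where(fixed[:, 1] < 0, fixed[:, 1] + 1.0, fixed[:, 1])
    return fixed
```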

PINTO0309 commented 3 years ago

This is the inference result from a tflite model where I only changed the input type to Float32 before converting to ONNX.

array([[[[0.32448542, 0.68699664, 0.64204913],
         [0.33508152, 0.7094843 , 0.55379134],
         [0.32651505, 0.7030307 , 0.437988  ],
         [0.3908338 , 0.7020813 , 0.6667765 ],
         [0.36634612, 0.703608  , 0.5485987 ],
         [0.4885859 , 0.6513269 , 0.5774252 ],
         [0.36186764, 0.6106034 , 0.53571534],
         [0.62881696, 0.6714854 , 0.29908276],
         [0.23611161, 0.7139294 , 0.47521845],
         [0.62566906, 0.60845155, 0.45314837],
         [0.17534202, 0.78672737, 0.822638  ],
         [0.603142  , 0.5127338 , 0.67058015],
         [0.5701561 , 0.41531724, 0.82032025],
         [0.6790442 , 0.6913149 , 0.9012103 ],
         [0.7507467 , 0.30132815, 0.8377824 ],
         [0.87941265, 0.70109427, 0.84762025],
         [0.8852422 , 0.19999856, 0.74576056]]]], dtype=float32)
geaxgx commented 3 years ago

Looks great !

PINTO0309 commented 3 years ago

So it looks like something is going wrong in the ONNX conversion or the OpenVINO optimizer.

gespona commented 3 years ago

Hi! Maybe it would be better to open a new issue, but I'm trying to get MoveNet running on MyriadX (OAK device) without any success:

```
[14442C10A119CBD200] [232.129] [NeuralNetwork(2)] [warning] Input image (192x192) does not match NN (3x192) - skipping inference
[14442C10A119CBD200] [232.129] [NeuralNetwork(2)] [error] Input tensor 'input:0' (0) exceeds available data range. Data size (110592B), tensor offset (0), size (221184B) - skipping inference
```

Maybe it is expecting 3x192x192 instead of 192x192x3? I'm just following the info on the model card.
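For what it's worth, a quick layout sanity check (my own sketch; OpenVINO IRs converted from TensorFlow models are often transposed to NCHW):

```python
import numpy as np

frame = np.zeros((192, 192, 3), dtype=np.float32)   # hypothetical RGB frame

nhwc = frame[np.newaxis, ...]                       # (1, 192, 192, 3) -- model card layout
nchw = np.transpose(frame, (2, 0, 1))[np.newaxis]   # (1, 3, 192, 192) -- typical IR layout
print(nhwc.shape, nchw.shape)
```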

I know @geaxgx has experience with OAK Devices ... Could you point me in the right direction? Any advice?

I tried to convert the tflite model myself, but after generating the IR model I'm getting "Argument element types are inconsistent" for 'Concat_1876' (?)

geaxgx commented 3 years ago

Hi @gespona, @PINTO0309 has done the conversion of the models but there is still a small problem with the output values. PINTO is working on it. Just be patient.

PINTO0309 commented 3 years ago

Inference results for the ONNX file just before converting to OpenVINO. Almost OK.

[array([[[[0.32448548, 0.6869966 , 0.6420484 ],
         [0.33508155, 0.7094843 , 0.5537915 ],
         [0.32651508, 0.7030306 , 0.4379869 ],
         [0.39083382, 0.7020813 , 0.6667766 ],
         [0.36634612, 0.703608  , 0.5485983 ],
         [0.4885859 , 0.65132695, 0.577426  ],
         [0.3618676 , 0.6106034 , 0.53571665],
         [0.6288169 , 0.67148536, 0.29908305],
         [0.23611161, 0.7139294 , 0.47521886],
         [0.6256691 , 0.6084515 , 0.45314914],
         [0.17534202, 0.78672737, 0.82263756],
         [0.603142  , 0.5127338 , 0.670581  ],
         [0.57015616, 0.41531718, 0.8203208 ],
         [0.6790442 , 0.6913149 , 0.90121067],
         [0.7507467 , 0.30132815, 0.83778256],
         [0.87941265, 0.70109427, 0.84762037],
         [0.8852422 , 0.19999854, 0.7457607 ]]]], dtype=float32)]
PINTO0309 commented 3 years ago

OpenVINO FP32

array([[[[0.32448542, 0.6869966 , 0.6420481 ],
         [0.33508152, 0.7094842 , 0.55379057],
         [0.32651508, 0.7030306 , 0.43798667],
         [0.39083382, 0.7020813 , 0.6667762 ],
         [0.36634612, 0.703608  , 0.5485983 ],
         [0.48858595, 0.651327  , 0.5774265 ],
         [0.3618676 , 0.6106034 , 0.53571504],
         [0.62881696, 0.6714853 , 0.29908296],
         [0.23611164, 0.7139294 , 0.47521782],
         [0.6256691 , 0.6084515 , 0.45314965],
         [0.17534205, 0.7867273 , 0.82263744],
         [0.60314196, 0.5127338 , 0.6705806 ],
         [0.57015604, 0.41531724, 0.8203208 ],
         [0.6790442 , 0.6913149 , 0.90121037],
         [0.7507467 , 0.30132818, 0.8377828 ],
         [0.87941265, 0.70109427, 0.8476199 ],
         [0.88524216, 0.19999859, 0.7457614 ]]]], dtype=float32)
PINTO0309 commented 3 years ago

OpenVINO FP16

array([[[[0.3242579 , 0.6867883 , 0.63874394],
         [0.334914  , 0.709219  , 0.55096024],
         [0.326399  , 0.7028601 , 0.4379415 ],
         [0.3907204 , 0.7018955 , 0.66515774],
         [0.3662796 , 0.70349765, 0.54623103],
         [0.48843265, 0.65106046, 0.577307  ],
         [0.3616504 , 0.61051846, 0.5368742 ],
         [0.62867224, 0.6713548 , 0.30035877],
         [0.23601574, 0.7138525 , 0.47763693],
         [0.6255478 , 0.6083473 , 0.4535727 ],
         [0.17515807, 0.7865279 , 0.8231634 ],
         [0.6029311 , 0.512741  , 0.67055583],
         [0.57002366, 0.41531637, 0.8211969 ],
         [0.6787125 , 0.69121534, 0.90212846],
         [0.75059325, 0.3013085 , 0.83856463],
         [0.8791874 , 0.70098466, 0.8479991 ],
         [0.88505656, 0.19993334, 0.74572027]]]], dtype=float32)
PINTO0309 commented 3 years ago

I'm getting tired... :crying_cat_face:

geaxgx commented 3 years ago

What time is it in Japan ? 0:30 ? Time to go to sleep Katsuya ! You fully deserve it ! We are not in a hurry.

geaxgx commented 3 years ago

Anyway, myself, I won't have time to test this evening :-)

PINTO0309 commented 3 years ago

Done.

[Screenshot 2021-05-19 00:22:17]

PINTO0309 commented 3 years ago

Thanks for worrying about me! My shoulders are just stiff from working from home and being on the computer all day. It's 00:30 in Japan. But I'm not sleepy yet, so I'll commit the modified ONNX, OpenVINO IR and Blob, then smoke a cigarette and go to bed.

gespona commented 3 years ago

Sorry if I sounded impatient... Of course we're not in a hurry. Thanks a lot for all this amazing work :)

geaxgx commented 3 years ago

Ha ha enjoy the cigarette ! Thanks again. I don't know how we would do if you did not exist :-)

PINTO0309 commented 3 years ago

ONNX and OpenVINO IR and Myriad Inference Engine Blobs have been updated. I leave the rest of the verification to @geaxgx and @gespona.

When converting from tflite to ONNX, the values seem to shift slightly. I'm going to bed. Good night.

gespona commented 3 years ago

Thanks a lot ! Going to try again with Myriad

geaxgx commented 3 years ago

Thanks Katsuya, I will try tomorrow.

geaxgx commented 3 years ago

That's strange. I have just done a quick test after using 115_MoveNet/download_thunder.sh to download the models, and I still get the negative numbers in the output. Am I using the very latest models ?

gespona commented 3 years ago

It's similar on my side: I'm getting the same errors after downloading again, though I'm not sure it's related to the same issue.

geaxgx commented 3 years ago

@gespona I think your problem is another issue. I remember that the message `[NeuralNetwork(2)] [warning] Input image (192x192) does not match NN (3x192) - skipping inference` was a bug in an older version which has been fixed. But at that time it was an error, whereas here it is a warning. I don't have time to investigate tonight but will do tomorrow.

On my side, I want first to make it work on Openvino. My problem of negative values is not really blocking, because the negative values can be fixed by adding 1 to them.

gespona commented 3 years ago

@geaxgx Yes I agree. In fact the warning happens when feeding the NN node directly from the camera. When I feed it through an XLinkIn node instead, the warning does not appear (?) :)

But the error is always there:

```
[14442C10A119CBD200] [232.129] [NeuralNetwork(2)] [error] Input tensor 'input:0' (0) exceeds available data range. Data size (110592B), tensor offset (0), size (221184B) - skipping inference
```

But I'm totally lost... 110592 matches 192 x 192 x 3, but I have no clue about the other size, so I'd really appreciate your feedback when you have time to check. Thanks!

geaxgx commented 3 years ago

221184 = 2 * 110592. Maybe the input data you are sending is not of the correct type?
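The arithmetic supports that guess (a quick sketch; the byte counts follow from the tensor shape and element size):

```python
import numpy as np

shape = (1, 192, 192, 3)
elems = int(np.prod(shape))   # 110592 elements

print(elems * 1)  # 110592 B with 1-byte U8 input   -- the "data size" in the error
print(elems * 2)  # 221184 B with 2-byte FP16 input -- the "size" the blob expects
```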

gespona commented 3 years ago

That's a good point. Checking again ... the model card says input is int32

gespona commented 3 years ago

No luck for now... but I found a similar issue (input tensor) reported here: https://github.com/luxonis/depthai/issues/366. It's not clear yet... maybe it's an issue more related to the OAK device then (?)

PINTO0309 commented 3 years ago

@gespona Earlier in this issue, I mentioned that I changed the type of the input from INT32 to FLOAT32 when quantizing the model. I have customized the model to be optimized for Float32, Float16, and INT8, so please ignore the description on the model card; INT32 is an unwieldy and unfriendly input type for many users of the model. Use Netron (https://netron.app/), a web site that lets you visualize the structure of a model, to see what input the model expects. I am not very familiar with what the error message means, but note that I am converting the Float16 (FP16) model to Blob; since the offset value in the error message is double the data size, my simple guess is that the blob expects FP16 input, even though the model is tagged model_float32.

openvino/FP16/movenet_singlepose_lightning_3.xml

[Screenshot 2021-05-19 08:51:54]

openvino/FP16/movenet_singlepose_thunder_3.xml

[Screenshot 2021-05-19 08:52:28]

The conversion command I used is below. I specified FP16 for the conversion, but will the result be the same if I reconvert using FP32? If there is a problem, it is a problem beyond my control.

```
${INTEL_OPENVINO_DIR}/deployment_tools/inference_engine/lib/intel64/myriad_compile \
-m openvino/FP16/movenet_singlepose_lightning_3.xml \
-VPU_NUMBER_OF_SHAVES 4 \
-VPU_NUMBER_OF_CMX_SLICES 4 \
-o openvino/myriad/movenet_singlepose_lightning_3.blob
```

@geaxgx I downloaded the latest Google Drive file again this morning, which I thought I had uploaded last night, and tested it again in a separate working folder to make sure there were no mistakes.

#### lightning
```bash
$ sudo gdown --id 1Fkh3N5fhyvrkWBF-9X7FhsN6YveFFu_O
$ tar -zxvf resources.tar.gz
$ mv resources resources1
```

#### thunder
```bash
$ sudo gdown --id 1RDF35KcL7kWRb4dgRf0OudH6l0EtZ3qw
$ tar -zxvf resources.tar.gz
$ mv resources resources2
```

- test_onnx.py
```python
import pprint

import cv2
import numpy as np
import onnx
import onnxruntime

model_path = "model_float32.onnx"

def main():
    model = onnx.load(model_path)
    onnx.checker.check_model(model)

    sess = onnxruntime.InferenceSession(model_path)

    image = cv2.imread('test.png')
    frame = cv2.resize(image, (256, 256))
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    frame = np.expand_dims(frame, axis=0)
    frame = frame.astype('float32')

    inputs = {sess.get_inputs()[0].name: frame}
    outputs = sess.run(None, inputs)

    pprint.pprint(outputs)

if __name__ == "__main__":
    main()
```

```
[array([[[[0.32433048, 0.69931823, 0.8797674 ],
         [0.3375441 , 0.71945375, 0.7601149 ],
         [0.32876045, 0.7161203 , 0.5944811 ],
         [0.39609817, 0.7126299 , 0.82234097],
         [0.37236294, 0.70911103, 0.8054341 ],
         [0.48610994, 0.6517349 , 0.85130155],
         [0.35770214, 0.60963666, 0.8522695 ],
         [0.60724556, 0.6535827 , 0.4368128 ],
         [0.2557739 , 0.7068589 , 0.61313367],
         [0.6248696 , 0.6063015 , 0.55172634],
         [0.18469065, 0.78301877, 0.52691185],
         [0.61794114, 0.5086788 , 0.9210715 ],
         [0.58055407, 0.41669005, 0.8049598 ],
         [0.67506   , 0.7038542 , 0.91628826],
         [0.7518556 , 0.30507305, 0.834178  ],
         [0.88219225, 0.7037958 , 0.90368426],
         [0.88579935, 0.20647278, 0.9427309 ]]]], dtype=float32)]
```


- test_openvino.py
```python
from openvino.inference_engine import IECore
import numpy as np
import cv2
import pprint

XML_PATH = "openvino/FP16/movenet_singlepose_thunder_3.xml"
BIN_PATH = "openvino/FP16/movenet_singlepose_thunder_3.bin"

ie = IECore()
net = ie.read_network(model=XML_PATH, weights=BIN_PATH)
input_blob = next(iter(net.input_info))
exec_net = ie.load_network(net, device_name='CPU', num_requests=1)
inference_request = exec_net.requests[0]

img = cv2.imread('test.png')
img = cv2.resize(img, (256, 256))
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = np.asarray(img)
img = img.astype(np.float32)
img = img[np.newaxis,:,:,:]

exec_net.infer(inputs={input_blob: img})
pprint.pprint(inference_request.output_blobs)
output = inference_request.output_blobs['Identity'].buffer

pprint.pprint(output)
```

```
{'7022.0': <openvino.inference_engine.ie_api.Blob object at 0x7f4f6bebce80>,
 '7026.0': <openvino.inference_engine.ie_api.Blob object at 0x7f4f737caa80>,
 'Identity': <openvino.inference_engine.ie_api.Blob object at 0x7f4f6bebce40>}
array([[[[0.32441464, 0.69939005, 0.879445  ],
         [0.33760712, 0.71948093, 0.7585466 ],
         [0.32884556, 0.7161406 , 0.59659934],
         [0.3961212 , 0.7126963 , 0.8225935 ],
         [0.37242952, 0.7091924 , 0.80385953],
         [0.48617375, 0.65181863, 0.8533465 ],
         [0.35786432, 0.6097264 , 0.85393965],
         [0.6126379 , 0.65634876, 0.38123205],
         [0.25588638, 0.7068643 , 0.6161008 ],
         [0.62492216, 0.6062361 , 0.54878634],
         [0.18464863, 0.7830283 , 0.52940834],
         [0.6179626 , 0.50863874, 0.9211578 ],
         [0.5806241 , 0.41676882, 0.80573714],
         [0.67509186, 0.7039062 , 0.91644865],
         [0.75190806, 0.30513576, 0.8353069 ],
         [0.8821908 , 0.7038179 , 0.90322673],
         [0.88581234, 0.20652623, 0.9427294 ]]]], dtype=float32)
```

- test_tflite.py
```python
import pprint
import time

import cv2
import numpy as np
import tflite_runtime.interpreter as tflite  # assuming tflite_runtime; tf.lite also works

interpreter = tflite.Interpreter(model_path='model_float32.tflite', num_threads=4)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

image = cv2.cvtColor(cv2.imread('test.png'), cv2.COLOR_BGR2RGB)
image = cv2.resize(image, (256, 256))
image = np.expand_dims(image, axis=0)
image = image.astype('float32')

interpreter.set_tensor(input_details[0]['index'], image)
start_time = time.time()
interpreter.invoke()
stop_time = time.time()
print("time: ", stop_time - start_time)

scores = interpreter.get_tensor(output_details[0]['index'])

pprint.pprint(scores)
```

```
time:  0.03680682182312012
array([[[[0.32433048, 0.69931823, 0.8797673 ],
         [0.3375441 , 0.71945375, 0.7601152 ],
         [0.32876042, 0.7161203 , 0.59448147],
         [0.3960982 , 0.7126299 , 0.8223407 ],
         [0.3723629 , 0.7091111 , 0.805434  ],
         [0.48610994, 0.6517349 , 0.8513019 ],
         [0.3577021 , 0.60963666, 0.8522699 ],
         [0.6072456 , 0.6535827 , 0.43681395],
         [0.25577393, 0.7068589 , 0.6131334 ],
         [0.62486964, 0.6063015 , 0.55172515],
         [0.18469064, 0.78301877, 0.5269122 ],
         [0.61794114, 0.50867885, 0.92107165],
         [0.58055407, 0.41669008, 0.8049605 ],
         [0.67506003, 0.7038542 , 0.91628814],
         [0.7518556 , 0.30507302, 0.8341775 ],
         [0.88219225, 0.70379585, 0.9036844 ],
         [0.8857994 , 0.20647278, 0.9427308 ]]]], dtype=float32)
```

geaxgx commented 3 years ago

Thanks for the updated links and sorry for the trouble. I just understood where the negative values were coming from. My bad: the OpenVINO runtime I used for the tests was 2021.2 instead of 2021.3 :-(

geaxgx commented 3 years ago

Thunder with OpenVINO on CPU: [image]

geaxgx commented 3 years ago

Thunder with OpenVINO on Myriad: [image]

The skeleton is shifted.

geaxgx commented 3 years ago

Lightning with OpenVINO on CPU: [image]

geaxgx commented 3 years ago

Lightning with OpenVINO on Myriad: [image]

Same shifting problem. Do we get the same float16 precision problem we had with BlazePose heavy?

PINTO0309 commented 3 years ago

@geaxgx What happens when you use these FP16 lightning / thunder .xml files? The bad result is different from BlazePose's: this time it looks like the output is simply shifted to the upper left.

movenet_singlepose_lightning_3.zip movenet_singlepose_thunder_3.zip

geaxgx commented 3 years ago

Ha ha much better ! [image]

PINTO0309 commented 3 years ago

That's what I expected.

geaxgx commented 3 years ago

And for thunder on Myriad: [image]

geaxgx commented 3 years ago

You are a genius Katsuya ! align_corners again ? It looks like it is just the face landmarks that are shifted a bit.

geaxgx commented 3 years ago

About @gespona's problem: what arguments are you using in the compile_tool command to generate the blob ? -ip U8 ? -ip FP16 ?

PINTO0309 commented 3 years ago

> align_corners again ?

Yes. That's right.

> It looks like it is just the face landmarks that are shifted a bit.

When I compared the FP16 values with the FP32 values, I found a slight error, perhaps due to the loss of fine facial features when scaling down to the 192x192 or 256x256 input sizes.

> -ip U8 ? -ip FP16 ?

I have not specified anything.

```
${INTEL_OPENVINO_DIR}/deployment_tools/inference_engine/lib/intel64/myriad_compile \
-m openvino/FP16/movenet_singlepose_lightning_3.xml \
-VPU_NUMBER_OF_SHAVES 4 \
-VPU_NUMBER_OF_CMX_SLICES 4 \
-o openvino/myriad/movenet_singlepose_lightning_3.blob
```

geaxgx commented 3 years ago

I personally think that face landmarks are not very important when doing body pose recognition. But movenet seems pretty good for the skeleton. I have to compare with blazepose on challenging poses.

If you don't specify '-ip' argument, I guess it will use FP16. In that case, @gespona should probably use cam.setFp16(True) when using the internal camera on depthai. I will check later on.