OpenVisualCloud / Smart-City-Sample

The smart city reference pipeline shows how to integrate various media building blocks, with analytics powered by the OpenVINO™ Toolkit, for traffic or stadium sensing, analytics and management tasks.
BSD 3-Clause "New" or "Revised" License

Accuracy Decreases Tremendously after IR conversion of model #792

Closed divdaisymuffin closed 2 years ago

divdaisymuffin commented 2 years ago

Hi @nnshah1 and @xwu2git,

We need your help. We have trained a head detection model on YOLOv3 and tested it with OpenCV, where we received good accuracy. Video is attached: https://user-images.githubusercontent.com/68895288/136757909-cf913224-6b02-4e0f-adee-b3f95b06c33a.mp4

Later we converted the model to IR using the provided documents via Intel conversion. When we tested on the same video, accuracy decreased drastically. Please see the video and model proc. Video: https://user-images.githubusercontent.com/68895288/136762446-27935935-1a31-4d78-9be0-6f45451dba97.mp4

modelproc:

  {
    "json_schema_version": "2.0.0",
    "input_preproc": [],
    "output_postproc": [
      {
        "converter": "tensor_to_bbox_yolo_v3",
        "iou_threshold": 0.5,
        "classes": 1,
        "anchors": [10.0, 13.0, 16.0, 30.0, 33.0, 23.0, 30.0, 61.0, 62.0, 45.0, 59.0, 119.0, 116.0, 90.0, 156.0, 198.0, 373.0, 326.0],
        "masks": [6, 7, 8, 3, 4, 5, 0, 1, 2],
        "bbox_number_on_cell": 3,
        "cells_number": 13,
        "labels": ["person"]
      }
    ]
  }
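For context, here is a minimal NumPy sketch (not the actual DL Streamer implementation) of how a `tensor_to_bbox_yolo_v3`-style converter uses the anchors, masks, and `cells_number` from a model-proc like the one above. The function name and the single 13x13 output scale are assumptions for illustration:

```python
import numpy as np

# Anchors and masks taken from the model-proc above; mask [6, 7, 8]
# selects the three largest anchor pairs for the 13x13 output scale.
ANCHORS = [10, 13, 16, 30, 33, 23, 30, 61, 62, 45,
           59, 119, 116, 90, 156, 198, 373, 326]
MASK_13 = [6, 7, 8]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_yolo_v3(raw, input_size=416, cells=13, classes=1, conf_thresh=0.5):
    """Decode one raw YOLOv3 output of shape (cells, cells, 3*(5+classes))
    into (cx, cy, w, h, score) boxes in input-image pixels."""
    raw = raw.reshape(cells, cells, 3, 5 + classes)
    stride = input_size / cells
    boxes = []
    for row in range(cells):
        for col in range(cells):
            for b, a in enumerate(MASK_13):
                tx, ty, tw, th, tobj = raw[row, col, b, :5]
                score = sigmoid(tobj) * sigmoid(raw[row, col, b, 5:]).max()
                if score < conf_thresh:
                    continue
                cx = (col + sigmoid(tx)) * stride   # box centre in pixels
                cy = (row + sigmoid(ty)) * stride
                w = ANCHORS[2 * a] * np.exp(tw)     # anchor-scaled width
                h = ANCHORS[2 * a + 1] * np.exp(th)
                boxes.append((cx, cy, w, h, float(score)))
    return boxes
```

If the anchors, masks, or cell count in the model-proc do not match how the model was trained, decoded boxes land in the wrong places, which is one common source of an apparent accuracy drop.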

We tried another approach, where we converted YOLOv3 to TensorFlow separately, then froze the model and converted it to IR format. But surprisingly, we are not able to get any detections. Model proc:

  {
    "json_schema_version": "2.0.0",
    "input_preproc": [],
    "output_postproc": [
      {
        "labels": ["person"],
        "converter": "tensor_to_bbox_yolo_v3"
      }
    ]
  }

So we want to know: why is there such a drastic decrease in accuracy? Did we miss something during conversion? And in the second approach, where we converted YOLO to TensorFlow externally, why do we get no detections? Is the model proc the problem?

nnshah1 commented 2 years ago

@divdaisymuffin Can you share more details on how the model was created? The link points to a conversion step for tiny-yolov3?

Did you follow these instructions on creating the model: https://qiita.com/PINTO/items/7dd7135085a7249bf17a#support-for-local-training-and-openvino-of-one-class-tiny-yolov3-with-a-proprietary-data-set ?

At first blush, the model proc for the two versions should be the same, independent of how the model was converted to IR.

Is it possible to share the final model (it doesn't have to be the exact model, a similar one is fine) so we can look at the model-proc settings in more detail?

Thanks!

divdaisymuffin commented 2 years ago

@nnshah1 Thanks, we will get back to you with all the information.

divdaisymuffin commented 2 years ago

Hi @nnshah1, we have tried your suggested method as well, but accuracy is still degraded. I am providing a link where you can find the IR-converted model and the original model file.

https://drive.google.com/drive/folders/1_nHqMxvlxuFhpJBEma3uKZPBo6imKqXl

vidyasiv commented 2 years ago

Hi @divdaisymuffin, thanks for providing the model information. Could you share the original video used, i.e. without watermarking?

divdaisymuffin commented 2 years ago

@vidyasiv yes sure, you can find the video here: https://drive.google.com/drive/u/0/mobile/folders/1_nHqMxvlxuFhpJBEma3uKZPBo6imKqXl/1X25H8kWup7y6ihX6B1Dqy5ZyUXQ6rhc9?sort=13&direction=a

divdaisymuffin commented 2 years ago

@vidyasiv Can you guide us further?


vidyasiv commented 2 years ago

@divdaisymuffin, there's a mismatch between the color space your model expects for input media (RGB) and the one used by the inference engine (BGR). There are two options to resolve this: Option 1 is to re-convert the model with `--reverse_input_channels` so the IR itself accepts BGR input; Option 2 is to add an `input_preproc` section to the model-proc so frames are converted to RGB before inference:

  {
  "json_schema_version": "2.0.0",
  "input_preproc": [{
    "format": "image",
    "layer_name": "inputs",
    "params": {
      "resize": "no-aspect-ratio",
      "color_space": "RGB"
    }}
  ],
  "output_postproc": [
    {
      "converter": "tensor_to_bbox_yolo_v3",
      "iou_threshold": 0.5,
      "classes": 1,
      "anchors": [
        10.0,
        13.0,
        16.0,
        30.0,
        33.0,
        23.0,
        30.0,
        61.0,
        62.0,
        45.0,
        59.0,
        119.0,
        116.0,
        90.0,
        156.0,
        198.0,
        373.0,
        326.0
      ],
      "masks": [
        6,
        7,
        8,
        3,
        4,
        5,
        0,
        1,
        2
      ],
      "bbox_number_on_cell": 3,
      "cells_number": 13,
      "labels": [
        "person"
      ]
    }
  ]
 }

Option 1 is recommended for better performance, as it avoids a per-frame OpenCV color-space conversion between RGB and BGR.

Below is a video sample using Option 2 which shows accuracy similar to original model: https://user-images.githubusercontent.com/81709031/139144051-8b43ba8b-5685-451f-8219-02af48fb8a6d.mp4
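To illustrate the mismatch described above, here is a small NumPy sketch (illustrative only, not tied to any specific engine API) of what a model trained on RGB sees when frames arrive in OpenCV's default BGR channel order:

```python
import numpy as np

# A tiny "RGB" frame with a strong red channel.
rgb_frame = np.zeros((2, 2, 3), dtype=np.uint8)
rgb_frame[..., 0] = 200  # red lives in channel 0 for RGB

# Without "color_space": "RGB" in the model-proc (or
# --reverse_input_channels at conversion time), the model effectively
# receives the channels reversed:
bgr_frame = rgb_frame[..., ::-1]

print(rgb_frame[0, 0])  # red in channel 0, where the model expects it
print(bgr_frame[0, 0])  # red has moved to channel 2
```

The swapped channels shift every input pixel's meaning, which degrades detection confidence without producing any outright error, matching the silent accuracy drop seen here.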

divdaisymuffin commented 2 years ago

Thank you so much @vidyasiv for putting so much efforts, we will try the same for our conversion. Can you also provide us some guidelines like what things we need to check if somehow we see mismatch accuracy between original and IR converted models, It will be very useful for us in future cases.

divdaisymuffin commented 2 years ago

Hi @vidyasiv, we have tried the options you suggested, but we still can't see an improvement in accuracy. I think we followed a different conversion path than you did. I am sharing the configuration we used and the video output as well:

python3 mo.py --input_model --model_name -s 255 --input_shape [1,416,416,3] --data_type FP32 --output "" --reverse_input_channels --disable_weights_compression
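The command above is missing its `--input_model` and `--model_name` values (they appear to have been elided when pasting). For reference, a filled-in sketch with hypothetical file names (`frozen_yolov3.pb` and `yolo-v3-head` are placeholders, not the actual files from this thread) might look like:

```shell
# Hypothetical paths/names for illustration. Note --reverse_input_channels,
# which bakes the channel swap into the IR so no model-proc change is needed.
python3 mo.py \
  --input_model frozen_yolov3.pb \
  --model_name yolo-v3-head \
  --scale 255 \
  --input_shape [1,416,416,3] \
  --data_type FP32 \
  --reverse_input_channels \
  --disable_weights_compression
```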

[video link](https://drive.google.com/drive/u/2/folders/1IXzW6BXdrG3gE0b-w3eFwob01MxK5ubQ)

nnshah1 commented 2 years ago

@divdaisymuffin Can you confirm that changing the model-proc gets you the expected results?

nnshah1 commented 2 years ago

> Thank you so much @vidyasiv for putting so much efforts, we will try the same for our conversion. Can you also provide us some guidelines like what things we need to check if somehow we see mismatch accuracy between original and IR converted models, It will be very useful for us in future cases.

Agreed - the best references for the conversion itself are the OpenVINO documentation, such as the DL Workbench (https://docs.openvino.ai/latest/workbench_docs_Workbench_DG_Introduction.html).

If the OpenVINO tools report no drop in accuracy but you see a drop when bringing the model into VA Serving / DL Streamer, then most likely the model-proc is mismatched (as in this case), indicating an error in pre-processing. The other thing to check is the output thresholds used in the pipeline.

divdaisymuffin commented 2 years ago

@nnshah1 Yes, after changing the model proc we got better results on VA Serving.

divdaisymuffin commented 2 years ago

@nnshah1 @vidyasiv Thanks a lot for the prompt support.

nnshah1 commented 2 years ago

@divdaisymuffin Thanks for confirming!

In addition, @vidyasiv spent some time following the conversion steps (as recommended, to avoid the change to the model proc), was successful using the following resources and notes, achieved the expected accuracy, and confirmed that `--reverse_input_channels` was set.

Steps outlined here:

https://docs.openvino.ai/latest/openvino_docs_MO_DG_prepare_model_convert_model_tf_specific_Convert_YOLO_From_Tensorflow.html#convert-yolov3-model-to-ir

Notes on steps: