dlstreamer / pipeline-server

Home of Intel(R) Deep Learning Streamer Pipeline Server (formerly Video Analytics Serving)

Yolo-v3-tiny-tf model with INT-8 precision gives bad inferences #116

Open AishaSamaanKhan opened 2 years ago

AishaSamaanKhan commented 2 years ago

Hi, we are working on integrating the yolo-v3-tiny-tf INT8 IR model into the dlstreamer pipeline, following the documentation provided for changing models. We were able to integrate and test the (non-quantized) yolo-v3-tiny-tf IR model, but we failed to get proper inferences with the INT8 version of the same model. The converted INT8 models were validated using the Open Model Zoo object detection sample, where they gave proper inferences.

The steps followed to convert yolo-v3-tiny-tf to an INT8 model are provided below:

This quantization guide is based on the yolo_v3_tiny_tf model.

Requirements

Steps for Quantization

Step 1: Obtain the OMZ model (yolo_v3_tiny_tf)

 omz_downloader --name yolo_v3_tiny_tf
 omz_converter --name yolo_v3_tiny_tf

This step downloads the frozen model and converts it to its IR representation.
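By default, omz_converter writes the IR next to the download under a public/<model>/<precision>/ layout, so a quick check of the converted files (assuming the default output directory) looks like:

    ls public/yolo-v3-tiny-tf/FP32/
    # yolo-v3-tiny-tf.bin  yolo-v3-tiny-tf.mapping  yolo-v3-tiny-tf.xml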

Step 2: Obtain the Dataset for Optimization

For this model, the COCO 2017 validation dataset was selected.

wget http://images.cocodataset.org/zips/val2017.zip

unzip val2017.zip

Step 3: Create a JSON config (optional, instead of pot command-line arguments)

Note: point the "model" and "weights" paths at the FP16 IR to produce an FP16-INT8 model.

{
    "model": {
        "model_name": "yolo-v3-tiny-tf",
        "model": "<path to the model>/yolo-v3-tiny-tf/FP32/yolo-v3-tiny-tf.xml",
        "weights": "<path to the model>/yolo-v3-tiny-tf/FP32/yolo-v3-tiny-tf.bin"
    },
    "engine": {
        "type": "simplified",
        "data_source": "<path to the dataset where the images are stored>/val2017"
    },
    "compression": {
        "target_device": "CPU",
        "algorithms": [
            {
                "name": "DefaultQuantization",
                "params": {
                    "preset": "performance",
                    "stat_subset_size": 300,
                    "shuffle_data": false
                }
            }
        ]
    }
}
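As the heading notes, the JSON is optional; in Simplified mode the same quantization can be driven entirely from pot command-line arguments, roughly like this (paths are placeholders):

    pot -q default -m <path to the model>/yolo-v3-tiny-tf.xml \
        -w <path to the model>/yolo-v3-tiny-tf.bin \
        --engine simplified --data-source <path to the dataset>/val2017 \
        --output-dir yolov3_int8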

Step 4: Use the OpenVINO Post-training Optimization Tool (POT) to finish the process

This step converts the FP32/FP16 model to its FP32-INT8/FP16-INT8 counterpart.

The INT8 models will be available in the "yolov3_int8" directory.

pot -c quantization_spec.json --output-dir yolov3_int8 -d

Step 5: Validation

Test the converted model with the Open Model Zoo object detection demo.

python3 object_detection_demo.py -d CPU -i <path to the input video> -m <path to INT8 model xml> -at yolo --labels <OMZ_DIR>/data/dataset_classes/coco_80cl.txt
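As an additional sanity check, independent of the demo's pre/post-processing, a minimal OpenVINO Runtime snippet can confirm that the quantized IR loads and produces finite outputs (paths are placeholders; input shape and layout taken from the demo log later in this thread):

    # Minimal load-and-infer sanity check for the quantized IR.
    import numpy as np
    from openvino.runtime import Core

    core = Core()
    model = core.read_model("<path to INT8 model xml>")
    compiled = core.compile_model(model, "CPU")
    request = compiled.create_infer_request()

    # yolo-v3-tiny-tf expects an NHWC input of shape [1, 416, 416, 3].
    dummy = np.random.rand(1, 416, 416, 3).astype(np.float32)
    request.infer({0: dummy})

    for i, out in enumerate(compiled.outputs):
        data = request.get_output_tensor(i).data
        print(out.any_name, data.shape, "all finite:", np.isfinite(data).all())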

To integrate with the pipeline server, the steps followed are as per the document.

Copy the downloaded and converted model under

<pipeline-server>/models/object_detection/yolo-v3-tiny-tf

The directory structure under yolo-v3-tiny-tf looks something like this:

coco-80cl.txt  FP16  FP32  FP32-INT8  yolo-v3-tiny-tf  yolo-v3-tiny-tf.json

Created a new pipeline:

cp -r pipelines/gstreamer/object_detection/person_vehicle_bike pipelines/gstreamer/object_detection/yolo-v3-tiny-tf

Edited the pipeline.json template:

sed -i -e s/\\[person_vehicle_bike\\]/\\[yolo-v3-tiny-tf\\]/g pipelines/gstreamer/object_detection/yolo-v3-tiny-tf/pipeline.json

Ran the pipeline server:

./docker/run.sh -v /tmp:/tmp --models models --pipelines pipelines/gstreamer
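Once the server is up, a pipeline run would then be started with a REST request along these lines (a sketch assuming the default port 8080 and the copied pipeline name; the destination fields mirror the gvametapublish settings visible in the log below):

    curl localhost:8080/pipelines/object_detection/yolo-v3-tiny-tf -X POST \
        -H 'Content-Type: application/json' \
        -d '{
              "source": {
                "uri": "https://github.com/intel-iot-devkit/sample-videos/raw/master/bottle-detection.mp4?raw=true",
                "type": "uri"
              },
              "destination": {
                "metadata": { "type": "file", "path": "/tmp/results.jsonl", "format": "json-lines" }
              }
            }'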

With this we were able to run inference with the FP16 and FP32 IR models, but we were not able to run inference with the FP32-INT8 IR model.

Could you please let us know what steps we are missing to integrate the quantized model?

Thanks

whbruce commented 2 years ago

If you change the precision, you need to specify it in the pipeline, as per Referencing Models in Pipeline Definitions. See the updated pipeline definition entry for gvadetect:

gvadetect model={models[object_detection][yolo-v2-tiny-tf][FP32-INT8][network]} name=detection
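
Applied to this issue's model, the gvadetect line inside the copied pipeline.json template would look roughly like the sketch below (based on the person_vehicle_bike template the pipeline was copied from; surrounding elements abbreviated, so treat this as illustrative rather than exact):

    "template": [
        "uridecodebin name=source",
        " ! gvadetect model={models[object_detection][yolo-v3-tiny-tf][FP32-INT8][network]} name=detection",
        " ! gvametaconvert name=metaconvert",
        " ! gvametapublish name=destination",
        " ! appsink name=appsink"
    ]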
AishaSamaanKhan commented 2 years ago

@whbruce Thank you for the quick response. I tried updating the pipeline definition as suggested, but the issue still persists.

These are the logs from the pipeline server; I can see that it is using the INT8 model:

{"levelname": "DEBUG", "asctime": "2022-07-22 01:23:06,054", "message": "gst-launch-1.0 urisourcebin  uri=https://github.com/intel-iot-devkit/sample-videos/raw/master/bottle-detection.mp4?raw=true ! decodebin ! gvadetect  model=/home/pipeline-server/models/object_detection/yolo-v3-tiny-tf/FP32-INT8/yolo-v3-tiny-tf.xml model-instance-id=detection_d69badd6095c11ed91000242ac110002 model-proc=/home/pipeline-server/models/object_detection/yolo-v3-tiny-tf/yolo-v3-tiny-tf.json labels=/home/pipeline-server/models/object_detection/yolo-v3-tiny-tf/coco-80cl.txt ! gvametaconvert  source=https://github.com/intel-iot-devkit/sample-videos/raw/master/bottle-detection.mp4?raw=true ! gvametapublish  file-path=/tmp/results.jsonl file-format=json-lines ! appsink  sync=False emit-signals=True", "module": "gstreamer_pipeline"}

On the inference side it keeps incorrectly predicting "bench" continuously for uri=https://github.com/intel-iot-devkit/sample-videos/raw/master/bottle-detection.mp4?raw=true

Timestamp 5832402234
- bench (0.55) [0.00, 0.00, 0.98, 1.00]
whbruce commented 2 years ago

Please run OMZ object detection demo to get a baseline for model accuracy.

AishaSamaanKhan commented 2 years ago

@whbruce These are the same models that I used with the pipeline server. I have tried them with the Open Model Zoo sample on OpenVINO 2022.1.0. Below are the logs and an attachment for the same.

(setup) intel@intel-WL10:~/workspace/open_model_zoo/demos/object_detection_demo/python$ python3 object_detection_demo.py -d CPU -i bottle.mp4 -m /home/intel/workspace/pipeline-server/models/object_detection/yolo-v3-tiny-tf/FP32-INT8/yolo-v3-tiny-tf.xml -at yolo --labels /home/intel/workspace/pipeline-server/models/object_detection/yolo-v3-tiny-tf/coco-80cl.txt
[ INFO ] OpenVINO Runtime
[ INFO ] build: 2022.1.0-7019-cdb9bec7210-releases/2022/1
[ INFO ] Reading model /home/intel/workspace/pipeline-server/models/object_detection/yolo-v3-tiny-tf/FP32-INT8/yolo-v3-tiny-tf.xml
[ WARNING ] The parameter "input_size" not found in YOLO wrapper, will be omitted
[ WARNING ] The parameter "num_classes" not found in YOLO wrapper, will be omitted
[ INFO ] Input layer: image_input, shape: [1, 416, 416, 3], precision: f32, layout: NHWC
[ INFO ] Output layer: conv2d_12/Conv2D/YoloRegion, shape: [1, 255, 26, 26], precision: f32, layout:
[ INFO ] Output layer: conv2d_9/Conv2D/YoloRegion, shape: [1, 255, 13, 13], precision: f32, layout:
[ INFO ] The model /home/intel/workspace/pipeline-server/models/object_detection/yolo-v3-tiny-tf/FP32-INT8/yolo-v3-tiny-tf.xml is loaded to CPU
[ INFO ] Device: CPU
[ INFO ] Number of streams: 4
[ INFO ] Number of threads: AUTO
[ INFO ] Number of model infer requests: 5
[ INFO ] Metrics report:
[ INFO ] Latency: 84.1 ms
[ INFO ] FPS: 30.5
[ INFO ] Decoding: 0.4 ms
[ INFO ] Preprocessing: 0.6 ms
[ INFO ] Inference: 81.0 ms
[ INFO ] Postprocessing: 1.9 ms
[ INFO ] Rendering: 0.2 ms

(Screenshot attachment: Capture)
brmarkus commented 2 years ago

Using the default algorithm for INT8 quantization (instead of the "accuracy-aware" algorithm) can come with slightly reduced accuracy...

The sample "object_detection_demo.py" has a default value '-t', '--prob_threshold', default=0.5, which you might need to increase to filter out detections with a lower confidence level.
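For example, re-running the demo with a higher threshold (0.7 here is an arbitrary illustrative value):

    python3 object_detection_demo.py -d CPU -i <path to the input video> -m <path to INT8 model xml> -at yolo --labels <OMZ_DIR>/data/dataset_classes/coco_80cl.txt -t 0.7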

gvadetect has the same default value:

  threshold           : Threshold for detection results. Only regions of interest with confidence values above the threshold will be added to the frame
                        flags: readable, writable
                        Float. Range: 0 - 1 Default: 0.5
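
In the pipeline template this corresponds to setting the property on the detection element, e.g. (again with an illustrative 0.7):

    gvadetect model={models[object_detection][yolo-v3-tiny-tf][FP32-INT8][network]} threshold=0.7 name=detection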
AishaSamaanKhan commented 2 years ago

Hi, I tried using Accuracy Aware Quantization to create the quantized model and followed the steps to integrate this model into the pipeline server, but I still see the same inference behavior. Below is the config for POT quantization:

{
    "model": {
        "model_name": "yolo-v3-tiny-tf",
        "model": "yolo_v3_tiny/FP32/yolo-v3-tiny-tf.xml",
        "weights": "yolo_v3_tiny/FP32/yolo-v3-tiny-tf.bin"
    },
    "compression": {
        "dump_intermediate_model": true,
        "target_device": "ANY",
        "algorithms": [
            {
                "name": "AccuracyAwareQuantization",
                "params": {
                    "num_samples_for_tuning": 2000,
                    "preset": "performance",
                    "stat_subset_size": 300,
                    "use_layerwise_tuning": false
                }
            }
        ]
    },
    "engine": {
        "launchers": [
            {
                "framework": "openvino",
                "adapter": {
                    "type": "yolo_v3",
                    "outputs": [
                        "conv2d_9/Conv2D/YoloRegion",
                        "conv2d_12/Conv2D/YoloRegion"
                    ]
                },
                "device": "CPU"
            }
        ],
        "datasets": [
            {
                "name": "ms_coco_detection_80_class_without_background",
                "data_source": "val2017/val2017/",
                "annotation_conversion": {
                    "converter": "mscoco_detection",
                    "annotation_file": "./annotations/instances_val2017.json",
                    "has_background": false,
                    "sort_annotations": true,
                    "use_full_label_map": false
                },
                "annotation": "./dir_conv/mscoco_det_80.pickle",
                "dataset_meta": "./dir_conv/mscoco_det_80.json",
                "preprocessing": [
                    {
                        "type": "resize",
                        "size": 416

                    }
                ],
                "postprocessing": [
                    {
                        "type": "resize_prediction_boxes"
                    },
                    {
                        "type": "clip_boxes",
                        "apply_to": "prediction"
                    }
                ],
                "metrics": [
                    {
                        "type": "coco_precision",
                        "reference": 0.244
                    }
                ]
            }
        ]
    }
}
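
For completeness, POT would then be invoked with this config along the lines of (the config filename and output directory here are assumed):

    pot -c accuracy_aware_spec.json --output-dir yolov3_int8_aa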

I am able to use it with the OMZ object detection sample; however, it seems to fail on the pipeline server. Could you please suggest a solution for this issue?

brmarkus commented 2 years ago

Can you provide more details about the problem you are seeing, please? After INT8 quantization, were no objects detected at all anymore, compared to the "original models" (FP16, FP32)?

AishaSamaanKhan commented 2 years ago

Yes, @brmarkus.
With the models downloaded from the Open Model Zoo (i.e., the original models), the pipeline server is able to produce inferences, but with the INT8 model quantized using the accuracy-aware algorithm, inference fails on the pipeline server. These same quantized models were validated with the object detection sample from the OMZ repo.