Xilinx / Vitis-AI

Vitis AI is Xilinx’s development stack for AI inference on Xilinx hardware platforms, including both edge devices and Alveo cards.
https://www.xilinx.com/ai
Apache License 2.0

Quantized with the Yolov5 model, the mAP@0.5 is high (around 0.47), but the detection results are outrageous and unexpected #1330

Open sleipnier opened 1 year ago

sleipnier commented 1 year ago

Quantized with the Yolov5 model, the mAP@0.5 is high (around 0.47), but the detection results are outrageous and unexpected

Recently I have been quantizing yolov5_nano with the COCO and COCO128 datasets. The mAP@0.5 and mAP@0.5:0.95 reported after both calib and test are normal and satisfying. However, the prediction images saved by these two runs are not. A more detailed description of each step, with commands and output, follows.

calib

I ran the command below to calibrate the yolov5_nano model on the COCO dataset:

python val.py --data ./data/coco.yaml --weights ./float/yolov5n_float.pt --batch-size 16 --imgsz 640 --conf-thres 0.5 --iou-thres 0.65 --quant_mode calib --nndct_quant

The result printed in the terminal is:

No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'

[VAIQ_NOTE]: Loading NNDCT kernels...
val: data=./data/coco.yaml, weights=['./float/yolov5n_float.pt'], batch_size=16, imgsz=640, conf_thres=0.5, iou_thres=0.65, task=val, device=, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=True, project=runs/val, name=exp, exist_ok=False, half=False, quant_mode=calib, nndct_quant=True, dump_xmodel=False, nndct_stat=0
YOLOv5 🚀 v3.5-13-gf74ddc6ed torch 1.12.1 CPU

                 from  n    params  module                                  arguments                     
  0                -1  1      1760  models.common.Conv                      [3, 16, 6, 2, 2]              
  1                -1  1      4672  models.common.Conv                      [16, 32, 3, 2]                
  2                -1  1      4800  models.common.C3                        [32, 32, 1]                   
  3                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]                
  4                -1  2     29184  models.common.C3                        [64, 64, 2]                   
  5                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
  6                -1  3    156928  models.common.C3                        [128, 128, 3]                 
  7                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
  8                -1  1    296448  models.common.C3                        [256, 256, 1]                 
  9                -1  1    164608  models.common.SPPF                      [256, 256, 5]                 
 10                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]              
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 12           [-1, 6]  1         0  models.common.Concat                    [1]                           
 13                -1  1     90880  models.common.C3                        [256, 128, 1, False]          
 14                -1  1      8320  models.common.Conv                      [128, 64, 1, 1]               
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 16           [-1, 4]  1         0  models.common.Concat                    [1]                           
 17                -1  1     22912  models.common.C3                        [128, 64, 1, False]           
 18                -1  1     36992  models.common.Conv                      [64, 64, 3, 2]                
 19          [-1, 14]  1         0  models.common.Concat                    [1]                           
 20                -1  1     74496  models.common.C3                        [128, 128, 1, False]          
 21                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]              
 22          [-1, 10]  1         0  models.common.Concat                    [1]                           
 23                -1  1    296448  models.common.C3                        [256, 256, 1, False]          
 24      [17, 20, 23]  1    115005  models.yolo.Detect                      [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [64, 128, 256]]
/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  /opt/conda/conda-bld/pytorch_1659484747659/work/aten/src/ATen/native/TensorShape.cpp:2894.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Model Summary: 298 layers, 1872157 parameters, 1872157 gradients, 4.6 GFLOPs

val: Scanning '../datasets/coco/val2017.cache' images and labels... 4952 found, 48 missing, 0 empty, 0 corrupted: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 5000/5000 [00:00<?, ?it/s]
NNDCT quant dir: float/nndct_quant

[VAIQ_WARN][QUANTIZER_TORCH_CUDA_UNAVAILABLE]: CUDA (HIP) is not available, change device to CPU

[VAIQ_NOTE]: OS and CPU information:
               system --- Linux
                 node --- ubuntu
              release --- 5.15.0-82-generic
              version --- #91~20.04.1-Ubuntu SMP Fri Aug 18 16:24:39 UTC 2023
              machine --- x86_64
            processor --- x86_64

[VAIQ_NOTE]: Tools version information:
                  GCC --- GCC 9.4.0
               python --- 3.7.12
              pytorch --- 1.12.1
        vai_q_pytorch --- 3.0.0+a44284e+torch1.12.1

[VAIQ_WARN][QUANTIZER_TORCH_CUDA_UNAVAILABLE]: CUDA (HIP) is not available, change device to CPU.

[VAIQ_NOTE]: Quant config file is empty, use default quant configuration

[VAIQ_NOTE]: Quantization calibration process start up...

[VAIQ_NOTE]: =>Quant Module is in 'cpu'.

[VAIQ_NOTE]: =>Parsing Model...

[VAIQ_NOTE]: Start to trace and freeze model...

[VAIQ_NOTE]: The input model Model is torch.nn.Module.

[VAIQ_NOTE]: Finish tracing.

[VAIQ_NOTE]: Processing ops...
β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 205/205 [00:00<00:00, 1314.63it/s, OpInfo: name = return_0, type = Return]                                      

[VAIQ_NOTE]: =>Doing weights equalization...

[VAIQ_NOTE]: =>Quantizable module is generated.(float/nndct_quant/Model.py)

[VAIQ_NOTE]: =>Get module with quantization.
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 313/313 [3:07:21<00:00, 35.92s/it]

[VAIQ_NOTE]: =>Exporting quant config.(float/nndct_quant/quant_info.json)
                 all       5000      36335      0.815      0.156      0.486      0.333
Speed: 2.6ms pre-process, 2241.2ms inference, 1.4ms NMS per image at shape (16, 3, 640, 640)

Evaluating pycocotools mAP... saving runs/val/exp62/yolov5n_float_predictions.json...
/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/pkg_resources/__init__.py:119: PkgResourcesDeprecationWarning: 3.0.0-a44284e-torch1.12.1 is an invalid version and will not be supported in a future release
  PkgResourcesDeprecationWarning,
requirements: pycocotools not found and is required by YOLOv5, attempting auto-update...
requirements: 'pip install pycocotools' skipped (offline)
pycocotools unable to run: No module named 'pycocotools'
Results saved to runs/val/exp62

Judging by the numbers printed in the terminal, the calibration result looks reasonably normal.
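
For reference, this calib run is driving the standard vai_q_pytorch calibration flow. A minimal sketch of that flow (model, calib_loader and the paths are placeholders, not the exact code from val.py):

import torch
from pytorch_nndct.apis import torch_quantizer

# model: the float YOLOv5 module in eval() mode; dummy_input fixes the input shape
dummy_input = torch.randn(1, 3, 640, 640)
quantizer = torch_quantizer(quant_mode='calib',
                            module=model,
                            input_args=dummy_input,
                            output_dir='float/nndct_quant')
quant_model = quantizer.quant_model

# forward calibration batches through the quantized wrapper to collect ranges
for img, _, _, _ in calib_loader:
    quant_model(img.float() / 255.0)

# write quant_info.json with the calibrated quantization parameters
quantizer.export_quant_config()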

test

I also ran the command below to test the quantized yolov5_nano model on the COCO dataset:

python val.py --data ./data/coco.yaml --weights ./float/yolov5n_float.pt --batch-size 1 --imgsz 640 --conf-thres 0.5 --iou-thres 0.65 --quant_mode test --nndct_quant

The result printed in the terminal is:

No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'

[VAIQ_NOTE]: Loading NNDCT kernels...
val: data=./data/coco.yaml, weights=['./float/yolov5n_float.pt'], batch_size=1, imgsz=640, conf_thres=0.5, iou_thres=0.65, task=val, device=, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=True, project=runs/val, name=exp, exist_ok=False, half=False, quant_mode=test, nndct_quant=True, dump_xmodel=False, nndct_stat=0
YOLOv5 🚀 v3.5-13-gf74ddc6ed torch 1.12.1 CPU

                 from  n    params  module                                  arguments                     
  0                -1  1      1760  models.common.Conv                      [3, 16, 6, 2, 2]              
  1                -1  1      4672  models.common.Conv                      [16, 32, 3, 2]                
  2                -1  1      4800  models.common.C3                        [32, 32, 1]                   
  3                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]                
  4                -1  2     29184  models.common.C3                        [64, 64, 2]                   
  5                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
  6                -1  3    156928  models.common.C3                        [128, 128, 3]                 
  7                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
  8                -1  1    296448  models.common.C3                        [256, 256, 1]                 
  9                -1  1    164608  models.common.SPPF                      [256, 256, 5]                 
 10                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]              
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 12           [-1, 6]  1         0  models.common.Concat                    [1]                           
 13                -1  1     90880  models.common.C3                        [256, 128, 1, False]          
 14                -1  1      8320  models.common.Conv                      [128, 64, 1, 1]               
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 16           [-1, 4]  1         0  models.common.Concat                    [1]                           
 17                -1  1     22912  models.common.C3                        [128, 64, 1, False]           
 18                -1  1     36992  models.common.Conv                      [64, 64, 3, 2]                
 19          [-1, 14]  1         0  models.common.Concat                    [1]                           
 20                -1  1     74496  models.common.C3                        [128, 128, 1, False]          
 21                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]              
 22          [-1, 10]  1         0  models.common.Concat                    [1]                           
 23                -1  1    296448  models.common.C3                        [256, 256, 1, False]          
 24      [17, 20, 23]  1    115005  models.yolo.Detect                      [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [64, 128, 256]]
/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  /opt/conda/conda-bld/pytorch_1659484747659/work/aten/src/ATen/native/TensorShape.cpp:2894.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Model Summary: 298 layers, 1872157 parameters, 1872157 gradients, 4.6 GFLOPs

val: Scanning '../datasets/coco/val2017.cache' images and labels... 4952 found, 48 missing, 0 empty, 0 corrupted: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 5000/5000 [00:00<?, ?it/s]
NNDCT quant dir: float/nndct_quant

[VAIQ_WARN][QUANTIZER_TORCH_CUDA_UNAVAILABLE]: CUDA (HIP) is not available, change device to CPU

[VAIQ_NOTE]: OS and CPU information:
               system --- Linux
                 node --- ubuntu
              release --- 5.15.0-82-generic
              version --- #91~20.04.1-Ubuntu SMP Fri Aug 18 16:24:39 UTC 2023
              machine --- x86_64
            processor --- x86_64

[VAIQ_NOTE]: Tools version information:
                  GCC --- GCC 9.4.0
               python --- 3.7.12
              pytorch --- 1.12.1
        vai_q_pytorch --- 3.0.0+a44284e+torch1.12.1

[VAIQ_WARN][QUANTIZER_TORCH_CUDA_UNAVAILABLE]: CUDA (HIP) is not available, change device to CPU.

[VAIQ_NOTE]: Quant config file is empty, use default quant configuration

[VAIQ_NOTE]: Quantization test process start up...

[VAIQ_NOTE]: =>Quant Module is in 'cpu'.

[VAIQ_NOTE]: =>Parsing Model...

[VAIQ_NOTE]: Start to trace and freeze model...

[VAIQ_NOTE]: The input model Model is torch.nn.Module.

[VAIQ_NOTE]: Finish tracing.

[VAIQ_NOTE]: Processing ops...
β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 205/205 [00:00<00:00, 1068.12it/s, OpInfo: name = return_0, type = Return]                                      

[VAIQ_NOTE]: =>Doing weights equalization...

[VAIQ_NOTE]: =>Quantizable module is generated.(float/nndct_quant/Model.py)

[VAIQ_NOTE]: =>Get module with quantization.
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 5000/5000 [45:36<00:00,  1.83it/s]
                 all       5000      36335      0.794      0.157      0.476      0.319
Speed: 2.2ms pre-process, 522.4ms inference, 1.2ms NMS per image at shape (1, 3, 640, 640)

Evaluating pycocotools mAP... saving runs/val/exp65/yolov5n_float_predictions.json...

The numbers are still normal.
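
For reference, the test run reuses the quant_info.json exported by calib, and with --dump_xmodel it also writes the deployable xmodel. A minimal sketch (model and the paths are placeholders, not the exact code from val.py):

import torch
from pytorch_nndct.apis import torch_quantizer

# quant_info.json from the calib run must already exist in output_dir
dummy_input = torch.randn(1, 3, 640, 640)
quantizer = torch_quantizer(quant_mode='test',
                            module=model,
                            input_args=dummy_input,
                            output_dir='float/nndct_quant')
quant_model = quantizer.quant_model

# ... evaluate quant_model exactly like the float model to get the quantized mAP ...

# with --dump_xmodel (batch size must be 1) the deployable graph is then exported
quantizer.export_xmodel(output_dir='float/nndct_quant', deploy_check=False)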

unexpected results

However, when I opened the folder containing the prediction images saved by the test and calib runs, I was shocked. For example, even a vase, which should be very easy to detect in Figure 1 below, is not detected in the test results after quantization. Even the classic bear pictures from the COCO dataset are not detected, which makes me doubt the correctness of the quantization results.

val_batch2_labels val_batch2_pred val_batch1_labels val_batch1_pred

So I am wondering why the mAP@0.5 is high while the actual detection results are not reasonable or usable. Is there any way to solve this problem?
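
One way to narrow this down is to compare the raw float and quantized outputs on a single batch before NMS, to see whether the gap comes from the quantized graph itself or from the post-processing and plotting path. A sketch, assuming the same val.py setup (model, quant_model and img are the objects from that script):

import torch

# img: one preprocessed batch (N, 3, 640, 640), already scaled to 0-1
with torch.no_grad():
    float_out = model(img)[0]        # inference output of the float model
    quant_out = quant_model(img)[0]  # inference output of the quantized wrapper

# a large relative error points at the quantized graph itself,
# a small one points at post-processing, thresholds or plotting
err = ((float_out - quant_out).abs().mean() / float_out.abs().mean()).item()
print(f'mean relative error: {err:.4f}')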

farah-rostom98 commented 12 months ago

Can you provide the val.py script and the Vitis AI version of the docker image?

sleipnier commented 12 months ago

Can you provide the val.py script and the Vitis AI version of the docker image?

The val.py script is given below:

python val.py --data ./data/coco.yaml --weights ./float/yolov5n_float.pt --batch-size 16 --imgsz 640 --conf-thres 0.5 --iou-thres 0.65 --quant_mode calib --nndct_quant 
python val.py --data ./data/coco.yaml --weights ./float/yolov5n_float.pt --batch-size 1 --imgsz 640 --conf-thres 0.5 --iou-thres 0.65 --quant_mode test --nndct_quant
python val.py --data ./data/coco.yaml --weights ./float/yolov5n_float.pt --batch-size 1 --imgsz 640 --conf-thres 0.5 --iou-thres 0.65 --quant_mode test --nndct_quant --dump_xmodel

And the Vitis AI docker image version is: xilinx/vitis-ai-pytorch-cpu:ubuntu2004-3.0.0.106

It would be very kind of you to help me solve this problem. If you have any questions, do not hesitate to contact me.

farah-rostom98 commented 12 months ago

I meant val.py as in the script, the Python file, not the CLI command.

sleipnier commented 12 months ago

I meant val.py as in the script, the Python file, not the CLI command.

Oh, sorry, I misunderstood your question. The val.py file is given below:

# Copyright 2019 Xilinx Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# YOLOv5 🚀 by Ultralytics, GPL-3.0 license
"""
Validate a trained YOLOv5 model accuracy on a custom dataset

Usage:
    $ python path/to/val.py --data coco128.yaml --weights yolov5s.pt --img 640
"""

import argparse
import json
import os
import sys
from pathlib import Path
from threading import Thread
from functools import partial

import numpy as np
import torch
from tqdm import tqdm

FILE = Path(__file__).resolve()
ROOT = FILE.parents[0]  # YOLOv5 root directory
if str(ROOT) not in sys.path:
    sys.path.append(str(ROOT))  # add ROOT to PATH
ROOT = Path(os.path.relpath(ROOT, Path.cwd()))  # relative

from models.experimental import attempt_load
from utils.datasets import create_dataloader
from utils.general import coco80_to_coco91_class, check_dataset, check_img_size, check_requirements, \
    check_suffix, check_yaml, box_iou, non_max_suppression, scale_coords, xyxy2xywh, xywh2xyxy, set_logging, \
    increment_path, colorstr, print_args
from utils.metrics import ap_per_class, ConfusionMatrix
from utils.plots import output_to_target, plot_images, plot_val_study
from utils.torch_utils import select_device, time_sync
from utils.callbacks import Callbacks

def save_one_txt(predn, save_conf, shape, file):
    # Save one txt result
    gn = torch.tensor(shape)[[1, 0, 1, 0]]  # normalization gain whwh
    for *xyxy, conf, cls in predn.tolist():
        xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist()  # normalized xywh
        line = (cls, *xywh, conf) if save_conf else (cls, *xywh)  # label format
        with open(file, 'a') as f:
            f.write(('%g ' * len(line)).rstrip() % line + '\n')

def save_one_json(predn, jdict, path, class_map):
    # Save one JSON result {"image_id": 42, "category_id": 18, "bbox": [258.15, 41.29, 348.26, 243.78], "score": 0.236}
    image_id = int(path.stem) if path.stem.isnumeric() else path.stem
    box = xyxy2xywh(predn[:, :4])  # xywh
    box[:, :2] -= box[:, 2:] / 2  # xy center to top-left corner
    for p, b in zip(predn.tolist(), box.tolist()):
        jdict.append({'image_id': image_id,
                      'category_id': class_map[int(p[5])],
                      'bbox': [round(x, 3) for x in b],
                      'score': round(p[4], 5)})

def process_batch(detections, labels, iouv):
    """
    Return correct predictions matrix. Both sets of boxes are in (x1, y1, x2, y2) format.
    Arguments:
        detections (Array[N, 6]), x1, y1, x2, y2, conf, class
        labels (Array[M, 5]), class, x1, y1, x2, y2
    Returns:
        correct (Array[N, 10]), for 10 IoU levels
    """
    correct = torch.zeros(detections.shape[0], iouv.shape[0], dtype=torch.bool, device=iouv.device)
    iou = box_iou(labels[:, 1:], detections[:, :4])
    x = torch.where((iou >= iouv[0]) & (labels[:, 0:1] == detections[:, 5]))  # IoU above threshold and classes match
    if x[0].shape[0]:
        matches = torch.cat((torch.stack(x, 1), iou[x[0], x[1]][:, None]), 1).cpu().numpy()  # [label, detection, iou]
        if x[0].shape[0] > 1:
            matches = matches[matches[:, 2].argsort()[::-1]]
            matches = matches[np.unique(matches[:, 1], return_index=True)[1]]
            # matches = matches[matches[:, 2].argsort()[::-1]]
            matches = matches[np.unique(matches[:, 0], return_index=True)[1]]
        matches = torch.Tensor(matches).to(iouv.device)
        correct[matches[:, 1].long()] = matches[:, 2:3] >= iouv
    return correct

def run(data,
        weights=None,  # model.pt path(s)
        batch_size=32,  # batch size
        imgsz=640,  # inference size (pixels)
        conf_thres=0.001,  # confidence threshold
        iou_thres=0.6,  # NMS IoU threshold
        task='val',  # train, val, test, speed or study
        device='',  # cuda device, i.e. 0 or 0,1,2,3 or cpu
        single_cls=False,  # treat as single-class dataset
        augment=False,  # augmented inference
        verbose=False,  # verbose output
        save_txt=False,  # save results to *.txt
        save_hybrid=False,  # save label+prediction hybrid results to *.txt
        save_conf=False,  # save confidences in --save-txt labels
        save_json=False,  # save a COCO-JSON results file
        project=ROOT / 'runs/val',  # save to project/name
        name='exp',  # save to project/name
        exist_ok=False,  # existing project/name ok, do not increment
        half=True,  # use FP16 half-precision inference
        nndct_quant=False,
        nndct_bitwidth=8,
        model=None,
        dataloader=None,
        save_dir=Path(''),
        plots=True,
        callbacks=Callbacks(),
        compute_loss=None,
        quant_mode='calib',
        dump_xmodel=False,
        nndct_stat=0,
        ):
    # Initialize/load model and set device
    training = model is not None
    if nndct_quant:
        os.environ["W_QUANT"] = "1"
        assert half is False and augment is False, "Invalid settings for nndct quant"
        if dump_xmodel:
            assert quant_mode == 'test', "Quant model should be 'test' for dumping xmodel"
            assert batch_size == 1, "Dump xmodel only support batch size 1"
    if training and not nndct_quant:  # called by train.py
        device = next(model.parameters()).device  # get model device

    else:  # called directly
        device = select_device(device, batch_size=batch_size)

        # Directories
        save_dir = increment_path(Path(project) / name, exist_ok=exist_ok)  # increment run
        (save_dir / 'labels' if save_txt else save_dir).mkdir(parents=True, exist_ok=True)  # make dir

        # Load model
        if training:
            device = next(model.parameters()).device  # get model device
        else:
            check_suffix(weights, '.pt')
            model = attempt_load(weights, map_location=device, fuse=not nndct_quant, force_reexport_deployable_model=not training, imgsz=imgsz)  # load FP32 model
            # Data
            data = check_dataset(data)  # check
        gs = max(int(model.stride.max()), 32)  # grid size (max stride)
        imgsz = check_img_size(imgsz, s=gs)  # check image size

        # Multi-GPU disabled, incompatible with .half() https://github.com/ultralytics/yolov5/issues/99
        # if device.type != 'cpu' and torch.cuda.device_count() > 1:
        #     model = nn.DataParallel(model)

    # Half
    half &= device.type != 'cpu'  # half precision only supported on CUDA
    model.half() if half else model.float()

    # Configure
    model.eval()
    is_coco = isinstance(data.get('val'), str) and data['val'].endswith('coco/val2017.txt')  # COCO dataset
    nc = 1 if single_cls else int(data['nc'])  # number of classes
    iouv = torch.linspace(0.5, 0.95, 10).to(device)  # iou vector for mAP@0.5:0.95
    niou = iouv.numel()

    # Dataloader
    if not training:
        if device.type != 'cpu':
            model(torch.zeros(1, 3, imgsz, imgsz).to(device).type_as(next(model.parameters())))  # run once
        pad = 0.0 if task == 'speed' else 0.5
        task = task if task in ('train', 'val', 'test') else 'val'  # path to train/val/test images
        dataloader = create_dataloader(data[task], imgsz, batch_size, gs, single_cls, pad=pad, rect=not nndct_quant,
                                       prefix=colorstr(f'{task}: '), workers=8)[0]

    seen = 0
    confusion_matrix = ConfusionMatrix(nc=nc)
    names = {k: v for k, v in enumerate(model.names if hasattr(model, 'names') else model.module.names)}
    class_map = coco80_to_coco91_class() if is_coco else list(range(1000))
    s = ('%20s' + '%11s' * 6) % ('Class', 'Images', 'Labels', 'P', 'R', 'mAP@.5', 'mAP@.5:.95')
    dt, p, r, f1, mp, mr, map50, map = [0.0, 0.0, 0.0], 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0
    loss = torch.zeros(3, device=device)
    jdict, stats, ap, ap_class = [], [], [], []
    if nndct_quant:
        from pytorch_nndct.apis import torch_quantizer
        import pytorch_nndct as py_nndct
        from nndct_shared.utils import NndctOption
        from nndct_shared.base import key_names, NNDCT_KEYS, NNDCT_DEBUG_LVL, GLOBAL_MAP, NNDCT_OP
        import nndct_shared.quantization as nndct_quant
        from pytorch_nndct.quantization import torchquantizer
        input_tensor = (torch.randn(1, 3, imgsz, imgsz).to(device).type_as(next(model.parameters())))
        model.forward = partial(model.forward, quant=True)
        if training:
            assert quant_mode == 'test'
            output_dir = weights
        else:
            w = Path(weights[0] if isinstance(weights, list) else weights)
            output_dir = w.parent / 'nndct_quant'
        print(f"NNDCT quant dir: {output_dir}")
        quantizer = torch_quantizer(quant_mode=quant_mode,
                                    bitwidth=nndct_bitwidth,
                                    module=model,
                                    input_args=input_tensor,
                                    output_dir=output_dir.as_posix())
        # if (NndctOption.nndct_stat.value > 2):
        #     def do_quantize(instance, blob, name, node=None, tensor_type='input'):
        #         # forward quant graph but not quantize parameter and activation
        #         if NndctOption.nndct_quant_off.value:
        #             return blob

        #         blob_save = None
        #         if isinstance(blob.values, torch.Tensor):
        #             blob_save = blob
        #             blob = blob.values.data

        #         quant_device = GLOBAL_MAP.get_ele(NNDCT_KEYS.QUANT_DEVICE)
        #         if blob.device.type != quant_device.type:
        #             raise TypeError(
        #                 "Device of quantizer is {}, device of model and data should match device of quantizer".format(
        #                     quant_device.type))

        #         if (NndctOption.nndct_stat.value > 2):
        #             quant_data = nndct_quant.QuantizeData(name, blob.cpu().detach().numpy())
        #         # quantize the tensor
        #         bnfp = instance.get_bnfp(name, True, tensor_type)
        #         if (NndctOption.nndct_stat.value > 1):
        #             print('---- quant %s tensor: %s with 1/step = %g' % (
        #                 tensor_type, name, bnfp[1]))
        #         # hardware cut method
        #         mth = 4 if instance.lstm else 2
        #         if tensor_type == 'param':
        #             mth = 3

        #         res = py_nndct.nn.NndctFixNeuron(blob,
        #                                             blob,
        #                                             maxamp=[bnfp[0], bnfp[1]],
        #                                             method=mth)

        #         if (NndctOption.nndct_stat.value > 2):
        #             quant_efficiency, sqnr = quant_data.quant_efficiency(blob.cpu().detach().numpy(), 8)
        #             torchquantizer.global_snr_inv += 1 / sqnr
        #             print(f"quant_efficiency={quant_efficiency}, global_snr_inv={torchquantizer.global_snr_inv} {quant_data._name}\n")

        #         # update param to nndct graph
        #         if tensor_type == 'param':
        #             instance.update_param_to_nndct(node, name, res.cpu().detach().numpy())

        #         if blob_save is not None:
        #             blob_save.values.data = blob
        #             blob = blob_save
        #             res = blob_save

        #         return res

        #     _quantizer = GLOBAL_MAP.get_ele(NNDCT_KEYS.QUANTIZER)
        #     _quantizer.do_quantize = do_quantize.__get__(_quantizer)
        quant_model = quantizer.quant_model
        ori_forward = quant_model.forward
        post_method = model.model[-1].post_process
        def forward(*args, **kwargs):
            out = ori_forward(*args, **kwargs)
            return post_method(out)
        quant_model.forward = forward

    if dump_xmodel:
        total = 1
    else:
        total = len(dataloader)
    for batch_i, (img, targets, paths, shapes) in enumerate(tqdm(dataloader, desc=s, total=total)):
        t1 = time_sync()
        img = img.to(device, non_blocking=True)
        img = img.half() if half else img.float()  # uint8 to fp16/32
        img /= 255.0  # 0 - 255 to 0.0 - 1.0
        targets = targets.to(device)
        nb, _, height, width = img.shape  # batch size, channels, height, width
        t2 = time_sync()
        dt[0] += t2 - t1

        with torch.no_grad():
            # Run model
            if nndct_quant:
                out = quant_model(img)
                out, train_out = out[0], out[1]
            else:
                out, train_out = model(img, augment=augment)  # inference and training outputs
        dt[1] += time_sync() - t2

        # Compute loss
        if compute_loss:
            loss += compute_loss([x.float() for x in train_out], targets)[1]  # box, obj, cls

        # Run NMS
        targets[:, 2:] *= torch.Tensor([width, height, width, height]).to(device)  # to pixels
        lb = [targets[targets[:, 0] == i, 1:] for i in range(nb)] if save_hybrid else []  # for autolabelling
        t3 = time_sync()
        out = non_max_suppression(out, conf_thres, iou_thres, labels=lb, multi_label=True, agnostic=single_cls)
        dt[2] += time_sync() - t3

        # Statistics per image
        for si, pred in enumerate(out):
            labels = targets[targets[:, 0] == si, 1:]
            nl = len(labels)
            tcls = labels[:, 0].tolist() if nl else []  # target class
            path, shape = Path(paths[si]), shapes[si][0]
            seen += 1

            if len(pred) == 0:
                if nl:
                    stats.append((torch.zeros(0, niou, dtype=torch.bool), torch.Tensor(), torch.Tensor(), tcls))
                continue

            # Predictions
            if single_cls:
                pred[:, 5] = 0
            predn = pred.clone()
            scale_coords(img[si].shape[1:], predn[:, :4], shape, shapes[si][1])  # native-space pred

            # Evaluate
            if nl:
                tbox = xywh2xyxy(labels[:, 1:5])  # target boxes
                scale_coords(img[si].shape[1:], tbox, shape, shapes[si][1])  # native-space labels
                labelsn = torch.cat((labels[:, 0:1], tbox), 1)  # native-space labels
                correct = process_batch(predn, labelsn, iouv)
                if plots:
                    confusion_matrix.process_batch(predn, labelsn)
            else:
                correct = torch.zeros(pred.shape[0], niou, dtype=torch.bool)
            stats.append((correct.cpu(), pred[:, 4].cpu(), pred[:, 5].cpu(), tcls))  # (correct, conf, pcls, tcls)

            # Save/log
            if save_txt:
                save_one_txt(predn, save_conf, shape, file=save_dir / 'labels' / (path.stem + '.txt'))
            if save_json:
                save_one_json(predn, jdict, path, class_map)  # append to COCO-JSON dictionary
            callbacks.run('on_val_image_end', pred, predn, path, names, img[si])

        # Plot images
        if plots and batch_i < 3:
            f = save_dir / f'val_batch{batch_i}_labels.jpg'  # labels
            Thread(target=plot_images, args=(img, targets, paths, f, names), daemon=True).start()
            f = save_dir / f'val_batch{batch_i}_pred.jpg'  # predictions
            Thread(target=plot_images, args=(img, output_to_target(out), paths, f, names), daemon=True).start()

        if dump_xmodel:
            break

    if nndct_quant and quant_mode == 'calib':
        quantizer.export_quant_config()
    if dump_xmodel:
        quantizer.export_xmodel(output_dir=output_dir.as_posix(), deploy_check=False)
        return

    # Compute statistics
    stats = [np.concatenate(x, 0) for x in zip(*stats)]  # to numpy
    if len(stats) and stats[0].any():
        p, r, ap, f1, ap_class = ap_per_class(*stats, plot=plots, save_dir=save_dir, names=names)
        ap50, ap = ap[:, 0], ap.mean(1)  # AP@0.5, AP@0.5:0.95
        mp, mr, map50, map = p.mean(), r.mean(), ap50.mean(), ap.mean()
        nt = np.bincount(stats[3].astype(np.int64), minlength=nc)  # number of targets per class
    else:
        nt = torch.zeros(1)

    # Print results
    pf = '%20s' + '%11i' * 2 + '%11.3g' * 4  # print format
    print(pf % ('all', seen, nt.sum(), mp, mr, map50, map))

    # Print results per class
    if (verbose or (nc < 50 and not training)) and nc > 1 and len(stats):
        for i, c in enumerate(ap_class):
            print(pf % (names[c], seen, nt[c], p[i], r[i], ap50[i], ap[i]))

    # Print speeds
    t = tuple(x / seen * 1E3 for x in dt)  # speeds per image
    if not training:
        shape = (batch_size, 3, imgsz, imgsz)
        print(f'Speed: %.1fms pre-process, %.1fms inference, %.1fms NMS per image at shape {shape}' % t)

    # Plots
    if plots:
        confusion_matrix.plot(save_dir=save_dir, names=list(names.values()))
        callbacks.run('on_val_end')

    # Save JSON
    if save_json and len(jdict):
        w = Path(weights[0] if isinstance(weights, list) else weights).stem if weights is not None else ''  # weights
        anno_json = str('/group/dphi_algo/coco/annotations/annotations_2017/instances_val2017.json')  # annotations json
        pred_json = str(save_dir / f"{w}_predictions.json")  # predictions json
        print(f'\nEvaluating pycocotools mAP... saving {pred_json}...')
        with open(pred_json, 'w') as f:
            json.dump(jdict, f)

        try:  # https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocoEvalDemo.ipynb
            check_requirements(['pycocotools'])
            from pycocotools.coco import COCO
            from pycocotools.cocoeval import COCOeval

            anno = COCO(anno_json)  # init annotations api
            pred = anno.loadRes(pred_json)  # init predictions api
            eval = COCOeval(anno, pred, 'bbox')
            if is_coco:
                eval.params.imgIds = [int(Path(x).stem) for x in dataloader.dataset.img_files]  # image IDs to evaluate
            eval.evaluate()
            eval.accumulate()
            eval.summarize()
            map, map50 = eval.stats[:2]  # update results (mAP@0.5:0.95, mAP@0.5)
        except Exception as e:
            print(f'pycocotools unable to run: {e}')

    # Return results
    model.float()  # for training
    if not training:
        s = f"\n{len(list(save_dir.glob('labels/*.txt')))} labels saved to {save_dir / 'labels'}" if save_txt else ''
        print(f"Results saved to {colorstr('bold', save_dir)}{s}")
    maps = np.zeros(nc) + map
    for i, c in enumerate(ap_class):
        maps[c] = ap[i]
    return (mp, mr, map50, map, *(loss.cpu() / len(dataloader)).tolist()), maps, t

def parse_opt():
    parser = argparse.ArgumentParser()
    parser.add_argument('--data', type=str, default=ROOT / 'data/coco128.yaml', help='dataset.yaml path')
    parser.add_argument('--weights', nargs='+', type=str, default=ROOT / 'yolov5s.pt', help='model.pt path(s)')
    parser.add_argument('--batch-size', type=int, default=32, help='batch size')
    parser.add_argument('--imgsz', '--img', '--img-size', type=int, default=640, help='inference size (pixels)')
    parser.add_argument('--conf-thres', type=float, default=0.001, help='confidence threshold')
    parser.add_argument('--iou-thres', type=float, default=0.6, help='NMS IoU threshold')
    parser.add_argument('--task', default='val', help='train, val, test, speed or study')
    parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    parser.add_argument('--single-cls', action='store_true', help='treat as single-class dataset')
    parser.add_argument('--augment', action='store_true', help='augmented inference')
    parser.add_argument('--verbose', action='store_true', help='report mAP by class')
    parser.add_argument('--save-txt', action='store_true', help='save results to *.txt')
    parser.add_argument('--save-hybrid', action='store_true', help='save label+prediction hybrid results to *.txt')
    parser.add_argument('--save-conf', action='store_true', help='save confidences in --save-txt labels')
    parser.add_argument('--save-json', action='store_true', help='save a COCO-JSON results file')
    parser.add_argument('--project', default=ROOT / 'runs/val', help='save to project/name')
    parser.add_argument('--name', default='exp', help='save to project/name')
    parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
    parser.add_argument('--half', action='store_true', help='use FP16 half-precision inference')
    parser.add_argument('--quant_mode', default='calib', help='nndct quant mode')
    parser.add_argument('--nndct_quant', action='store_true', help='use nndct quant model for inference')
    parser.add_argument('--dump_xmodel', action='store_true', help='dump nndct xmodel')
    parser.add_argument('--nndct_stat', type=int, required=False, default=0)
    opt = parser.parse_args()
    opt.data = check_yaml(opt.data)  # check YAML
    opt.save_json |= opt.data.endswith('coco.yaml')
    opt.save_txt |= opt.save_hybrid
    print_args(FILE.stem, opt)
    return opt

def main(opt):
    set_logging()
    check_requirements(exclude=('tensorboard', 'thop'))

    if opt.task in ('train', 'val', 'test'):  # run normally
        run(**vars(opt))

    elif opt.task == 'speed':  # speed benchmarks
        # python val.py --task speed --data coco.yaml --batch 1 --weights yolov5n.pt yolov5s.pt...
        for w in opt.weights if isinstance(opt.weights, list) else [opt.weights]:
            run(opt.data, weights=w, batch_size=opt.batch_size, imgsz=opt.imgsz, conf_thres=.25, iou_thres=.45,
                device=opt.device, save_json=False, plots=False)

    elif opt.task == 'study':  # run over a range of settings and save/plot
        # python val.py --task study --data coco.yaml --iou 0.7 --weights yolov5n.pt yolov5s.pt...
        x = list(range(256, 1536 + 128, 128))  # x axis (image sizes)
        for w in opt.weights if isinstance(opt.weights, list) else [opt.weights]:
            f = f'study_{Path(opt.data).stem}_{Path(w).stem}.txt'  # filename to save to
            y = []  # y axis
            for i in x:  # img-size
                print(f'\nRunning {f} point {i}...')
                r, _, t = run(opt.data, weights=w, batch_size=opt.batch_size, imgsz=i, conf_thres=opt.conf_thres,
                              iou_thres=opt.iou_thres, device=opt.device, save_json=opt.save_json, plots=False)
                y.append(r + t)  # results and times
            np.savetxt(f, y, fmt='%10.4g')  # save
        os.system('zip -r study.zip study_*.txt')
        plot_val_study(x=x)  # plot

if __name__ == "__main__":
    opt = parse_opt()
    main(opt)

farah-rostom98 commented 11 months ago

You have to remove some operations from the model head, i.e. exclude certain nodes from quantization. Refer to this issue: https://github.com/ultralytics/yolov5/issues/1288
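
In other words, keep the Detect decoding (sigmoid, grid offsets, anchor scaling) out of the quantized graph and run it in float on the raw head outputs, which is the pattern the pasted val.py already follows. A minimal sketch of that pattern (names taken from the script above, paths are placeholders):

from functools import partial
from pytorch_nndct.apis import torch_quantizer
import torch

# in this repo's modified yolov5, quant=True makes forward return the raw head outputs
model.forward = partial(model.forward, quant=True)
dummy_input = torch.randn(1, 3, 640, 640)
quantizer = torch_quantizer(quant_mode='calib',
                            module=model,
                            input_args=dummy_input,
                            output_dir='nndct_quant')
quant_model = quantizer.quant_model

# the decoding stays in float, outside the quantized graph
post_process = model.model[-1].post_process
def forward_with_decode(x):
    return post_process(quant_model(x))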

IkrameBeggar commented 10 months ago

Quantized with the Yolov5 model, the mAP@0.5 is high (around 0.47), but the detection results are outrageous and unexpected

Hello, can you please tell me which version of yolov5 you are using?

sleipnier commented 10 months ago

I am using yolov5-6.0 for all the training and quantization.

IkrameBeggar commented 9 months ago

I am using yolov5-6.0 for all the training and quantization.

After deploying the model on the FPGA, I noticed that objects are detected but the boxes are very small. I think the problem is with the prototxt file I am using. Can you share the prototxt you are using?
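
For what it is worth, very small boxes after deployment usually point at the decode step rather than at the network itself: in the yolov5 v6.0 head the predicted widths and heights are scaled by the anchor sizes (the values that the Vitis AI Library prototxt typically carries as biases), so wrong or missing anchors shrink every box. A sketch of the decode for one output scale, following the yolov5 v6.0 Detect code (names here are illustrative):

import torch

def decode_scale(raw, anchors, stride):
    # raw:     (bs, na, ny, nx, 85) raw head output for one scale
    # anchors: (na, 2) anchor width/height in pixels for this scale
    # stride:  8, 16 or 32
    y = raw.sigmoid()
    ny, nx = raw.shape[2], raw.shape[3]
    yv, xv = torch.meshgrid(torch.arange(ny), torch.arange(nx), indexing='ij')
    grid = torch.stack((xv, yv), 2).float()                        # (ny, nx, 2)
    xy = (y[..., 0:2] * 2.0 - 0.5 + grid) * stride                 # centers in pixels
    wh = (y[..., 2:4] * 2.0) ** 2 * anchors.view(1, -1, 1, 1, 2)   # sizes scale with anchors
    return torch.cat((xy, wh, y[..., 4:]), dim=-1)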

60rw311 commented 1 month ago

Where can I get this val.py?