Xilinx / Vitis-AI

Vitis AI is Xilinx’s development stack for AI inference on Xilinx hardware platforms, including both edge devices and Alveo cards.
Apache License 2.0

Register Aten Custom Ops #1208

Open dungng27 opened 1 year ago

dungng27 commented 1 year ago

Hi, I'm trying to quantize and compile a PyTorch model that uses some Aten operations not yet supported by Vitis-AI. Specifically, when I deploy the model (running the quantizer in test mode), the following errors occurred:

[VAIQ_NOTE]: Quant config file is empty, use default quant configuration

[VAIQ_NOTE]: Quantization test process start up...

[VAIQ_NOTE]: =>Quant Module is in 'cpu'.

[VAIQ_NOTE]: =>Parsing RotatedSegmentDetector...

[VAIQ_NOTE]: Start to trace model...
/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/torch/nn/functional.py:2359: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  _verify_batch_size([input.size(0) * input.size(1) // num_groups, num_groups] + list(input.size()[2:]))

[VAIQ_NOTE]: Finish tracing.

[VAIQ_NOTE]: Processing ops...
██████████████████████████████████████████████████| 403/403 [00:00<00:00, 2273.19it/s, OpInfo: name = return_0, type = Return]                           

[VAIQ_WARN]: The quantizer recognize new op `aten::mul_` as a float operator by default.

[VAIQ_WARN]: The quantizer recognize new op `aten::group_norm` as a float operator by default.

[VAIQ_WARN]: The quantizer recognize new op `aten::clamp_` as a float operator by default.

[VAIQ_WARN]: The quantizer recognize new op `clamp` as a float operator by default.

[VAIQ_NOTE]: =>Doing weights equalization...
/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/nndct_shared/optimization/commander.py:454: RuntimeWarning: divide by zero encountered in true_divide
  scale = np.where(sqrt_of_ranges != 0, range_0 / sqrt_of_ranges, scale)
/opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.7/site-packages/nndct_shared/optimization/commander.py:454: RuntimeWarning: invalid value encountered in true_divide
  scale = np.where(sqrt_of_ranges != 0, range_0 / sqrt_of_ranges, scale)

[VAIQ_NOTE]: =>Quantizable module is generated.(/workspaces/PheNet_Vitis-AI/quant_model/RotatedSegmentDetector.py)

[VAIQ_NOTE]: =>Get module with quantization.

[VAIQ_NOTE]: =>Converting to xmodel ...
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0328 02:17:35.101768 21213 tool_function.cpp:171] [UNILOG][WARNING] The operator named RotatedSegmentDetector__RotatedSegmentDetector_MobileNetV3_backbone__InvertedResidual_layer1__SELayer_se__ConvModule_conv2__HSigmoid_activate__20467, type: aten::clamp_, is not defined in XIR. XIR creates the definition of this operator automatically. You should specify the shape and the data_type of the output tensor of this operation by set_attr("shape", std::vector<int>) and set_attr("data_type", std::string)
W0328 02:17:35.176827 21213 tool_function.cpp:171] [UNILOG][WARNING] The operator named RotatedSegmentDetector__RotatedSegmentDetector_RotatedFCOSHead_bbox_head__ConvModule_cls_convs__ModuleList_0__GroupNorm_gn__input_309, type: aten::group_norm, is not defined in XIR. XIR creates the definition of this operator automatically. You should specify the shape and the data_type of the output tensor of this operation by set_attr("shape", std::vector<int>) and set_attr("data_type", std::string)
W0328 02:17:35.193320 21213 tool_function.cpp:171] [UNILOG][WARNING] The operator named RotatedSegmentDetector__RotatedSegmentDetector_RotatedFCOSHead_bbox_head__22828, type: cast, is not defined in XIR. XIR creates the definition of this operator automatically. You should specify the shape and the data_type of the output tensor of this operation by set_attr("shape", std::vector<int>) and set_attr("data_type", std::string)
W0328 02:17:35.193936 21213 tool_function.cpp:171] [UNILOG][WARNING] The operator named RotatedSegmentDetector__RotatedSegmentDetector_RotatedFCOSHead_bbox_head__22831, type: clamp, is not defined in XIR. XIR creates the definition of this operator automatically. You should specify the shape and the data_type of the output tensor of this operation by set_attr("shape", std::vector<int>) and set_attr("data_type", std::string)
F0328 02:17:35.193950 21213 wrapper.cpp:111] [UNILOG][FATAL][PYXIR_INVALID_DATA_TYPE][] Unsupported data type!
*** Check failure stack trace: ***
Aborted (core dumped)
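
To see at a glance which op types XIR does not define, the warnings in a log like the one above can be scanned with a small script. This is only a sketch: the regular expression is inferred from the wording of the `[UNILOG][WARNING]` lines shown here, the function name `undefined_xir_op_types` is hypothetical, and the operator names in `sample_log` are shortened stand-ins for the full names in the real log.

```python
import re

# Pattern inferred from the warning text in the log above (an assumption,
# not a documented log format).
XIR_UNDEFINED = re.compile(r"type: ([\w:]+), is not defined in XIR")


def undefined_xir_op_types(log_text):
    """Return the op types reported as undefined in XIR, in first-seen order."""
    seen = []
    for match in XIR_UNDEFINED.finditer(log_text):
        op_type = match.group(1)
        if op_type not in seen:
            seen.append(op_type)
    return seen


# Abbreviated stand-ins for the four warning lines in the log above.
sample_log = """\
[UNILOG][WARNING] The operator named HSigmoid_activate__20467, type: aten::clamp_, is not defined in XIR.
[UNILOG][WARNING] The operator named GroupNorm_gn__input_309, type: aten::group_norm, is not defined in XIR.
[UNILOG][WARNING] The operator named bbox_head__22828, type: cast, is not defined in XIR.
[UNILOG][WARNING] The operator named bbox_head__22831, type: clamp, is not defined in XIR.
"""

op_types = undefined_xir_op_types(sample_log)
print(op_types)
```

In this log the list covers `aten::clamp_`, `aten::group_norm`, `cast`, and `clamp`; the fatal `PYXIR_INVALID_DATA_TYPE` error is raised right after the `clamp` warning.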

Here is the script I ran to quantize and deploy the model:

from mmcv import collect_env

# Check MMRotate installation
import mmrotate

# Check MMDetection installation
import mmdet

# Check mmcv installation
from mmcv.ops import get_compiling_cuda_version, get_compiler_version

# from utils.postprocess import *
import glob
import os.path as osp
import shutil
import os
from tqdm.notebook import tqdm

import mmcv
from mmcv.runner import load_checkpoint

from mmdet.apis import inference_detector, show_result_pyplot
from mmrotate.models import build_detector
import torch
import numpy as np

from pytorch_nndct.apis import torch_quantizer, dump_xmodel
from icecream import ic
import sys
import argparse
import cv2
from utils.preprocess import preprocess
from utils.data import load_data
import glob
from tqdm import tqdm

from pytorch_nndct.apis import Inspector

def parse_args():
    parser = argparse.ArgumentParser(description='Testing config for the Implementation')
    parser.add_argument('-q',  '--quant_mode', type=str, default='test',
                        choices=['calib','test'], help='Quantization mode (calib or test). Default is test')
    parser.add_argument('-cf',  '--model_config', type=str, default='')
    parser.add_argument('-c',  '--checkpoint', type=str, default='')
    parser.add_argument('-o',  '--out_model_dir', type=str, default='')
    parser.add_argument('-n',  '--num_data', type=int, default=100)
    args = parser.parse_args()
    return args

def calculate_size(model):
    size = 0
    sl = 0
    for name , param in model.named_parameters():
        ic(name, param.dtype)
        size += sys.getsizeof(param.storage())/1024**2
        sl += param.numel()

    print(f"model size : {size:.3f} MB")    
    print(f"sl param : {sl} ")

if __name__ == '__main__':
    args = parse_args()
    output_dir = args.out_model_dir
    mode = args.quant_mode
    num_data = args.num_data

    ### Loading model
    # Choose to use a config and initialize the detector
    config = args.model_config

    # Setup a checkpoint file to load
    checkpoint = args.checkpoint

    # Set the device to be used for evaluation (the quantizer below also runs on CPU)
    device = 'cpu'

    # Load the config
    config = mmcv.Config.fromfile(config)
    # Set pretrained to be None since we do not need pretrained model here
    config.model.pretrained = None

    # Initialize the detector
    model = build_detector(config.model)

    # Load checkpoint
    checkpoint = load_checkpoint(model, checkpoint, map_location=device)

    # Set the classes of models for inference
    model.CLASSES = checkpoint['meta']['CLASSES']

    # We need to set the model's cfg for inference
    model.cfg = config

    # Convert the model into evaluation mode
    model.eval()

    # Specify a target name or fingerprint you want to deploy on
    # target = "DPUCAHX8L_ISA0_SP"
    # # Initialize inspector with target
    # inspector = Inspector(target)

    ### Export model
    model.forward = model.forward_dummy
    # images = glob.glob(...)  # glob pattern was cut off in the original post
    dummy_input = torch.randn([1, 3, 256, 256])

    # inspector.inspect(model, (dummy_input,), device=torch.device(device), output_dir="inspect", image_format="png") 

    with torch.no_grad():   
        quantizer = torch_quantizer(mode, model, (dummy_input,), output_dir=output_dir, device=torch.device("cpu"))

    quantized_model = quantizer.quant_model

    # val_loader, _ = load_data(
    #     images,
    #     subset_len=num_data,
    #     batch_size=64,
    #     sample_method='random',
    # )

    # for iteraction, images in tqdm(
    #   enumerate(val_loader), total=len(val_loader)):
    #     output = quantized_model(images)

    output = quantized_model(dummy_input)

    # for path in tqdm(images):
    #     input_tensor = file2tensor(path)
    #     output = quantized_model(input_tensor)

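    # Calibration exports the quantization config; test mode exports the xmodel
    if mode == 'calib':
        quantizer.export_quant_config()
    if mode == 'test':
        quantizer.export_xmodel(deploy_check=False, output_dir=output_dir)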

with the command:

python test_code/inference_pth.py -q test \
-cf /workspaces/PheNet_Vitis-AI/models/multihead/rotated_fcos_kld_mbnv3_fpn_1x_dota_le90.py \
-c /workspaces/PheNet_Vitis-AI/models/multihead/multihead.pth \
-o /workspaces/PheNet_Vitis-AI/quant_model

I'm using Vitis-AI 2.5 for stability. I read the docs about Register Custom Operation but I don't know how to apply this workflow to register these custom Aten ops. Could someone show me how? Many thanks.

dungng27 commented 1 year ago

It turns out that I can modify these ops by using torch functions rather than tensor's functions and register these. The problem is solved!

dungng27 commented 1 year ago

It turns out that I can modify these ops by using torch functions rather than tensor's functions and register these. The problem is solved!

This is just a temporary fix and is not applicable to other ops. I wonder if there is a better workaround.
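
As a rough illustration of what "torch functions rather than tensor's functions" can look like, here is a minimal sketch. It assumes the `aten::clamp_` and `aten::mul_` warnings come from in-place tensor methods inside an activation such as the `HSigmoid` module named in the log. The two classes below are hypothetical stand-ins, not the actual mmcv implementation, and the constants `3.0` and `6.0` are illustrative only. The registration step is not shown.

```python
import torch
import torch.nn as nn


class HSigmoidInplace(nn.Module):
    """Hypothetical hard sigmoid using in-place tensor methods.

    Calls like clamp_() and mul_() are the kind that the quantizer log
    above reports as aten::clamp_ and aten::mul_.
    """

    def forward(self, x):
        x = (x + 3.0).clamp_(0.0, 6.0)  # in-place tensor method
        return x.mul_(1.0 / 6.0)        # in-place tensor method


class HSigmoidFunctional(nn.Module):
    """Same computation written with out-of-place torch functions."""

    def forward(self, x):
        x = torch.clamp(x + 3.0, min=0.0, max=6.0)
        return torch.mul(x, 1.0 / 6.0)


# Both variants should produce the same values on the same input.
with torch.no_grad():
    sample = torch.linspace(-5.0, 5.0, steps=11)
    out_inplace = HSigmoidInplace()(sample.clone())
    out_functional = HSigmoidFunctional()(sample)

print(torch.allclose(out_inplace, out_functional))
```

Swapping a module like this into the model before calling `torch_quantizer` changes only which ops appear in the traced graph, not the numerical result. Whether the quantizer and XIR then accept those ops still depends on the Vitis-AI version.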

manudwd commented 9 months ago


It turns out that I can modify these ops by using torch functions rather than tensor's functions and register these. The problem is solved!

Can you elaborate a little more on this with a dummy example? I'd really appreciate it.

shaantamchawla commented 8 months ago

I'm also stumped on this, with the specific operators:

acg93-pixel commented 8 months ago

Did you manage to add validation for the quantized model from MMEngine? Could you share it, please?