NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

How to make PTQ calibration for a Hybrid Quantization model (int8 & fp16) #3978

Open renshujiajia opened 4 days ago

renshujiajia commented 4 days ago

Description

What is the right way to run PTQ calibration for a hybrid quantization model? I built my TensorRT engine from an ONNX model with the code below, using the class Calibrator(trt.IInt8EntropyCalibrator2) to set config.int8_calibrator.

My hybrid-quantized super-resolution model's inference results are biased towards magenta. I have already applied clipping; what could be the reason for this? Is there an issue with my calibration code, or could it be a poor distribution of the calibration dataset? I am sure that my inference program is correct.

import tensorrt as trt

def build_engine_onnx(model_file, engine_file_path, min_shape, opt_shape, max_shape, calibration_stream):
    logger = trt.Logger(trt.Logger.INFO)
    builder = trt.Builder(logger)
    network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)

    config = builder.create_builder_config()
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 2 << 30)                             # 2 GiB workspace
    config.set_flag(trt.BuilderFlag.FP16)
    config.set_flag(trt.BuilderFlag.INT8)

    # Enable strong type matching
    # config.set_flag(trt.BuilderFlag.GPU_FALLBACK)
    # print(dir(trt.BuilderFlag))

    # Add calibrator
    calibrator = Calibrator(calibration_stream, 'calibration.cache')
    config.int8_calibrator = calibrator

    with open(model_file, 'rb') as model:
        if not parser.parse(model.read()):
            for error in range(parser.num_errors):
                print(parser.get_error(error))
            return None

    profile = builder.create_optimization_profile()
    input_name = network.get_input(0).name

    # Dynamic input shapes:
    # profile.set_shape(input_name, min_shape, opt_shape, max_shape)

    # Fixed input shape
    network.get_input(0).shape = opt_shape           # build directly with a single fixed shape
    config.add_optimization_profile(profile)

    print(f"Building TensorRT engine from file {model_file}...")
    plan = builder.build_serialized_network(network, config)
    if plan is None:
        raise RuntimeError("Failed to build the TensorRT engine!")

    with open(engine_file_path, "wb") as f:
        f.write(bytearray(plan))
    return plan
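
The Calibrator class referenced above is not shown in the issue. As a point of comparison, here is a minimal sketch of a working trt.IInt8EntropyCalibrator2, assuming a hypothetical calibration_stream object that exposes batch_size, a preallocated batch array, and a next_batch() method returning preprocessed float32 batches (or None when exhausted). The preprocessing inside the stream must match inference preprocessing exactly; a channel-order or normalization mismatch between calibration and inference is a common cause of color casts like the magenta bias described above.

import os

import numpy as np
import pycuda.autoinit  # creates a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt

class Calibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, calibration_stream, cache_file):
        super().__init__()
        self.stream = calibration_stream             # hypothetical batch provider
        self.cache_file = cache_file
        # Device buffer sized for one calibration batch
        self.device_input = cuda.mem_alloc(self.stream.batch.nbytes)

    def get_batch_size(self):
        return self.stream.batch_size

    def get_batch(self, names):
        batch = self.stream.next_batch()             # hypothetical; float32 array or None
        if batch is None:
            return None                              # no more data: calibration ends
        cuda.memcpy_htod(self.device_input, np.ascontiguousarray(batch))
        return [int(self.device_input)]

    def read_calibration_cache(self):
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()
        return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)

TensorRT calls get_batch repeatedly until it returns None, then writes the computed scales to calibration.cache so later builds can skip calibration; delete the cache after changing the preprocessing or the calibration dataset, or stale scales will be reused.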

Environment

TensorRT Version: 10.0.1

NVIDIA GPU: RTX4090

NVIDIA Driver Version: 12.0

CUDA Version: 12.0

CUDNN Version: 8.2.0

Operating System: Linux interactive11554 5.11.0-27-generic #29~20.04.1-Ubuntu SMP Wed Aug 11 15:58:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Python Version (if applicable): 3.8.19

lix19937 commented 3 days ago

Try to add

profile.set_shape(input_name, opt_shape, opt_shape, opt_shape) # for fixed shape  

before config.add_optimization_profile(profile)

And check your preprocessing code, or try the MinMax calibrator.
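
For the MinMax suggestion, only the base class changes; a sketch that reuses the hypothetical Calibrator shown earlier by delegation:

import tensorrt as trt

# MinMax calibration keeps the full observed activation range instead of an
# entropy-minimizing threshold, which can behave better for image-to-image
# models (such as super-resolution) where outlier activations carry signal.
class MinMaxCalibrator(trt.IInt8MinMaxCalibrator):
    def __init__(self, calibration_stream, cache_file):
        super().__init__()
        self._impl = Calibrator(calibration_stream, cache_file)  # sketch from above

    def get_batch_size(self):
        return self._impl.get_batch_size()

    def get_batch(self, names):
        return self._impl.get_batch(names)

    def read_calibration_cache(self):
        return self._impl.read_calibration_cache()

    def write_calibration_cache(self, cache):
        return self._impl.write_calibration_cache(cache)

Then set config.int8_calibrator = MinMaxCalibrator(calibration_stream, 'calibration.cache') and delete the old calibration.cache so the cached entropy scales are not silently reused.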

renshujiajia commented 3 days ago

> Try to add profile.set_shape(input_name, opt_shape, opt_shape, opt_shape) before config.add_optimization_profile(profile). And check your preprocessing code, or try the MinMax calibrator.

Thanks a lot, I will try the MinMax calibrator. But don't network.get_input(0).shape = opt_shape and profile.set_shape(input_name, opt_shape, opt_shape, opt_shape) serve the same purpose? The exported model information is as follows:

 input id:  0    is input:  True       binding name:  input    shape:  (1, 3, 4320, 7680)      type:  DataType.FLOAT
 input id:  1    is input:  False      binding name:  output   shape:  (1, 3, 8640, 15360)     type:  DataType.FLOAT
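
For reference, a dump like the one above can be produced with the TensorRT 10 tensor-based inspection API (the engine path here is illustrative):

import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
with open("model.engine", "rb") as f:               # path written by build_engine_onnx
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())

for i in range(engine.num_io_tensors):
    name = engine.get_tensor_name(i)
    print("input id:", i,
          "\tis input:", engine.get_tensor_mode(name) == trt.TensorIOMode.INPUT,
          "\tbinding name:", name,
          "\tshape:", tuple(engine.get_tensor_shape(name)),
          "\ttype:", engine.get_tensor_dtype(name))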
lix19937 commented 2 days ago

If you never call profile.set_shape, your profile is empty. In fact, for a fixed-shape model you do not need to care about the optimization profile at all.

network.get_input(0).shape = opt_shape
and
profile.set_shape(input_name, opt_shape, opt_shape, opt_shape)
play different roles: the first overwrites the network input with a single static shape, while the second describes the shape range allowed for a dynamic input.
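
Concretely, the two approaches look like this (a sketch reusing the builder, network, and config objects from build_engine_onnx above, with the shape from the model dump):

opt_shape = (1, 3, 4320, 7680)

# Role 1: overwrite the network input with a single static shape.
# The engine is then built for exactly this shape and needs no profile.
network.get_input(0).shape = opt_shape

# Role 2: leave the input dynamic (dimensions of -1 in the ONNX model)
# and bound it with an optimization profile; min == opt == max pins the
# engine to one shape while still using the dynamic-shape machinery.
profile = builder.create_optimization_profile()
profile.set_shape(network.get_input(0).name, opt_shape, opt_shape, opt_shape)
config.add_optimization_profile(profile)

Use one or the other; for a fixed-shape model the profile is redundant.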