NVIDIA / TensorRT-Model-Optimizer

TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.
https://nvidia.github.io/TensorRT-Model-Optimizer

convert to trt error #64

Open steven-spec opened 2 months ago

steven-spec commented 2 months ago

I used modelopt to QAT my model:

```python
import modelopt.torch.quantization as mtq

# Select quantization config
config = mtq.INT8_DEFAULT_CFG

# Define forward loop for calibration
def forward_loop(model):
    for data in calib_set:
        model(data)

# QAT after replacement of regular modules to quantized modules
model = mtq.quantize(model, config, forward_loop)

# Fine-tune with original training pipeline
# Adjust learning rate and training duration
train(model, train_loader, optimizer, scheduler, ...)
```
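For context, `mtq.quantize` runs the supplied `forward_loop` once to collect activation statistics before fake-quantizing the model. The framework-free sketch below illustrates that calibrate-then-quantize pattern only; the names and the scale derivation are illustrative, not the modelopt internals.

```python
# Illustrative sketch of the calibrate-then-quantize flow: observe
# activations via the user's forward_loop, derive a symmetric int8
# scale, then fake-quantize inputs before calling the model.
def quantize(model, forward_loop):
    observed = []

    def observing(x):                 # records inputs during calibration
        observed.append(x)
        return model(x)

    forward_loop(observing)           # user-supplied calibration pass
    amax = max(abs(v) for v in observed)
    scale = amax / 127.0 if amax else 1.0

    def quantized(x):                 # symmetric int8 fake-quantization
        q = max(-128, min(127, round(x / scale)))
        return model(q * scale)

    return quantized

calib_set = [0.5, -2.0, 1.25]

def forward_loop(model):
    for data in calib_set:
        model(data)

qmodel = quantize(lambda x: 2.0 * x, forward_loop)
print(qmodel(1.0))   # close to 2.0, up to quantization error
```

The real API additionally swaps modules for quantized counterparts and keeps the quantizers differentiable so the subsequent fine-tuning (QAT) can adapt the weights to the quantization noise.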

Then I exported to ONNX. Prediction with the ONNX model is correct, but when I convert to TensorRT using the code below:

```python
def build_engine(onnx_file_path, engine_file_path):
    TRT_LOGGER = trt.Logger(trt.Logger.INFO)
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)) as network, \
         trt.OnnxParser(network, TRT_LOGGER) as parser:
        config = builder.create_builder_config()
        config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)
        config.set_flag(trt.BuilderFlag.INT8)

        with open(onnx_file_path, 'rb') as model:
            if not parser.parse(model.read()):
                print('Failed to parse the ONNX file')
                for error in range(parser.num_errors):
                    print(parser.get_error(error))
                return None

        # Build and serialize the engine
        engine = builder.build_serialized_network(network, config)
        with open(engine_file_path, "wb") as f:
            f.write(engine)
        return engine
```

it reports the error `TypeError: a bytes-like object is required, not 'NoneType'`.
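The `TypeError` itself just means `f.write()` received `None`: `build_serialized_network()` returns `None` when the engine build fails, and the snippet writes the result unconditionally. A minimal reproduction of that failure mode, with `io.BytesIO` standing in for the engine file:

```python
import io

# build_serialized_network() returns None on a failed build; writing
# that None to the engine file is what raises the reported TypeError.
engine = None                 # stand-in for a failed build result
buf = io.BytesIO()            # stand-in for open(engine_file_path, "wb")
try:
    buf.write(engine)
except TypeError as err:
    message = str(err)
print(message)                # a bytes-like object is required, not 'NoneType'
```

So the real problem is whatever made the build fail; the logger output preceding the exception should contain the actual TensorRT error.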

riyadshairi979 commented 2 months ago

Looks like the engine build failed. Please attach the generated ONNX model if possible, or the engine build log.
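A `None` check before serializing would surface that build failure explicitly instead of the misleading `TypeError`. A sketch of such a guard (the helper name is hypothetical; `build_result` is whatever `builder.build_serialized_network(network, config)` returned):

```python
# Hypothetical guard: refuse to write a failed build result to disk,
# so the error points at the build step rather than at f.write().
def serialize_or_raise(build_result, engine_file_path):
    if build_result is None:
        raise RuntimeError(
            "TensorRT engine build failed, check the build log above")
    with open(engine_file_path, "wb") as f:
        f.write(build_result)
    return build_result
```

With the guard in place, the build log emitted by the `trt.Logger` just before the exception is what identifies the failing layer or tactic.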

steven-spec commented 2 months ago

> Looks like the engine build failed. Please attach the generated ONNX model if possible or engine build log.

https://drive.google.com/file/d/1llHQdSq56TSBvFx79akudv8U21msCap_/view?usp=drive_link

riyadshairi979 commented 2 months ago

> https://drive.google.com/file/d/1llHQdSq56TSBvFx79akudv8U21msCap_/view?usp=drive_link

Sent access request.