minhhotboy9x opened this issue 6 months ago
You can upload the onnx.
It's a vectorized format, and TRT will pad the tensor to the target format. You can refer to https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#data-format-desc
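To make the padding concrete, here is a rough sketch (plain Python arithmetic, not a TensorRT API; padded_channels is just an illustrative helper) of how a vectorized layout rounds the channel count up; the actual padding happens inside TensorRT:

import math

def padded_channels(channels: int, vector_width: int) -> int:
    # Vectorized formats (e.g. CHW2 with vector width 2, or HWC8/CHW16 with
    # widths 8/16) group the channel dimension in chunks of vector_width,
    # so the stored channel count is rounded up to the next multiple.
    return math.ceil(channels / vector_width) * vector_width

# For the tensor with "Dimensions": [1,25,160,160]:
print(padded_channels(25, 2))   # 26 -> fits a "channel % 2 == 0" layout with 1 padded channel
print(padded_channels(25, 8))   # 32 -> fits a "channel % 8 == 0" layout with 7 padded channels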
@zerollzeng Oh I see. However, the data format of each layer is auto-chosen for the best performance, right? Since I convert on my Jetson Nano, the layers are converted to the datatype "Two wide channel vectorized row major FP16 format" (CHW2).
@lix19937 Here is my ONNX v8s_pruned. This ONNX is exported from Ultralytics, so it has metadata. I use the Python script below to convert it:
import argparse
import os
import json
import tensorrt as trt
from datetime import datetime
import onnx
import calibration

TRT_LOGGER = trt.Logger()

def parse_args():
    parser = argparse.ArgumentParser(description='Convert ONNX models to TensorRT')
    # Input shape and batch size
    parser.add_argument('--batch_size', type=int, help='data batch size',
                        default=1)
    parser.add_argument('--img_size', help='input size',
                        default=[3, 640, 640])
    # Model paths
    parser.add_argument('--onnx_model_path', help='onnx model path',
                        default='./onnx_model.onnx')
    parser.add_argument('--tensorrt_engine_path', help='tensorrt engine path',
                        default='./yolov5s_640_384_pfg_dynamic_max_batchsize_8_FP16.engine')
    # TensorRT engine params
    parser.add_argument('--dynamic_axes', help='dynamic batch input or output',
                        default='True')
    parser.add_argument('--engine_precision', help='precision of TensorRT engine', choices=['FP32', 'FP16', 'INT8'],
                        default='FP16')
    parser.add_argument('--min_engine_batch_size', type=int, help='set the min input data size of model for inference',
                        default=1)
    parser.add_argument('--opt_engine_batch_size', type=int, help='set the most used input data size of model for inference',
                        default=1)
    parser.add_argument('--max_engine_batch_size', type=int, help='set the max input data size of model for inference',
                        default=1)
    parser.add_argument('--engine_workspace', type=int, help='workspace of engine',
                        default=4)
    # Optional argument for INT8 precision
    parser.add_argument('--data_calib', type=str, help='img data directory for int8 calibration',
                        default='datasets/VOC/images/val2007')
    args = string_to_bool(parser.parse_args())
    if args.engine_precision == 'INT8' and args.data_calib is None:
        parser.error("--data_calib is required when --engine_precision is set to INT8")
    return args

def extract_metadata(onnx_model_path):
    # Load the ONNX model and collect its metadata properties
    model_onnx = onnx.load(onnx_model_path)
    metadata = {}
    for prop in model_onnx.metadata_props:
        metadata[prop.key] = prop.value
    return metadata

def string_to_bool(args):
    # Convert the --dynamic_axes string flag into a real boolean
    if args.dynamic_axes.lower() == 'true':
        args.dynamic_axes = True
    else:
        args.dynamic_axes = False
    return args

def build_engine(onnx_model_path, tensorrt_engine_path, engine_precision, dynamic_axes,
                 img_size, batch_size, min_engine_batch_size, opt_engine_batch_size, max_engine_batch_size,
                 engine_workspace, data_calib):
    metadata = extract_metadata(onnx_model_path)
    print(metadata)
    # Builder
    logger = trt.Logger(trt.Logger.ERROR)
    builder = trt.Builder(logger)
    network_flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    if engine_precision == "INT8":
        print('PTQ enabled!')
        network_flags = network_flags | (1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_PRECISION))
    network = builder.create_network(network_flags)
    profile = builder.create_optimization_profile()
    config = builder.create_builder_config()
    config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED
    # Set precision
    if engine_precision == 'FP16':
        config.set_flag(trt.BuilderFlag.FP16)
    elif engine_precision == 'INT8':
        config.flags |= 1 << int(trt.BuilderFlag.INT8)
        config.flags |= 1 << int(trt.BuilderFlag.PREFER_PRECISION_CONSTRAINTS)
        calib_loader = calibration.DataLoader(batch_size, 128, data_calib, 640, 640)
        config.int8_calibrator = calibration.Calibrator(calib_loader, data_calib + '.cache')
    # ONNX parser
    parser = trt.OnnxParser(network, logger)
    if not os.path.exists(onnx_model_path):
        print("Failed finding ONNX file!")
        exit()
    print("Succeeded finding ONNX file!")
    with open(onnx_model_path, "rb") as model:
        if not parser.parse(model.read()):
            print("Failed parsing .onnx file!")
            for error in range(parser.num_errors):
                print(parser.get_error(error))
            exit()
        print("Succeeded parsing .onnx file!")
    # Input
    inputTensor = network.get_input(0)
    print('inputTensor.name:', inputTensor.name)
    # Dynamic batch (min, opt, max)
    if dynamic_axes:
        profile.set_shape(inputTensor.name,
                          (min_engine_batch_size, img_size[0], img_size[1], img_size[2]),
                          (opt_engine_batch_size, img_size[0], img_size[1], img_size[2]),
                          (max_engine_batch_size, img_size[0], img_size[1], img_size[2]))
        print('Set dynamic')
    else:
        profile.set_shape(inputTensor.name,
                          (batch_size, img_size[0], img_size[1], img_size[2]),
                          (batch_size, img_size[0], img_size[1], img_size[2]),
                          (batch_size, img_size[0], img_size[1], img_size[2]))
    config.add_optimization_profile(profile)
    # network.unmark_output(network.get_output(0))
    # Write engine
    engineString = builder.build_serialized_network(network, config)
    if engineString is None:
        print("Failed building engine!")
        exit()
    print("Succeeded building engine!")
    # Convert the metadata dictionary to JSON and encode it
    metaString = json.dumps(metadata).encode('utf-8')
    # Save the engine together with its metadata to the output file
    with open(tensorrt_engine_path, "wb") as f:
        # Write the metadata length
        f.write(len(metaString).to_bytes(4, byteorder='little'))
        # Write the metadata
        f.write(metaString)
        # Write the engine
        f.write(engineString)

def main():
    args = parse_args()
    # Build TensorRT engine
    build_engine(args.onnx_model_path, args.tensorrt_engine_path, args.engine_precision, args.dynamic_axes,
                 args.img_size, args.batch_size, args.min_engine_batch_size, args.opt_engine_batch_size,
                 args.max_engine_batch_size, args.engine_workspace, args.data_calib)

if __name__ == '__main__':
    main()
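For reference, a minimal sketch of how an engine saved by this script could be loaded back, assuming the file layout the script writes (a 4-byte little-endian metadata length, the JSON metadata, then the serialized engine); load_engine_with_metadata is just an illustrative helper name:

import json
import tensorrt as trt

def load_engine_with_metadata(path):
    logger = trt.Logger(trt.Logger.ERROR)
    with open(path, "rb") as f:
        meta_len = int.from_bytes(f.read(4), byteorder='little')  # 4-byte length prefix
        metadata = json.loads(f.read(meta_len).decode('utf-8'))   # Ultralytics metadata dict
        engine_bytes = f.read()                                    # remaining bytes: serialized engine
    runtime = trt.Runtime(logger)
    engine = runtime.deserialize_cuda_engine(engine_bytes)
    return engine, metadata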
Description
Hi, I'm new to TensorRT and I'm trying to understand the layer performance. I read the doc Optimizing for Tensor Cores and saw that with FP16 precision, the tensor dimensions should be multiples of 8 or 16. So I converted an ONNX model to a TensorRT engine and then printed the layer information. Here is a part of it:
I see the descriptions
"Format/Datatype": "Channel major FP16 format where channel % 8 == 0"
and
"Format/Datatype": "Channel major FP16 format where channel % 2 == 0".
I don't know what this means, because my channel is not divisible by 8 ("Dimensions": [1,25,160,160]). Is my model optimized? Sorry for my bad English.
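For what it's worth, a minimal sketch of how this kind of per-layer information (including the "Format/Datatype" strings) can be printed with the engine inspector, assuming a plain serialized engine built with ProfilingVerbosity.DETAILED as in the script above:

import tensorrt as trt

logger = trt.Logger(trt.Logger.ERROR)
runtime = trt.Runtime(logger)
# Illustrative path; a plain engine file without the custom metadata prefix.
with open("model.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

inspector = engine.create_engine_inspector()
# Layer formats are only reported when the engine was built with DETAILED profiling verbosity.
print(inspector.get_engine_information(trt.LayerInformationFormat.JSON))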
Environment
TensorRT Version:
NVIDIA GPU:
NVIDIA Driver Version:
CUDA Version:
CUDNN Version:
Operating System:
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Relevant Files
Model link:
Steps To Reproduce
Commands or scripts:
Have you tried the latest release?:
Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):