jinhonglu opened 3 months ago
What is the diff when you use all fp32?

@lix19937

> What is the diff when you use all fp32?

It is quite strange that the result of the model (ONNX fp16 -> TensorRT fp32) is also completely different from the ONNX fp16 results.
What is wrong with my engine-building code? I have commented out all of the per-layer precision settings during building.
```python
import os
import sys

import tensorrt as trt

# BUILD, MODEL_NAME, gpu_name, and ONNX_MODEL are defined elsewhere in the script.

def build_engine():
    TRT_LOGGER = trt.Logger(trt.Logger.INFO)
    TRT_BUILDER = trt.Builder(TRT_LOGGER)
    for precision in BUILD:
        engine_filename = '_'.join([MODEL_NAME, gpu_name, precision]) + '.engine'
        if os.path.exists(engine_filename):
            print(f'Engine file {engine_filename} exists. Skip building...')
            continue
        print(f'Building {precision} engine of {MODEL_NAME} model on {gpu_name} GPU...')

        ## parse ONNX model
        network_creation_flag = 0
        if "EXPLICIT_BATCH" in trt.NetworkDefinitionCreationFlag.__members__.keys():
            network_creation_flag = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
        network = TRT_BUILDER.create_network(network_creation_flag)
        onnx_parser = trt.OnnxParser(network, TRT_LOGGER)
        parse_success = onnx_parser.parse_from_file(ONNX_MODEL)
        for idx in range(onnx_parser.num_errors):
            print(onnx_parser.get_error(idx))
        if not parse_success:
            sys.exit('ONNX model parsing failed')

        ## build TRT engine (configuration options at: https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/infer/Core/BuilderConfig.html#ibuilderconfig)
        config = TRT_BUILDER.create_builder_config()
        # seq_len = network.get_input(0).shape[1]
        # handle dynamic shape (min/opt/max): https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#work_dynamic_shapes
        # by default the batch dim is set to 1 for all of min/opt/max; if batching is needed, change the values for opt and max accordingly
        profile = TRT_BUILDER.create_optimization_profile()
        profile.set_shape("input_ids", (1, 2, 1025, 690, 2), (1, 2, 1025, 690, 2), (1, 2, 1025, 690, 2))
        profile.set_shape("output", (1, 1, 2050, 690, 2), (1, 1, 2050, 690, 2), (1, 1, 2050, 690, 2))
        config.add_optimization_profile(profile)
        config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 4096 * (1 << 20))  # 4096 MiB

        # precision
        if precision == 'fp32':
            config.clear_flag(trt.BuilderFlag.TF32)  # TF32 enabled by default, need to clear flag
        elif precision == 'tf32':
            pass
        elif precision == 'fp16':
            config.set_flag(trt.BuilderFlag.FP16)
            config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
            # for i in range(network.num_layers):
            #     op_name = network.get_layer(i).name.split('/')[-1]
            #     if 'Pow' == op_name or 'ReduceSum' == op_name or 'Pow_1' == op_name:
            #         print(network.get_layer(i).name)
            #         # input('test')
            #         network.get_layer(i).precision = trt.DataType.FLOAT
            #         network.get_layer(i).set_output_type(0, trt.DataType.FLOAT)
            #     if 'Pow_1_output_cast0' == op_name or 'ReduceSum_input_cast1' == op_name or 'Pow_output_cast0' == op_name \
            #             or 'Pow_1_input_cast0' == op_name or 'ReduceSum_input_cast0' == op_name or 'Pow_input_cast0' == op_name:
            #         print(network.get_layer(i).name)
            #         network.get_layer(i).precision = trt.DataType.FLOAT

        # build
        serialized_engine = TRT_BUILDER.build_serialized_network(network, config)

        ## save TRT engine
        with open(engine_filename, 'wb') as f:
            f.write(serialized_engine)
        print(f'Engine is saved to {engine_filename}')
```
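(For completeness, a minimal sketch of loading the serialized engine back, assuming TensorRT's standard Python runtime API; this is not part of the build script above:)

```python
import tensorrt as trt

def load_engine(engine_filename):
    # Deserialize an engine previously written by build_engine().
    logger = trt.Logger(trt.Logger.INFO)
    runtime = trt.Runtime(logger)
    with open(engine_filename, 'rb') as f:
        return runtime.deserialize_cuda_engine(f.read())
```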
You can use the following to compare the diff between TRT and ORT:

```
polygraphy run your_onnx_name.onnx --trt --onnxrt
```
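(If the mismatch is only fp16 rounding noise rather than a real bug, loosening the comparator tolerances can tell the two apart; a minimal sketch, assuming polygraphy's `--atol`/`--rtol` options:)

```
polygraphy run your_onnx_name.onnx --trt --fp16 --onnxrt --atol 1e-3 --rtol 1e-3
```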
BTW, if you use trtexec, you can upload the full logs from the following commands:

```
trtexec --verbose --onnx=your_onnx_name.onnx 2>&1 | tee build.log
trtexec --verbose --onnx=your_onnx_name.onnx --fp16 2>&1 | tee build_fp16.log
```
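(To reproduce a mixed-precision build from the command line, recent trtexec versions (TensorRT 8.4+) also take per-layer precision constraints; a hedged sketch, where the layer names are placeholders to adapt to your graph:)

```
trtexec --verbose --onnx=your_onnx_name.onnx --fp16 \
        --precisionConstraints=obey \
        --layerPrecisions="Pow:fp32,ReduceSum:fp32" \
        2>&1 | tee build_mixed.log
```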
@lix19937
> You can use the following to compare the diff between TRT and ORT.

I have run both fp32 and fp16:

```
polygraphy run fp16.onnx --trt --onnxrt (--fp16) --execution-providers=cuda
```
fp16 is here
fp32 is here
Description
I tried to convert a mixed-precision ONNX model to a mixed-precision TensorRT engine.
In my mixed-precision ONNX model, I have kept some ops (ReduceSum, Pow) in fp32, together with their back-to-back Cast ops (for example, ReduceSum(fp32) -> output(fp32) -> Cast(fp32) -> Pow(fp32)).
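(For context, one common way to produce such a mixed-precision ONNX model is onnxconverter-common's float16 converter with an op block list; the snippet below is a minimal sketch assuming that tool, not necessarily the exact conversion used here.)

```python
# Minimal sketch (assumption: the mixed-precision ONNX model was produced with
# onnxconverter-common; the op block list keeps ReduceSum/Pow in fp32 and the
# converter inserts the back-to-back Cast nodes around them).
import onnx
from onnxconverter_common import float16

model = onnx.load("model_fp32.onnx")  # hypothetical file name
model_fp16 = float16.convert_float_to_float16(
    model,
    keep_io_types=True,
    op_block_list=["ReduceSum", "Pow"],
)
onnx.save(model_fp16, "model_fp16_mixed.onnx")
```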
In my build_engine.py, I set the following config flags (enable fp16, obey precision constraints) and set the corresponding layers to fp32:

```python
config.set_flag(trt.BuilderFlag.FP16)
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
```
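For reference, the per-layer pinning looks roughly like this (a minimal sketch based on the commented-out loop in the build script above; the name matching is an assumption to adapt to your network):

```python
# Pin selected layers to fp32 so that OBEY_PRECISION_CONSTRAINTS enforces them.
# Matching layers by the 'Pow'/'ReduceSum' name suffix is an assumption; adjust
# the predicate to the actual layer names in your parsed network.
for i in range(network.num_layers):
    layer = network.get_layer(i)
    op_name = layer.name.split('/')[-1]
    if op_name.startswith(('Pow', 'ReduceSum')):
        layer.precision = trt.DataType.FLOAT          # force fp32 execution
        layer.set_output_type(0, trt.DataType.FLOAT)  # keep its output in fp32
```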
The result of the TensorRT engine is quite different from that of the ONNX model.
Any idea how I can solve this?
Environment
TensorRT Version:
NVIDIA GPU: A100
NVIDIA Driver Version: 12.5
CUDA Version:12.5
CUDNN Version: 12.5
Operating System: Linux
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Relevant Files
Model link:
Steps To Reproduce
Commands or scripts:
Have you tried the latest release?:
Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (`polygraphy run <model.onnx> --onnxrt`):