Open peter5232 opened 8 months ago
Your ONNX is invalid; it fails with ONNX Runtime:
$ polygraphy run model.onnx --onnxrt
[I] RUNNING | Command: /home/scratch.zeroz_sw/miniconda3/bin/polygraphy run model.onnx --onnxrt
[I] onnxrt-runner-N0-01/12/24-08:07:21 | Activating and starting inference
[I] Creating ONNX-Runtime Inference Session with providers: ['CPUExecutionProvider']
Traceback (most recent call last):
File "/home/scratch.zeroz_sw/miniconda3/bin/polygraphy", line 8, in <module>
sys.exit(main())
File "/home/scratch.zeroz_sw/miniconda3/lib/python3.9/site-packages/polygraphy/tools/_main.py", line 70, in main
status = selected_tool.run(args)
File "/home/scratch.zeroz_sw/miniconda3/lib/python3.9/site-packages/polygraphy/tools/base/tool.py", line 171, in run
status = self.run_impl(args)
File "/home/scratch.zeroz_sw/miniconda3/lib/python3.9/site-packages/polygraphy/tools/run/run.py", line 228, in run_impl
exec(str(script))
File "<string>", line 21, in <module>
File "/home/scratch.zeroz_sw/miniconda3/lib/python3.9/site-packages/polygraphy/comparator/comparator.py", line 213, in run
run_results.append((runner.name, execute_runner(runner, loader_cache)))
File "/home/scratch.zeroz_sw/miniconda3/lib/python3.9/site-packages/polygraphy/comparator/comparator.py", line 98, in execute_runner
with runner as active_runner:
File "/home/scratch.zeroz_sw/miniconda3/lib/python3.9/site-packages/polygraphy/backend/base/runner.py", line 60, in __enter__
self.activate()
File "/home/scratch.zeroz_sw/miniconda3/lib/python3.9/site-packages/polygraphy/backend/base/runner.py", line 95, in activate
self.activate_impl()
File "/home/scratch.zeroz_sw/miniconda3/lib/python3.9/site-packages/polygraphy/util/util.py", line 694, in wrapped
return func(*args, **kwargs)
File "/home/scratch.zeroz_sw/miniconda3/lib/python3.9/site-packages/polygraphy/backend/onnxrt/runner.py", line 44, in activate_impl
self.sess, _ = util.invoke_if_callable(self._sess)
File "/home/scratch.zeroz_sw/miniconda3/lib/python3.9/site-packages/polygraphy/util/util.py", line 663, in invoke_if_callable
ret = func(*args, **kwargs)
File "/home/scratch.zeroz_sw/miniconda3/lib/python3.9/site-packages/polygraphy/backend/base/loader.py", line 40, in __call__
return self.call_impl(*args, **kwargs)
File "/home/scratch.zeroz_sw/miniconda3/lib/python3.9/site-packages/polygraphy/util/util.py", line 694, in wrapped
return func(*args, **kwargs)
File "/home/scratch.zeroz_sw/miniconda3/lib/python3.9/site-packages/polygraphy/backend/onnxrt/loader.py", line 68, in call_impl
return onnxrt.InferenceSession(model_bytes, providers=providers)
File "/home/scratch.zeroz_sw/miniconda3/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 383, in __init__
self._create_inference_session(providers, provider_options, disabled_optimizers)
File "/home/scratch.zeroz_sw/miniconda3/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 424, in _create_inference_session
sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from /home/scratch.zeroz_sw/github_bug/3590/model.onnx failed:Node (/MaxPool) Op (MaxPool) [ShapeInferenceError] Attribute strides has incorrect size
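For reference, the same class of failure can be reproduced with the onnx package alone, without Polygraphy or ONNX Runtime. A minimal sketch:

    import onnx

    model = onnx.load("model.onnx")
    # The checker validates node attributes against the operator schemas.
    onnx.checker.check_model(model)
    # Strict shape inference should surface the same kind of error as the
    # session build above, e.g. a MaxPool `strides` attribute whose length
    # does not match the number of spatial dimensions.
    onnx.shape_inference.infer_shapes(model, strict_mode=True)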
I used torch.onnx.export to export the ONNX model and trtexec to generate the engine:
trtexec --onnx=model.onnx --saveEngine=model.engine --verbose
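For context, the export call looked roughly like this (a sketch; MyModel is a placeholder for the actual network, and the input size and input name are taken from the logs below):

    import torch

    model = MyModel().eval()  # hypothetical module, stands in for the real net
    dummy_input = torch.randn(1, 1, 682, 1024)

    torch.onnx.export(
        model,
        dummy_input,
        "model.onnx",
        input_names=["input.1"],  # name as it appears in the polygraphy output below
    )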
I tested the ONNX file with polygraphy run model.onnx --onnxrt and the test passed. I also tested the correctness of the engine file with polygraphy run model.engine --trt.
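(For what it's worth, the two backends can also be compared in one shot with polygraphy run model.onnx --onnxrt --trt, which runs both and checks the outputs against each other.)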
However, when I load the engine file with the TensorRT Python API and run inference, an error occurs. The Python inference code is as follows.
import numpy as np
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
import time
import common
from cuda import cuda, cudart


class TensorRTInfer:
    """
    Implements inference for the Model TensorRT engine.
    """

    def __init__(self, engine_path):
        """
        :param engine_path: The path to the serialized engine to load from disk.
        """
        # Load TRT engine
        self.logger = trt.Logger(trt.Logger.INFO)
        trt.init_libnvinfer_plugins(self.logger, namespace="")
        with open(engine_path, "rb") as f, trt.Runtime(self.logger) as runtime:
            assert runtime
            self.engine = runtime.deserialize_cuda_engine(f.read())
        assert self.engine
        self.context = self.engine.create_execution_context()
        assert self.context

        # Setup I/O bindings
        self.inputs = []
        self.outputs = []
        self.allocations = []
        for i in range(self.engine.num_bindings):
            is_input = False
            name = self.engine.get_tensor_name(i)
            # print(self.engine.get_tensor_mode(name))
            if self.engine.binding_is_input(i):
                is_input = True
            name = self.engine.get_binding_name(i)
            dtype = self.engine.get_binding_dtype(i)
            shape = self.engine.get_binding_shape(i)
            # if name == '427':
            #     shape = (1, 9382, 2)
            # elif name == '430':
            #     shape = (1, 9382)
            # elif name == "433":
            #     shape = (1, 256, 9382)
            if is_input:
                self.batch_size = shape[0]
            size = np.dtype(trt.nptype(dtype)).itemsize
            for s in shape:
                size *= s
            print(size, shape, self.engine.get_tensor_shape(name), name)
            allocation = common.cuda_call(cudart.cudaMalloc(size))
            binding = {
                'index': i,
                'name': name,
                'dtype': np.dtype(trt.nptype(dtype)),
                'shape': list(shape),
                'allocation': allocation,
                'size': size
            }
            self.allocations.append(allocation)
            if self.engine.binding_is_input(i):
                self.inputs.append(binding)
            else:
                self.outputs.append(binding)

        assert self.batch_size > 0
        assert len(self.inputs) > 0
        assert len(self.outputs) > 0
        assert len(self.allocations) > 0

    def input_spec(self):
        """
        Get the specs for the input tensor of the network. Useful to prepare memory allocations.
        :return: Two items, the shape of the input tensor and its (numpy) datatype.
        """
        return self.inputs[0]['shape'], self.inputs[0]['dtype']

    def output_spec(self):
        """
        Get the specs for the output tensors of the network. Useful to prepare memory allocations.
        :return: A list with two items per element, the shape and (numpy) datatype of each output tensor.
        """
        specs = []
        for o in self.outputs:
            specs.append((o['shape'], o['dtype']))
        return specs

    def infer(self, batch):
        """
        Execute inference on a batch of images. The images should already be batched and preprocessed, as prepared by
        the ImageBatcher class. Memory copying to and from the GPU device will be performed here.
        :param batch: A numpy array holding the image batch.
        :param scales: The image resize scales for each image in this batch. Default: No scale postprocessing applied.
        :return: A nested list for each image in the batch and each detection in the list.
        """
        # Prepare the output data.
        outputs = []
        for shape, dtype in self.output_spec():
            outputs.append(np.zeros(shape, dtype))

        # Process I/O and execute the network.
        common.memcpy_host_to_device(self.inputs[0]['allocation'], np.ascontiguousarray(batch))
        self.context.execute_v2(self.allocations)
        for o in range(len(outputs)):
            common.memcpy_device_to_host(outputs[o], self.outputs[o]['allocation'])
            print(o, outputs[o].shape)

        # Process the results.
        # nums = outputs[0]
        # boxes = outputs[1]
        # scores = outputs[2]
        # pred_classes = outputs[3]
        # masks = outputs[4]
        # detections = []
        # for i in range(self.batch_size):
        #     detections.append([])
        #     for n in range(int(nums[i])):
        #         # Select a mask.
        #         mask = masks[i][n]
        #         # Calculate scaling values for bboxes.
        #         scale = self.inputs[0]['shape'][2]
        #         scale /= scales[i]
        #         scale_y = scale
        #         scale_x = scale
        #         if nms_threshold and scores[i][n] < nms_threshold:
        #             continue
        #         # Append to detections
        #         detections[i].append({
        #             'ymin': boxes[i][n][0] * scale_y,
        #             'xmin': boxes[i][n][1] * scale_x,
        #             'ymax': boxes[i][n][2] * scale_y,
        #             'xmax': boxes[i][n][3] * scale_x,
        #             'score': scores[i][n],
        #             'class': int(pred_classes[i][n]),
        #             'mask': mask,
        #         })
        # return detections


def main():
    trt_infer = TensorRTInfer("model.engine")
    input_batch = np.random.randn(1, 1, 682, 1024).astype(np.float64)
    trt_infer.infer(input_batch)


main()
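For completeness, common above refers to the helper module shipped with the TensorRT Python samples; the pieces this script relies on look roughly like the following (a sketch, not the exact file):

    import numpy as np
    from cuda import cudart

    def cuda_call(call):
        # cuda-python calls return (error, result...); raise on error and
        # unwrap whatever results remain.
        err, *res = call
        if err != cudart.cudaError_t.cudaSuccess:
            raise RuntimeError(f"CUDA Runtime Error: {err}")
        return res[0] if len(res) == 1 else res

    def memcpy_host_to_device(device_ptr, host_arr):
        nbytes = host_arr.size * host_arr.itemsize
        cuda_call(cudart.cudaMemcpy(device_ptr, host_arr, nbytes,
                                    cudart.cudaMemcpyKind.cudaMemcpyHostToDevice))

    def memcpy_device_to_host(host_arr, device_ptr):
        nbytes = host_arr.size * host_arr.itemsize
        cuda_call(cudart.cudaMemcpy(host_arr, device_ptr, nbytes,
                                    cudart.cudaMemcpyKind.cudaMemcpyDeviceToHost))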
Your new model passes with TRT 9.2:
[I] Finished engine building in 7.928 seconds
[I] trt-runner-N0-01/19/24-09:42:26
---- Inference Input(s) ----
{input.1 [dtype=float32, shape=(1, 1, 682, 1024)],
conv1a.weight [dtype=float32, shape=(64, 1, 3, 3)],
conv1a.bias [dtype=float32, shape=(64,)],
conv1b.weight [dtype=float32, shape=(64, 64, 3, 3)],
conv1b.bias [dtype=float32, shape=(64,)],
conv2a.weight [dtype=float32, shape=(64, 64, 3, 3)],
conv2a.bias [dtype=float32, shape=(64,)],
conv2b.weight [dtype=float32, shape=(64, 64, 3, 3)],
conv2b.bias [dtype=float32, shape=(64,)],
conv3a.weight [dtype=float32, shape=(128, 64, 3, 3)],
conv3a.bias [dtype=float32, shape=(128,)],
conv3b.weight [dtype=float32, shape=(128, 128, 3, 3)],
conv3b.bias [dtype=float32, shape=(128,)],
conv4a.weight [dtype=float32, shape=(128, 128, 3, 3)],
conv4a.bias [dtype=float32, shape=(128,)],
conv4b.weight [dtype=float32, shape=(128, 128, 3, 3)],
conv4b.bias [dtype=float32, shape=(128,)],
convPa.weight [dtype=float32, shape=(256, 128, 3, 3)],
convPa.bias [dtype=float32, shape=(256,)],
convPb.weight [dtype=float32, shape=(65, 256, 1, 1)],
convPb.bias [dtype=float32, shape=(65,)],
convDa.weight [dtype=float32, shape=(256, 128, 3, 3)],
convDa.bias [dtype=float32, shape=(256,)],
convDb.weight [dtype=float32, shape=(256, 256, 1, 1)],
convDb.bias [dtype=float32, shape=(256,)]}
[I] trt-runner-N0-01/19/24-09:42:26
---- Inference Output(s) ----
{427 [dtype=float32, shape=(1, 9382, 2)],
430 [dtype=float32, shape=(1, 9382)],
433 [dtype=float32, shape=(1, 256, 9382)]}
[I] trt-runner-N0-01/19/24-09:42:26 | Completed 1 iteration(s) in 16.05 ms | Average inference time: 16.05 ms.
[I] PASSED | Runtime: 10.422s | Command: /home/scratch.zeroz_sw/miniconda3/bin/polygraphy run model.onnx --trt
You can download it from:
https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/9.2.0/tensorrt-9.2.0.5.linux.x86_64-gnu.cuda-11.8.tar.gz
https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/9.2.0/tensorrt-9.2.0.5.linux.x86_64-gnu.cuda-12.2.tar.gz
https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/9.2.0/tensorrt-9.2.0.5.ubuntu-22.04.aarch64-gnu.cuda-12.2.tar.gz
I was able to get normal output using TensorRT 8.6, but an error occurred when calling the Python API for inference. The details are in the third picture.
Hey, which Object Detection model are you using?
You can try it with TRT v10.
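For reference, TRT 10 removes the binding-index APIs the script above uses (num_bindings, binding_is_input, get_binding_name/dtype/shape); I/O tensors are addressed by name instead. A rough sketch of the equivalent setup and execution under TRT 10 (not tested against this model):

    import numpy as np
    import tensorrt as trt
    from cuda import cudart

    def setup_io(engine, context):
        # Allocate one device buffer per I/O tensor and bind it by name.
        allocations = []
        for i in range(engine.num_io_tensors):
            name = engine.get_tensor_name(i)
            dtype = np.dtype(trt.nptype(engine.get_tensor_dtype(name)))
            shape = engine.get_tensor_shape(name)
            nbytes = dtype.itemsize * int(np.prod(shape))
            err, ptr = cudart.cudaMalloc(nbytes)
            context.set_tensor_address(name, ptr)  # replaces positional bindings
            allocations.append(ptr)
        return allocations

    def execute(context):
        # execute_v2(bindings) is gone; execute_async_v3 takes a CUDA stream.
        err, stream = cudart.cudaStreamCreate()
        context.execute_async_v3(stream)
        cudart.cudaStreamSynchronize(stream)
        cudart.cudaStreamDestroy(stream)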
Environment
TensorRT Version: 8.6
NVIDIA GPU: RTX 4090
NVIDIA Driver Version: 535.129.03
CUDA Version: 11.8
CUDNN Version: 8.9.6.50
Operating System: Ubuntu 22.04
Python Version (if applicable): 3.9
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 1.13
Baremetal or Container (if so, version):
Relevant Files
Model link: https://www.dropbox.com/scl/fi/vzgb4iew1lvj64h6adnjt/model.onnx?rlkey=vqq56hc2t91r7b1m078ks7ycl&dl=0
Steps To Reproduce
I want to convert the ONNX model to an engine with
trtexec --onnx=model.onnx --saveEngine=model.trt --verbose
but an error is reported: /MaxPool: at least 5 dimensions are required for input.
I don't know why! The MaxPool that reports the error is used here.
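A quick way to see why both backends complain is to print the attributes of the offending node; a sketch with the onnx package:

    import onnx

    model = onnx.load("model.onnx")
    for node in model.graph.node:
        if node.op_type == "MaxPool":
            attrs = {a.name: onnx.helper.get_attribute_value(a) for a in node.attribute}
            # For a 4D (N, C, H, W) input, kernel_shape/strides/pads must describe
            # 2 spatial dims; e.g. a 3-element strides would trigger both the
            # onnxruntime shape-inference error and the trtexec 5-D complaint.
            print(node.name, attrs)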