I added print(parser.parse(model.read())) and it returned False.
How can I get detailed error information?
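For reference, a minimal sketch of listing the parser's recorded errors (assuming the standard tensorrt Python API, with parser being a trt.OnnxParser):

# Print every error the ONNX parser recorded during a failed parse.
if not parser.parse(model.read()):
    for i in range(parser.num_errors):
        err = parser.get_error(i)
        print(err.desc(), '(node:', err.node(), ')')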
The ONNX model is converted from PyTorch. The exporter code is:
self.model = create_model(opt.arch, opt.heads, opt.head_conv)
self.model = load_model(self.model, opt.load_model)
self.model = self.model.to(opt.device)
self.model.eval()
images=torch.load('x.pt')
print("onnx 1")
torch.onnx.export(self.model,images, "crossnet.onnx", verbose=True)
print("onnx 2")`
The model works well without TensorRT.
These are the torch.onnx logs:
/home/nvidia/anaconda3/envs/tensorrt/lib/python3.6/site-packages/torch/onnx/symbolic_helper.py:198: UserWarning: You are trying to export the model with onnx:Upsample for ONNX opset version 9. This operator might cause results to not match the expected results by PyTorch.
ONNX's Upsample/Resize operator did not match Pytorch's Interpolation until opset 11. Attributes to determine how to transform the input were added in onnx:Resize in opset 11 to support Pytorch's behavior (like coordinate_transformation_mode and nearest_mode).
We recommend using opset 11 and above for models using this operator.
"" + str(_export_onnx_opset_version) + ". "
I added:
index = 0
print(parser.get_error(index))
It says:
In node 0 (importModel): INVALID_GRAPH: Assertion failed: tensors.count(input_name)
Hi @aininot260,
I think downgrading to Pytorch 1.2 is the current workaround for this. See related issues:
After I downgraded the PyTorch version to 1.2, another problem happened:
(tensorrt) nvidia@Dell:~/Desktop/onnx_trt$ python main.py
heads {'hm': 1, 'wh': 2, 'reg': 2}
debug1
debug2
debug3
debug4
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 765624548
WARNING: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
Successfully casted down to INT32.
False
In node 208 (convert_axis): UNSUPPORTED_NODE: Assertion failed: axis >= 0 && axis < nbDims
debug5
debug6: 208
[TensorRT] ERROR: Network must have at least one output
Traceback (most recent call last):
File "main.py", line 29, in
ret = detector.run(img)['results']
File "./src/lib/detectors/base_detector.py", line 54, in run
output, dets, forward_time = self.process(images, return_time=True)
File "./src/lib/detectors/ctdet.py", line 83, in process
output = work(images)
File "./src/lib/detectors/ctdet.py", line 71, in work
with build_engine_onnx('crossnet.onnx') as engine:
AttributeError: __enter__
(tensorrt) nvidia@Dell:~/Desktop/onnx_trt$
I added some debug info:
def build_engine_onnx(model_file):
    print('debug1')
    with trt.Builder(TRT_LOGGER) as builder, \
            builder.create_network() as network, \
            trt.OnnxParser(network, TRT_LOGGER) as parser:
        print('debug2')
        builder.max_workspace_size = common.GiB(1)
        print('debug3')
        with open(model_file, 'rb') as model:
            print('debug4')
            print(parser.parse(model.read()))
            index = 0
            #print(parser.get_error(index).code)
            #print(parser.get_error(index).desc)
            error = parser.get_error(index)
            print(type(error))
            print('code:', error.code())
            print('desc', error.desc())
            print('file', error.file())
            print('func', error.func())
            print('line', error.line())
            print('node', error.node())
        print('debug5')
        print('debug6:', network.num_layers)
        last_layer = network.get_layer(network.num_layers - 1)
        # Check if the last layer recognizes its output
        if not last_layer.get_output(0):
            # If not, mark the output using the TensorRT API
            network.mark_output(last_layer.get_output(0))
        return builder.build_cuda_engine(network)
It says:
(tensorrt) nvidia@Dell:~/Desktop/onnx_trt$ python main.py
heads {'hm': 1, 'wh': 2, 'reg': 2}
debug1
debug2
debug3
debug4
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 765624548
WARNING: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
Successfully casted down to INT32.
False
<class 'tensorrt.tensorrt.ParserError'>
code: UNSUPPORTED_NODE
desc Assertion failed: axis >= 0 && axis < nbDims
file onnx2trt_utils.hpp
func convert_axis
line 347
node 208
debug5
debug6: 208
[TensorRT] ERROR: Network must have at least one output
Traceback (most recent call last):
File "main.py", line 29, in
The relevant part of the verbose torch.onnx export log is:
%1191 : Long() = onnx::Constant[value={2}](), scope: HourglassNet/kp_module/kp_module[low2]/kp_module[low2]/kp_module[low2]/kp_module[low2]/Upsample[up2]
%1192 : Tensor = onnx::Shape(%1190), scope: HourglassNet/kp_module/kp_module[low2]/kp_module[low2]/kp_module[low2]/kp_module[low2]/Upsample[up2]
%1193 : Long() = onnx::Gather[axis=0](%1192, %1191), scope: HourglassNet/kp_module/kp_module[low2]/kp_module[low2]/kp_module[low2]/kp_module[low2]/Upsample[up2] # /home/nvidia/anaconda3/envs/tensorrt/lib/python3.6/site-packages/torch/nn/functional.py:2466:0
The error happened at %1193 (node 208 counted from %985: 985 + 208 = 1193):
code: UNSUPPORTED_NODE
desc Assertion failed: axis >= 0 && axis < nbDims
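For reference, a hedged sketch of inspecting that node with the onnx Python package (assuming the parser's node index lines up with the order of graph.node, which may not hold in general):

import onnx

model = onnx.load('crossnet.onnx')
node = model.graph.node[208]  # index taken from the parser error
print(node.op_type)           # expected to be Gather
print(node.attribute)         # shows the axis attribute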
How can I fix this? This part of the source code is: https://paste.ubuntu.com/p/B6fgnB3s6B/
Hi @aininot260,
Seems like you're getting an error on Shape or Gather ops. However, according to this list, those ops are supported with the current OSS parser: https://github.com/onnx/onnx-tensorrt/blob/master/operators.md. Did you build the OSS components to update the ONNX parser? Or are you just using the one that comes with the TensorRT release?
Also, this issue seems related: https://github.com/onnx/onnx-tensorrt/issues/283. Maybe this solution could help you: https://github.com/onnx/onnx-tensorrt/issues/283#issuecomment-545703178
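For reference, a hedged sketch of running the model through onnx-simplifier before handing it to TensorRT (the onnxsim import and the (model, check) return value are based on that project's README):

# Assumes onnx-simplifier is installed: pip install onnx-simplifier
import onnx
from onnxsim import simplify

model = onnx.load('crossnet.onnx')
model_simp, ok = simplify(model)  # folds Shape/Gather chains into constants
assert ok, 'simplified model failed the ONNX checker'
onnx.save(model_simp, 'crossnet_simplified.onnx')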
The onnx-simplifier is helpful! Thank you!
Glad it helped!
I'm getting similar errors to what @aininot260 got. I'm loading the ONNX model after running it through onnx-simplifier:
python: ../builder/Network.cpp:1152: virtual nvinfer1::ILayer* nvinfer1::Network::getLayer(int) const: Assertion `layerIndex >= 0' failed.
Any suggestion on how to debug it?
Regards, MJay
Hi @mrutyu1987,
Sounds like the ONNX parser failed, so some reference to a network layer is returning -1, and therefore getLayer(-1) is raising that error.
I would check the errors thrown by the ONNX parser if any.
You can check these with the Python and C++ APIs like here: https://github.com/rmccorm4/tensorrt-utils/blob/3267d196bd3dc0ddd1f1b9c2364560627f018d43/classification/imagenet/onnx_to_tensorrt.py#L187-L191
I believe trtexec will also output these errors.
Can you share the following outputs?
trtexec --explicitBatch --onnx=<model.onnx>
trtexec --explicitBatch --onnx=<simplified_model.onnx>
If you're using TRT 7, run those as-is. If you're running an earlier version, remove the --explicitBatch flag.
Thanks @rmccorm4, the code snippet really helped to debug the issue. The following was the error while parsing the ONNX model:
In node -1 (importModel): INVALID_VALUE: Assertion failed: !_importer_ctx.network()->hasImplicitBatchDimension() && "This version of the ONNX parser only supports TensorRT INetworkDefinitions with an explicit batch dimension. Please ensure the network was created using the EXPLICIT_BATCH NetworkDefinitionCreationFlag."
I was able to resolve this by doing:
explicit_batch = 1 << (int)(tensorrt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
builder.create_network(explicit_batch)
thanks
Happy to help @mrutyu1987
Hi @rmccorm4
I got a similar error as well when running a test example with a single interpolation layer. Based on the above advice, I already tried passing in the simplified ONNX model and using explicit batch as suggested. I am using TensorRT 7 with torch 1.2.0, and running everything in the NVIDIA PyTorch docker: https://ngc.nvidia.com/catalog/containers/nvidia:pytorch
Currently my function looks something like the following:
def build_engine(model_path):
    explicit_batch = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    with trt.Builder(TRT_LOGGER) as builder, \
            builder.create_network(explicit_batch) as network, \
            trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_workspace_size = 1 << 20  # orig 28
        builder.max_batch_size = 16  # orig 1
        builder.fp16_mode = True
        print('Loading ONNX file from path {}...'.format(model_path))
        with open(model_path, 'rb') as model:
            if not parser.parse(model.read()):
                for error in range(parser.num_errors):
                    print(parser.get_error(error))
            print('Beginning ONNX file parsing')
            parser.parse(model.read())
        last_layer = network.get_layer(network.num_layers - 1)
        network.mark_output(last_layer.get_output(0))
        print('Completed parsing of ONNX file')
        engine = builder.build_cuda_engine(network)
        return engine
The errors look like:
Loading ONNX file from path interp_simp.onnx...
In node 0 (importUpsample): UNSUPPORTED_NODE: Assertion failed: (nbDims >= 1) && (nbDims <= 3)
Beginning ONNX file parsing
python: ../builder/Network.cpp:1152: virtual nvinfer1::ILayer* nvinfer1::Network::getLayer(int) const: Assertion `layerIndex >= 0' failed.
Aborted (core dumped)
Is this just due to the interpolation layer not being supported, or are there workarounds for converting models with such layers? I have already tried using the torch2trt library but ran into issues installing the inference mode: https://github.com/NVIDIA-AI-IOT/torch2trt/issues/306.
Thanks in advance!
After solving the
[TensorRT] ERROR: Network must have at least one output
error, another error happened.
The code is:
The output is:
My PyTorch version is 1.3.0 and my TensorRT version is 6.0.1.5.
Which situation could lead to this problem? Thanks.