NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

Inference result of TensorRT 8.4 is wrong when compared with onnxruntime #3325

Closed · flyme2023 closed 1 year ago

flyme2023 commented 1 year ago

Description

The output is different, so I need to compare the per-layer results, but I get an error when I try to compare the layer results with the following command:

polygraphy run ./model.onnx --onnxrt --trt --workspace 12G \
    --save-results=rtr_out.pkl \
    --val-range input_ids:[0,21127] attention_mask:[1,1] token_type_ids:[0,0] \
    --input-shapes input_ids:[1,8] attention_mask:[1,8] token_type_ids:[1,8] \
    --trt-min-shapes input_ids:[1,8] token_type_ids:[1,8] attention_mask:[1,8] \
    --trt-max-shapes input_ids:[3,8] token_type_ids:[3,8] attention_mask:[3,8] \
    --trt-opt-shapes input_ids:[1,8] token_type_ids:[1,8] attention_mask:[1,8] \
    --trt-outputs mark all --onnx-outputs mark all \
    --tactic-sources CUBLAS

The error message:

[I] onnxrt-runner-N0-09/15/23-09:58:21 | Completed 1 iteration(s) in 107.8 ms | Average inference time: 107.8 ms.
[I] trt-runner-N0-09/15/23-09:58:21 | Activating and starting inference
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 1298376457
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 1298376457
[W] onnx2trt_utils.cpp:365: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[I] Configuring with profiles: [Profile().add('input_ids', min=[1, 8], opt=[1, 8], max=[3, 8]).add('attention_mask', min=[1, 8], opt=[1, 8], max=[3, 8]).add('token_type_ids', min=[1, 8], opt=[1, 8], max=[3, 8])]
[I] Building engine with configuration:
    Flags                  | []
    Engine Capability      | EngineCapability.DEFAULT
    Tactic Sources         | [CUBLAS]
    DLA                    | Default Device Type: DeviceType.GPU, Core: 0
    Profiling Verbosity    | ProfilingVerbosity.VERBOSE
[E] 2: [myelinBuilderUtils.cpp::operator()::371] Error Code 2: Internal Error ([HostToDeviceCopy] requires bool I/O but node can not be handled by Myelin. Operation is not supported.)
[E] 2: [builder.cpp::buildSerializedNetwork::619] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
[!] Invalid Engine. Please ensure the engine was built correctly
[E] FAILED | Runtime: 32.800s | Command: /usr/local/bin/polygraphy run ./model.onnx --onnxrt --trt --workspace 12G --save-results=rtr_out.pkl --val-range input_ids:[0,21127] attention_mask:[1,1] token_type_ids:[0,0] --input-shapes input_ids:[1,8] attention_mask:[1,8] token_type_ids:[1,8] --trt-min-shapes input_ids:[1,8] token_type_ids:[1,8] attention_mask:[1,8] --trt-max-shapes input_ids:[3,8] token_type_ids:[3,8] attention_mask:[3,8] --trt-opt-shapes input_ids:[1,8] token_type_ids:[1,8] attention_mask:[1,8] --trt-outputs mark all --onnx-outputs mark all --tactic-sources CUBLAS
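The Myelin error suggests that "mark all" turns internal boolean tensors into engine outputs, which Myelin cannot handle as network I/O. A possible way to still compare intermediate results is to mark only specific tensors by name instead of everything; the sketch below is not from the thread, and the tensor name in it is a hypothetical placeholder, so substitute real names from your model (listed, for example, via polygraphy inspect model):

# Sketch only: "encoder_layer_0_output" is a hypothetical tensor name.
polygraphy run ./model.onnx --onnxrt --trt \
    --trt-outputs encoder_layer_0_output \
    --onnx-outputs encoder_layer_0_output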

zerollzeng commented 1 year ago

@pranavm-nvidia ^ ^

zerollzeng commented 1 year ago

Maybe a dup of https://github.com/NVIDIA/TensorRT/issues/2346. Could you please try removing the mark all?
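For reference, a sketch of the suggested retry: the original command with the two "mark all" flags removed, so only the final network outputs are compared:

polygraphy run ./model.onnx --onnxrt --trt --workspace 12G \
    --save-results=rtr_out.pkl \
    --val-range input_ids:[0,21127] attention_mask:[1,1] token_type_ids:[0,0] \
    --input-shapes input_ids:[1,8] attention_mask:[1,8] token_type_ids:[1,8] \
    --trt-min-shapes input_ids:[1,8] token_type_ids:[1,8] attention_mask:[1,8] \
    --trt-opt-shapes input_ids:[1,8] token_type_ids:[1,8] attention_mask:[1,8] \
    --trt-max-shapes input_ids:[3,8] token_type_ids:[3,8] attention_mask:[3,8] \
    --tactic-sources CUBLAS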

ttyio commented 1 year ago

closing since no activity for more than 3 weeks, thanks all!

rememberBr commented 2 months ago

closing since no activity for more than 3 weeks, thanks all!

Hello, I am facing the same issue. When I run polygraphy run model.onnx --onnxrt --trt --atol 1e-5 --rtol 1e-5, the test passes, but when I add "mark all" it produces a lot of FAILED results. Removing "mark all" does not solve the underlying problem, though. In my experiment, I exported the ONNX model to an engine file with precision set to FP32 and tested the same input (generated by np.zeros); the results from onnxruntime were not consistent with the results from TRT, with a gap of more than 1e-5. What should I do?
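A note on the tolerance: small FP32 mismatches between onnxruntime (typically running on CPU) and TensorRT are often due to different kernel fusion and accumulation order rather than an outright bug, so an absolute tolerance of 1e-5 can be too strict for deep networks. A minimal sketch of a looser comparison follows; the tolerance values are illustrative assumptions, not recommendations from this thread, and --check-error-stat requires a reasonably recent Polygraphy:

# Sketch: looser, element-wise comparison; 1e-4/1e-3 are illustrative values.
polygraphy run model.onnx --onnxrt --trt \
    --atol 1e-4 --rtol 1e-3 \
    --check-error-stat elemwise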