geraldstanje opened this issue 3 months ago
Hi, if you want to modify your onnx model, onnx-graphsurgeon is probably your best bet. Examples section shows how to use it.
Btw, the source for onnx-tensorrt is open here. Given that you're using a slightly older version of TRT, the line numbers (and even filenames) may not exactly match. But, you can see a couple of places where such errors are logged. :)
[E] [TRT] onnx2trt_utils.cpp:748: Found unsupported datatype (8) classes
Found unsupported datatype (8) classes? see: https://github.com/huggingface/setfit/blob/main/src/setfit/exporters/onnx.py#L66C82-L66C97
@lix19937 @brb-nv @pranavm-nvidia @sachanub
Hi, if you're open to sharing the onnx file, please consider doing so. Sorry, I'm not familiar with the term 'opcode'. Could you point me to something that'll help me understand?
In the meantime, you can also dig deeper using these pointers:
From what I can tell looking at the verbose log, TRT encountered a model weight (initializer) named classes
of unsupported datatype (possibly a string but I could be wrong). To start with, you can:
1) Open the onnx model in netron
2) Locate the offending tensor and its type (string or something else?)
3) Also, note which op this tensor is a part of [reference]
onnx-graphsurgeon is something you could open your onnx model with (just like netron, but from your command shell), print the model, and make tweaks to it [example]. You could probably narrow down to the classes tensor from the model print-out, so consider giving it a shot.
I'll await your observations.
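For example, a minimal inspection sketch along those lines (this assumes the file is named model.onnx and the offending initializer is literally named classes; adjust to your model):

```python
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("model.onnx"))

# Print every initializer (constant tensor) with its dtype and shape so the
# unsupported one (e.g. a string tensor named "classes") stands out.
for name, tensor in graph.tensors().items():
    if isinstance(tensor, gs.Constant):
        print(name, tensor.dtype, tensor.shape)
```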
Here is the requested info:
As far as I understand, I need to convert a PyTorch model (I'm using Hugging Face Sentence Transformers: https://github.com/huggingface/setfit) to ONNX before using TensorRT. Is it correct that I need to do: PyTorch -> ONNX -> TensorRT?
"Locate the offending tensor and its type (string or something else?)"
I visualized the model using netron - here is what classes looks like:
Here we see that label is a string:
If strings are not supported, what should I do given that classes is a string? label is a string produced by this lib and is also output to ONNX: https://github.com/huggingface/setfit/blob/main/src/setfit/exporters/onnx.py#L183C5-L183C16
model.onnx is here (added via Git LFS): https://github.com/geraldstanje/onnx_model/
Which Docker image should I use for polygraphy? I have CUDA 12 installed, but polygraphy looks for a CUDA 11.x lib: libcublasLt.so.11. I currently use: nvcr.io/nvidia/pytorch:24.03-py3
[I] RUNNING | Command: /usr/local/bin/polygraphy run model.onnx --onnxrt --providers CUDAExecutionProvider
[I] onnxrt-runner-N0-05/24/24-00:42:23 | Activating and starting inference
[I] Creating ONNX-Runtime Inference Session with providers: ['CUDAExecutionProvider']
2024-05-24 00:42:23.653071659 [E:onnxruntime:Default, provider_bridge_ort.cc:1744 TryGetProviderInfo_CUDA] /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1426 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcublasLt.so.11: cannot open shared object file: No such file or directory
Can I use https://github.com/NVIDIA/TensorRT/blob/master/tools/onnx-graphsurgeon/examples/06_removing_nodes/remove.py to remove the string node?
@lix19937 @brb-nv @pranavm-nvidia @sachanub
@brb-nv Hi, I've met a similar problem, could you please help me? Thanks a lot! I use trtexec to perform INT8 calibration and quantization like this:
trtexec \
--onnx=onnx_model/model.onnx \
--minShapes=xs:1x1120,xlen:1 \
--optShapes=xs:1x160000,xlen:1 \
--maxShapes=xs:1x480000,xlen:1 \
--minShapesCalib=xs:1x1120,xlen:1 \
--optShapesCalib=xs:1x160000,xlen:1 \
--maxShapesCalib=xs:1x480000,xlen:1 \
--workspace=20480 \
--int8 \
--calib=model_calibration.cache \
--saveEngine=trt_model/model-INT8.plan \
--verbose \
--buildOnly
The calibration cache file was generated with the polygraphy API. But when I run the command above, it gives the following error:
[05/24/2024-13:58:26] [I] [TRT] CPLX_M_rfftrfft__333: broadcasting input1 to make tensors conform, dims(input0)=[2,257,512][NONE] dims(input1)=[1,512,-1][NONE].
[05/24/2024-13:58:26] [I] [TRT] CPLX_M_rfftrfft__333: broadcasting input1 to make tensors conform, dims(input0)=[2,257,512][NONE] dims(input1)=[1,512,-1][NONE].
[05/24/2024-13:58:26] [I] Finish parsing network model
[05/24/2024-13:58:26] [I] FP32 and INT8 precisions have been specified - more performance might be enabled by additionally specifying --fp16 or --best
[05/24/2024-13:58:26] [I] [TRT] CPLX_M_rfftrfft__333: broadcasting input1 to make tensors conform, dims(input0)=[2,257,512][NONE] dims(input1)=[1,512,-1][NONE].
[05/24/2024-13:58:27] [I] [TRT] Calibration table does not match calibrator algorithm type.
[05/24/2024-13:58:28] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +840, GPU +362, now: CPU 1921, GPU 7055 (MiB)
[05/24/2024-13:58:28] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +128, GPU +60, now: CPU 2049, GPU 7115 (MiB)
[05/24/2024-13:58:28] [I] [TRT] Timing cache disabled. Turning it on will improve builder speed.
[05/24/2024-13:58:31] [I] [TRT] Detected 2 inputs and 1 output network tensors.
[05/24/2024-13:58:31] [I] [TRT] Total Host Persistent Memory: 58640
[05/24/2024-13:58:31] [I] [TRT] Total Device Persistent Memory: 0
[05/24/2024-13:58:31] [I] [TRT] Total Scratch Memory: 4194304
[05/24/2024-13:58:31] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 132 MiB, GPU 384 MiB
[05/24/2024-13:58:41] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 10032.1ms to assign 80 blocks to 1209 nodes requiring 17717760 bytes.
[05/24/2024-13:58:41] [I] [TRT] Total Activation Memory: 17717760
[05/24/2024-13:58:41] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2913, GPU 7775 (MiB)
[05/24/2024-13:58:41] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 2914, GPU 7785 (MiB)
[05/24/2024-13:58:41] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 2913, GPU 7761 (MiB)
[05/24/2024-13:58:41] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2913, GPU 7769 (MiB)
[05/24/2024-13:58:41] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +16, now: CPU 130, GPU 272 (MiB)
[05/24/2024-13:58:41] [I] [TRT] Starting Calibration.
[05/24/2024-13:58:41] [E] Error[1]: [calibrator.cpp::add::758] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[05/24/2024-13:58:41] [E] Error[1]: [executionContext.cpp::commonEmitDebugTensor::1264] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[05/24/2024-13:58:41] [E] Error[1]: [executionContext.cpp::commonEmitDebugTensor::1297] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[05/24/2024-13:58:41] [E] Error[1]: [executionContext.cpp::executeInternal::626] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[05/24/2024-13:58:41] [E] Error[1]: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[05/24/2024-13:58:41] [E] Error[1]: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[05/24/2024-13:58:41] [E] Error[1]: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[05/24/2024-13:58:41] [E] Error[1]: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[05/24/2024-13:58:41] [E] Error[1]: [cudaDriverHelpers.cpp::operator()::29] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[05/24/2024-13:58:41] [E] Error[1]: [cudaDriverHelpers.cpp::operator()::29] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[05/24/2024-13:58:41] [E] Error[1]: [cudaDriverHelpers.cpp::operator()::29] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[05/24/2024-13:58:42] [E] Error[2]: [calibrator.cpp::calibrateEngine::1160] Error Code 2: Internal Error (Assertion context->executeV2(&bindings[0]) failed. )
[05/24/2024-13:58:42] [E] Error[2]: [builder.cpp::buildSerializedNetwork::636] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
[05/24/2024-13:58:42] [E] Engine could not be created from network
[05/24/2024-13:58:42] [E] Cuda failure: an illegal memory access was encountered
Aborted (core dumped)
What's wrong with it? In fact, I have tried to quantize directly with the polygraphy API, but the engine size is not as small as we expected, only going from 156M (FP32 engine) to 95M; what's more, the inference speed shows no improvement compared with the FP32 engine. But when I tried to generate an INT8 engine with the trtexec tool (with no calibration), the engine size became 51M, so I want to perform INT8 calibration and quantization with the trtexec tool to check again. So, can I use the trtexec tool and the calibration cache file to achieve this goal? If it works, what's wrong with my code? I'm looking forward to your help, many thanks.
Hi @geraldstanje
As far as I understand, I need to convert a PyTorch model (I'm using Hugging Face Sentence Transformers: https://github.com/huggingface/setfit) to ONNX before using TensorRT. Is it correct that I need to do: PyTorch -> ONNX -> TensorRT?
Yes, TRT’s primary means of importing a trained model from a framework is through the ONNX interchange format.
Yes, you'll need to remove the unsupported part of the network for the engine to be built. Also, I don't think ArrayFeatureExtractor is supported by TRT right now. So, you're probably better off removing everything after the ArgMax op and implementing it yourself outside the model definition. The graphsurgeon example you pointed to is the most relevant one. :)
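For example, a minimal post-processing sketch outside the model (assuming the clipped graph outputs the ArgMax index plus the Softmax probabilities, and that the removed ArrayFeatureExtractor only mapped that index to a string label; the class names below are hypothetical):

```python
import numpy as np

# Hypothetical class names; in the original model these lived in the string
# "classes" initializer that TRT cannot handle.
CLASSES = np.array(["negative", "positive"])

def postprocess(argmax_index: np.ndarray, probabilities: np.ndarray):
    # Reproduce the ArrayFeatureExtractor step outside the engine:
    # map the predicted index back to its string label.
    label = CLASSES[argmax_index]
    return label, probabilities

# Example: index 1 with its probabilities
print(postprocess(np.array([1]), np.array([[0.1, 0.9]])))
```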
Unsure about the issue with polygraphy. Please try out our latest release.
@yjiangling kindly open a separate issue. It's quite different from OP's issue.
@geraldstanje What is your torch.onnx._export or torch.onnx.export command? Can you show it here?
@lix19937 I'm using the built-in function of the huggingface lib, which calls torch.onnx.export - see: https://github.com/huggingface/setfit/blob/main/src/setfit/exporters/onnx.py#L183; here it calls torch.onnx.export: https://github.com/huggingface/setfit/blob/main/src/setfit/exporters/onnx.py#L96-L103
Importing initializer: classes
The root cause is the ArrayFeatureExtractor op in your ONNX model; the ArrayFeatureExtractor op is not supported by TRT. You can clip the ONNX graph so that the output of the ArgMax becomes one output, while the other output (the output of the Softmax) stays unchanged.
Or you can modify the forward code (to exclude ArrayFeatureExtractor) and then re-export the ONNX model.
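A minimal clipping sketch along these lines (the tensor names "label_argmax" and "probabilities" are assumptions; check the real names in netron or by printing graph.tensors()):

```python
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("model.onnx"))
tensors = graph.tensors()

# Make the ArgMax output and the Softmax output the new graph outputs,
# so everything after them (including ArrayFeatureExtractor) is dropped.
graph.outputs = [tensors["label_argmax"], tensors["probabilities"]]

# Remove nodes and initializers that are no longer reachable.
graph.cleanup().toposort()

onnx.save(gs.export_onnx(graph), "removed.onnx")
```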
I removed the ArrayFeatureExtractor from the onnx model - here is what it looks like now:
I took the nvcr.io/nvidia/tritonserver:24.04-py3 image and installed polygraphy via pip:
installed packages:
pip list
Package Version
------------------------ -------------
blinker 1.4
colored 2.2.4
cryptography 3.4.8
dbus-python 1.2.18
distlib 0.3.8
distro 1.7.0
filelock 3.13.4
httplib2 0.20.2
importlib-metadata 4.6.4
jeepney 0.7.1
keyring 23.5.0
launchpadlib 1.10.16
lazr.restfulclient 0.14.4
lazr.uri 1.0.6
more-itertools 8.10.0
numpy 1.26.4
nvidia-cuda-runtime-cu12 12.5.39
oauthlib 3.2.0
pip 24.0
platformdirs 4.2.0
polygraphy 0.49.9
PyGObject 3.42.1
PyJWT 2.3.0
pyparsing 2.4.7
python-apt 2.4.0+ubuntu3
SecretStorage 3.3.1
setuptools 69.5.1
six 1.16.0
tensorrt 10.0.1
tensorrt-cu12 10.0.1
tensorrt-cu12-bindings 10.0.1
tensorrt-cu12-libs 10.0.1
virtualenv 20.25.3
wadllib 1.3.6
wheel 0.43.0
zipp 1.0.0
I ran trtexec and got the following output - does that look good? trtexec.txt
I see: [05/25/2024-23:05:17] [I] Multithreading: Disabled [05/25/2024-23:05:17] [I] CUDA Graph: Disabled - why is that?
I also still have a problem here - how do I fix it?
polygraphy run model.plan --trt
[I] RUNNING | Command: /usr/local/bin/polygraphy run model.plan --trt
[I] trt-runner-N0-05/25/24-23:05:50 | Activating and starting inference
[I] Loading bytes from /models/model.plan
[E] 1: [stdArchiveReader.cpp::stdArchiveReaderInitCommon::47] Error Code 1: Serialization (Serialization assertion stdVersionRead == kSERIALIZATION_VERSION failed.Version tag does not match. Note: Current Version: 237, Serialized Engine Version: 236)
[!] Could not deserialize engine. See log for details.
[E] FAILED | Runtime: 0.470s | Command: /usr/local/bin/polygraphy run model.plan --trt
How can I see the expected input and output shapes using polygraphy - for defining the config.pbtxt files for tritonserver?
Is there a way to visualize the model.plan, similar to the model.onnx plot?
How can I use the generated model.plan and test it for accuracy - can I do that with polygraphy, or do I need to deploy it with Triton server?
cc @lix19937 @brb-nv
I ran trtexec and got the following output - does that look good? trtexec.txt
I see: [05/25/2024-23:05:17] [I] Multithreading: Disabled [05/25/2024-23:05:17] [I] CUDA Graph: Disabled - why is that?
That's normal, because you did not enable those features.
I still have a problem here - how do I fix it? [E] 1: [stdArchiveReader.cpp::stdArchiveReaderInitCommon::47] Error Code 1: Serialization (Serialization assertion stdVersionRead == kSERIALIZATION_VERSION failed. Version tag does not match. Note: Current Version: 237, Serialized Engine Version: 236)
The environments don't match. Use the following commands:
polygraphy convert removed.onnx -o model_poly.plan
polygraphy run model_poly.plan --trt
How can I see the expected input and output shapes using polygraphy - for defining the config.pbtxt files for tritonserver?
polygraphy run removed.onnx --trt \
--data-loader-script data_loader.py
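A minimal sketch of such a data_loader.py (Polygraphy looks for a load_data function by default; the input names, shapes and int64 dtype below are assumptions taken from this model, adjust them to yours):

```python
# data_loader.py
import numpy as np

def load_data():
    # Yield one feed_dict (input name -> numpy array) per inference iteration.
    for seq_len in (16, 128):
        yield {
            "input_ids": np.zeros((1, seq_len), dtype=np.int64),
            "attention_mask": np.ones((1, seq_len), dtype=np.int64),
            "token_type_ids": np.zeros((1, seq_len), dtype=np.int64),
        }
```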
Is there a way to visualize the model.plan, similar to the model.onnx plot?
trex - see more at https://github.com/NVIDIA/TensorRT/tree/release/10.0/tools/experimental/trt-engine-explorer/trex
How can I use the generated model.plan and test it for accuracy - can I do that with polygraphy, or do I need to deploy it with Triton server?
polygraphy run removed.onnx --trt --onnxrt --fp16
trtexec:
trtexec --onnx=removed.onnx --saveEngine=model.plan
polygraphy convert:
polygraphy convert removed.onnx -o model_poly.plan
works - but why does trtexec or polygraphy convert allow int64 when triton inference server cannot run it?
Does polygraphy convert removed.onnx -o model_poly.plan generate the same output as trtexec --onnx=removed.onnx --saveEngine=model.plan?
[E] 1: [stdArchiveReader.cpp::stdArchiveReaderInitCommon::47] Error Code 1: Serialization (Serialization assertion stdVersionRead == kSERIALIZATION_VERSION failed. Version tag does not match. Note: Current Version: 237, Serialized Engine Version: 236)
$ polygraphy convert removed.onnx -o model_poly.plan
[W] ModelImporter.cpp:420: Make sure input input_ids has Int64 binding.
[W] ModelImporter.cpp:420: Make sure input attention_mask has Int64 binding.
[W] ModelImporter.cpp:420: Make sure input token_type_ids has Int64 binding.
[W] ModelImporter.cpp:680: Make sure output label has Int64 binding.
[W] Input tensor: input_ids (dtype=DataType.INT64, shape=(-1, -1)) | No shapes provided; Will use shape: [1, 1] for min/opt/max in profile.
[W] This will cause the tensor to have a static shape. If this is incorrect, please set the range of shapes for this input tensor.
[W] Input tensor: attention_mask (dtype=DataType.INT64, shape=(-1, -1)) | No shapes provided; Will use shape: [1, 1] for min/opt/max in profile.
[W] Input tensor: token_type_ids (dtype=DataType.INT64, shape=(-1, -1)) | No shapes provided; Will use shape: [1, 1] for min/opt/max in profile.
[I] Configuring with profiles:[
Profile 0:
{input_ids [min=[1, 1], opt=[1, 1], max=[1, 1]],
attention_mask [min=[1, 1], opt=[1, 1], max=[1, 1]],
token_type_ids [min=[1, 1], opt=[1, 1], max=[1, 1]]}
]
[W] profileSharing0806 is on by default in TensorRT 10.0. This flag is deprecated and has no effect.
[I] Building engine with configuration:
Flags | []
Engine Capability | EngineCapability.STANDARD
Memory Pools | [WORKSPACE: 14930.56 MiB, TACTIC_DRAM: 14930.56 MiB, TACTIC_SHARED_MEMORY: 1024.00 MiB]
Tactic Sources | [EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
Profiling Verbosity | ProfilingVerbosity.DETAILED
Preview Features | [PROFILE_SHARING_0806]
[I] Finished engine building in 2.010 seconds
polygraphy run with tensorrt:
$ polygraphy run removed.onnx --trt --onnxrt --tf32 --execution-providers=cuda
[I] RUNNING | Command: /home/ubuntu/triton_inference_server/create_trt_model/venv/bin/polygraphy run removed.onnx --trt --onnxrt --tf32 --execution-providers=cuda
[I] trt-runner-N0-05/27/24-04:42:14 | Activating and starting inference
[W] ModelImporter.cpp:420: Make sure input input_ids has Int64 binding.
[W] ModelImporter.cpp:420: Make sure input attention_mask has Int64 binding.
[W] ModelImporter.cpp:420: Make sure input token_type_ids has Int64 binding.
[W] ModelImporter.cpp:680: Make sure output label has Int64 binding.
[W] Input tensor: input_ids (dtype=DataType.INT64, shape=(-1, -1)) | No shapes provided; Will use shape: [1, 1] for min/opt/max in profile.
[W] This will cause the tensor to have a static shape. If this is incorrect, please set the range of shapes for this input tensor.
[W] Input tensor: attention_mask (dtype=DataType.INT64, shape=(-1, -1)) | No shapes provided; Will use shape: [1, 1] for min/opt/max in profile.
[W] Input tensor: token_type_ids (dtype=DataType.INT64, shape=(-1, -1)) | No shapes provided; Will use shape: [1, 1] for min/opt/max in profile.
[I] Configuring with profiles:[
Profile 0:
{input_ids [min=[1, 1], opt=[1, 1], max=[1, 1]],
attention_mask [min=[1, 1], opt=[1, 1], max=[1, 1]],
token_type_ids [min=[1, 1], opt=[1, 1], max=[1, 1]]}
]
[W] profileSharing0806 is on by default in TensorRT 10.0. This flag is deprecated and has no effect.
[I] Building engine with configuration:
Flags | [TF32]
Engine Capability | EngineCapability.STANDARD
Memory Pools | [WORKSPACE: 14930.56 MiB, TACTIC_DRAM: 14930.56 MiB, TACTIC_SHARED_MEMORY: 1024.00 MiB]
Tactic Sources | [EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
Profiling Verbosity | ProfilingVerbosity.DETAILED
Preview Features | [PROFILE_SHARING_0806]
[I] Finished engine building in 1.991 seconds
[I] trt-runner-N0-05/27/24-04:42:14
---- Inference Input(s) ----
{input_ids [dtype=int64, shape=(1, 1)],
attention_mask [dtype=int64, shape=(1, 1)],
token_type_ids [dtype=int64, shape=(1, 1)]}
[I] trt-runner-N0-05/27/24-04:42:14
---- Inference Output(s) ----
{label [dtype=int64, shape=(1,)],
probabilities [dtype=float32, shape=(1, 2)]}
[I] trt-runner-N0-05/27/24-04:42:14 | Completed 1 iteration(s) in 4.848 ms | Average inference time: 4.848 ms.
[I] onnxrt-runner-N0-05/27/24-04:42:14 | Activating and starting inference
[I] Creating ONNX-Runtime Inference Session with providers: ['CUDAExecutionProvider']
2024-05-27 04:42:18.834213613 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 2 Memcpy nodes are added to the graph main_graph_06a8d9446585464486ee2407d95613e9 for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2024-05-27 04:42:18.835803344 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-05-27 04:42:18.835830375 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
[I] onnxrt-runner-N0-05/27/24-04:42:14
---- Inference Input(s) ----
{input_ids [dtype=int64, shape=(1, 1)],
attention_mask [dtype=int64, shape=(1, 1)],
token_type_ids [dtype=int64, shape=(1, 1)]}
[I] onnxrt-runner-N0-05/27/24-04:42:14
---- Inference Output(s) ----
{label [dtype=int64, shape=(1,)],
probabilities [dtype=float64, shape=(1, 2)]}
[I] onnxrt-runner-N0-05/27/24-04:42:14 | Completed 1 iteration(s) in 20.36 ms | Average inference time: 20.36 ms.
[I] Accuracy Comparison | trt-runner-N0-05/27/24-04:42:14 vs. onnxrt-runner-N0-05/27/24-04:42:14
[I] Comparing Output: 'label' (dtype=int64, shape=(1,)) with 'label' (dtype=int64, shape=(1,))
[I] Tolerance: [abs=1e-05, rel=1e-05] | Checking elemwise error
[I] trt-runner-N0-05/27/24-04:42:14: label | Stats: mean=1, std-dev=0, var=0, median=1, min=1 at (0,), max=1 at (0,), avg-magnitude=1
[I] onnxrt-runner-N0-05/27/24-04:42:14: label | Stats: mean=1, std-dev=0, var=0, median=1, min=1 at (0,), max=1 at (0,), avg-magnitude=1
[I] Error Metrics: label
[I] Minimum Required Tolerance: elemwise error | [abs=0] OR [rel=0] (requirements may be lower if both abs/rel tolerances are set)
[I] Absolute Difference | Stats: mean=0, std-dev=0, var=0, median=0, min=0 at (0,), max=0 at (0,), avg-magnitude=0
[I] Relative Difference | Stats: mean=0, std-dev=0, var=0, median=0, min=0 at (0,), max=0 at (0,), avg-magnitude=0
[I] PASSED | Output: 'label' | Difference is within tolerance (rel=1e-05, abs=1e-05)
[I] Comparing Output: 'probabilities' (dtype=float32, shape=(1, 2)) with 'probabilities' (dtype=float64, shape=(1, 2))
[I] Tolerance: [abs=1e-05, rel=1e-05] | Checking elemwise error
[I] trt-runner-N0-05/27/24-04:42:14: probabilities | Stats: mean=0.5, std-dev=0.5, var=0.25, median=0.5, min=5.318e-13 at (0, 0), max=1 at (0, 1), avg-magnitude=0.5
[I] onnxrt-runner-N0-05/27/24-04:42:14: probabilities | Stats: mean=0.5, std-dev=0.5, var=0.25, median=0.5, min=5.318e-13 at (0, 0), max=1 at (0, 1), avg-magnitude=0.5
[I] Error Metrics: probabilities
[I] Minimum Required Tolerance: elemwise error | [abs=5.318e-13] OR [rel=3.0581e-06] (requirements may be lower if both abs/rel tolerances are set)
[I] Absolute Difference | Stats: mean=2.659e-13, std-dev=2.659e-13, var=7.0702e-26, median=2.659e-13, min=1.6263e-18 at (0, 0), max=5.318e-13 at (0, 1), avg-magnitude=2.659e-13
[I] Relative Difference | Stats: mean=1.5291e-06, std-dev=1.5291e-06, var=2.338e-12, median=1.5291e-06, min=5.318e-13 at (0, 1), max=3.0581e-06 at (0, 0), avg-magnitude=1.5291e-06
[I] PASSED | Output: 'probabilities' | Difference is within tolerance (rel=1e-05, abs=1e-05)
[I] PASSED | All outputs matched | Outputs: ['label', 'probabilities']
[I] Accuracy Summary | trt-runner-N0-05/27/24-04:42:14 vs. onnxrt-runner-N0-05/27/24-04:42:14 | Passed: 1/1 iterations | Pass Rate: 100.0%
[I] PASSED | Runtime: 7.003s | Command: /home/ubuntu/triton_inference_server/create_trt_model/venv/bin/polygraphy run removed.onnx --trt --onnxrt --tf32 --execution-providers=cuda
polygraphy run for onnx:
polygraphy run removed.onnx --onnxrt --execution-providers=cuda
[I] RUNNING | Command: /home/ubuntu/triton_inference_server/create_trt_model/venv/bin/polygraphy run removed.onnx --onnxrt --execution-providers=cuda
[I] onnxrt-runner-N0-05/27/24-04:44:20 | Activating and starting inference
[I] Creating ONNX-Runtime Inference Session with providers: ['CUDAExecutionProvider']
2024-05-27 04:44:21.106331814 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 2 Memcpy nodes are added to the graph main_graph_06a8d9446585464486ee2407d95613e9 for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2024-05-27 04:44:21.107955920 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-05-27 04:44:21.107980378 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
[W] Input tensor: input_ids [shape=BoundedShape(['batch_size', 'sequence'], min=None, max=None)] | Will generate data of shape: [1, 1].
If this is incorrect, please provide a custom data loader.
[W] Input tensor: attention_mask [shape=BoundedShape(['batch_size', 'sequence'], min=None, max=None)] | Will generate data of shape: [1, 1].
If this is incorrect, please provide a custom data loader.
[W] Input tensor: token_type_ids [shape=BoundedShape(['batch_size', 'sequence'], min=None, max=None)] | Will generate data of shape: [1, 1].
If this is incorrect, please provide a custom data loader.
[I] onnxrt-runner-N0-05/27/24-04:44:20
---- Inference Input(s) ----
{input_ids [dtype=int64, shape=(1, 1)],
attention_mask [dtype=int64, shape=(1, 1)],
token_type_ids [dtype=int64, shape=(1, 1)]}
[I] onnxrt-runner-N0-05/27/24-04:44:20
---- Inference Output(s) ----
{label [dtype=int64, shape=(1,)],
probabilities [dtype=float64, shape=(1, 2)]}
[I] onnxrt-runner-N0-05/27/24-04:44:20 | Completed 1 iteration(s) in 5.083 ms | Average inference time: 5.083 ms.
[I] PASSED | Runtime: 1.594s | Command: /home/ubuntu/triton_inference_server/create_trt_model/venv/bin/polygraphy run removed.onnx --onnxrt --execution-providers=cuda
cc @lix19937 @brb-nv
You should ensure that you build and run the engine in the same environment. This kind of error is usually due to different TensorRT versions.
The TensorRT version used by your polygraphy does not match the TensorRT version of your trtexec. If they use the same TensorRT version, the plans will be the same.
I run trtexec and polygraphy (installed via pip install polygraphy) in the same Docker container - using the nvcr.io/nvidia/tritonserver:24.04-py3 image - how do I get trtexec and polygraphy to match the same engine there?
@lix19937
@yjiangling kindly open a separate issue. It's quite different from OP's issue.
Ok, thank you. I have opened a new issue here: https://github.com/NVIDIA/TensorRT/issues/3902
I run trtexec and polygraphy (installed via pip install polygraphy) in the same Docker container - using the nvcr.io/nvidia/tritonserver:24.04-py3 image - how do I get trtexec and polygraphy to match the same engine there?
@lix19937
You can run:
pip list | grep tensorrt
and
trtexec --help | grep 'TensorRT.trtexec'
ldd -r trtexec
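To cross-check from Python, a tiny sketch that prints the TensorRT version the tensorrt wheel (and hence polygraphy's TRT backend) is using; it should match the version trtexec reports:

```python
import tensorrt as trt

# Compare with the version printed by `trtexec --help | grep 'TensorRT.trtexec'`.
print(trt.__version__)
```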
$ /opt/tritonserver/TensorRT/build/trtexec --onnx=removed.onnx --saveEngine=model2.plan --verbose
&&&& RUNNING TensorRT.trtexec [TensorRT v100001] # /opt/tritonserver/TensorRT/build/trtexec --onnx=removed.onnx --saveEngine=model2.plan --verbose
[05/27/2024-16:39:16] [I] === Model Options ===
[05/27/2024-16:39:16] [I] Format: ONNX
[05/27/2024-16:39:16] [I] Model: removed.onnx
[05/27/2024-16:39:16] [I] Output:
[05/27/2024-16:39:16] [I] === Build Options ===
[05/27/2024-16:39:16] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default, tacticSharedMem: default
[05/27/2024-16:39:16] [I] avgTiming: 8
[05/27/2024-16:39:16] [I] Precision: FP32
[05/27/2024-16:39:16] [I] LayerPrecisions:
[05/27/2024-16:39:16] [I] Layer Device Types:
[05/27/2024-16:39:16] [I] Calibration:
[05/27/2024-16:39:16] [I] Refit: Disabled
[05/27/2024-16:39:16] [I] Strip weights: Disabled
[05/27/2024-16:39:16] [I] Version Compatible: Disabled
[05/27/2024-16:39:16] [I] ONNX Plugin InstanceNorm: Disabled
[05/27/2024-16:39:16] [I] TensorRT runtime: full
[05/27/2024-16:39:16] [I] Lean DLL Path:
[05/27/2024-16:39:16] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[05/27/2024-16:39:16] [I] Exclude Lean Runtime: Disabled
[05/27/2024-16:39:16] [I] Sparsity: Disabled
[05/27/2024-16:39:16] [I] Safe mode: Disabled
[05/27/2024-16:39:16] [I] Build DLA standalone loadable: Disabled
[05/27/2024-16:39:16] [I] Allow GPU fallback for DLA: Disabled
[05/27/2024-16:39:16] [I] DirectIO mode: Disabled
[05/27/2024-16:39:16] [I] Restricted mode: Disabled
[05/27/2024-16:39:16] [I] Skip inference: Disabled
[05/27/2024-16:39:16] [I] Save engine: model2.plan
[05/27/2024-16:39:16] [I] Load engine:
[05/27/2024-16:39:16] [I] Profiling verbosity: 0
[05/27/2024-16:39:16] [I] Tactic sources: Using default tactic sources
[05/27/2024-16:39:16] [I] timingCacheMode: local
[05/27/2024-16:39:16] [I] timingCacheFile:
[05/27/2024-16:39:16] [I] Enable Compilation Cache: Enabled
[05/27/2024-16:39:16] [I] errorOnTimingCacheMiss: Disabled
[05/27/2024-16:39:16] [I] Preview Features: Use default preview flags.
[05/27/2024-16:39:16] [I] MaxAuxStreams: -1
[05/27/2024-16:39:16] [I] BuilderOptimizationLevel: -1
[05/27/2024-16:39:16] [I] Calibration Profile Index: 0
[05/27/2024-16:39:16] [I] Weight Streaming: Disabled
[05/27/2024-16:39:16] [I] Debug Tensors:
[05/27/2024-16:39:16] [I] Input(s)s format: fp32:CHW
[05/27/2024-16:39:16] [I] Output(s)s format: fp32:CHW
[05/27/2024-16:39:16] [I] Input build shapes: model
[05/27/2024-16:39:16] [I] Input calibration shapes: model
[05/27/2024-16:39:16] [I] === System Options ===
[05/27/2024-16:39:16] [I] Device: 0
[05/27/2024-16:39:16] [I] DLACore:
[05/27/2024-16:39:16] [I] Plugins:
[05/27/2024-16:39:16] [I] setPluginsToSerialize:
[05/27/2024-16:39:16] [I] dynamicPlugins:
[05/27/2024-16:39:16] [I] ignoreParsedPluginLibs: 0
[05/27/2024-16:39:16] [I]
[05/27/2024-16:39:16] [I] === Inference Options ===
[05/27/2024-16:39:16] [I] Batch: Explicit
[05/27/2024-16:39:16] [I] Input inference shapes: model
[05/27/2024-16:39:16] [I] Iterations: 10
[05/27/2024-16:39:16] [I] Duration: 3s (+ 200ms warm up)
[05/27/2024-16:39:16] [I] Sleep time: 0ms
[05/27/2024-16:39:16] [I] Idle time: 0ms
[05/27/2024-16:39:16] [I] Inference Streams: 1
[05/27/2024-16:39:16] [I] ExposeDMA: Disabled
[05/27/2024-16:39:16] [I] Data transfers: Enabled
[05/27/2024-16:39:16] [I] Spin-wait: Disabled
[05/27/2024-16:39:16] [I] Multithreading: Disabled
[05/27/2024-16:39:16] [I] CUDA Graph: Disabled
[05/27/2024-16:39:16] [I] Separate profiling: Disabled
[05/27/2024-16:39:16] [I] Time Deserialize: Disabled
[05/27/2024-16:39:16] [I] Time Refit: Disabled
[05/27/2024-16:39:16] [I] NVTX verbosity: 0
[05/27/2024-16:39:16] [I] Persistent Cache Ratio: 0
[05/27/2024-16:39:16] [I] Optimization Profile Index: 0
[05/27/2024-16:39:16] [I] Weight Streaming Budget: Disabled
[05/27/2024-16:39:16] [I] Inputs:
[05/27/2024-16:39:16] [I] Debug Tensor Save Destinations:
[05/27/2024-16:39:16] [I] === Reporting Options ===
[05/27/2024-16:39:16] [I] Verbose: Enabled
[05/27/2024-16:39:16] [I] Averages: 10 inferences
[05/27/2024-16:39:16] [I] Percentiles: 90,95,99
[05/27/2024-16:39:16] [I] Dump refittable layers:Disabled
[05/27/2024-16:39:16] [I] Dump output: Disabled
[05/27/2024-16:39:16] [I] Profile: Disabled
[05/27/2024-16:39:16] [I] Export timing to JSON file:
[05/27/2024-16:39:16] [I] Export output to JSON file:
[05/27/2024-16:39:16] [I] Export profile to JSON file:
[05/27/2024-16:39:16] [I]
[05/27/2024-16:39:16] [I] === Device Information ===
[05/27/2024-16:39:16] [I] Available Devices:
[05/27/2024-16:39:16] [I] Device 0: "Tesla T4" UUID: GPU-2c78ffbb-6ac9-111e-43ed-0c697b4619d4
[05/27/2024-16:39:16] [I] Selected Device: Tesla T4
[05/27/2024-16:39:16] [I] Selected Device ID: 0
[05/27/2024-16:39:16] [I] Selected Device UUID: GPU-2c78ffbb-6ac9-111e-43ed-0c697b4619d4
[05/27/2024-16:39:16] [I] Compute Capability: 7.5
[05/27/2024-16:39:16] [I] SMs: 40
[05/27/2024-16:39:16] [I] Device Global Memory: 14930 MiB
[05/27/2024-16:39:16] [I] Shared Memory per SM: 64 KiB
[05/27/2024-16:39:16] [I] Memory Bus Width: 256 bits (ECC enabled)
[05/27/2024-16:39:16] [I] Application Compute Clock Rate: 1.59 GHz
[05/27/2024-16:39:16] [I] Application Memory Clock Rate: 5.001 GHz
[05/27/2024-16:39:16] [I]
[05/27/2024-16:39:16] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[05/27/2024-16:39:16] [I]
[05/27/2024-16:39:16] [I] TensorRT version: 10.0.1
[05/27/2024-16:39:16] [I] Loading standard plugins
Segmentation fault (core dumped)
$ ldd -r /opt/tritonserver/TensorRT/build/trtexec
linux-vdso.so.1 (0x00007ffe0b1e3000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007ff1c73d7000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff1c72f0000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007ff1c72d0000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff1c70a7000)
/lib64/ld-linux-x86-64.so.2 (0x00007ff1c774b000)
$ /opt/tritonserver/TensorRT/build/trtexec --help |grep 'TensorRT.trtexec'
&&&& RUNNING TensorRT.trtexec [TensorRT v100001] # /opt/tritonserver/TensorRT/build/trtexec --help
$ pip list | grep "tensorrt"
tensorrt                 10.0.1
tensorrt-cu12            10.0.1
tensorrt-cu12-bindings   10.0.1
tensorrt-cu12-libs       10.0.1
- I built trtexec for v100001 myself: [build_trt.txt](https://github.com/NVIDIA/TensorRT/files/15458361/build_trt.txt)
- Does it matter if the onnx model has opset=0, opset=13 or opset=90 when you run trtexec --onnx=removed.onnx --saveEngine=model.plan?
- Why does ```polygraphy convert removed.onnx -o model_poly.plan``` or ```trtexec --onnx=removed.onnx --saveEngine=model2.plan``` use int64 and not already convert it to e.g. int32, when triton inference server cannot run it?
cc @lix19937 @brb-nv
I get a Segmentation fault with trtexec using engine v100001 - the same worked with trtexec using engine v8603 - any idea?
Your CUDA runtime/driver environment does not match the v10.0.1 requirements, see https://github.com/NVIDIA/TensorRT?tab=readme-ov-file#prerequisites.
Does it matter if the onnx model has opset=0, opset=13 or opset=90 when you run trtexec --onnx=removed.onnx --saveEngine=model.plan?
TensorRT’s primary means of importing a trained model from a framework is through the ONNX interchange format. TensorRT ships with an ONNX parser library to assist in importing models. Where possible, the parser is backward compatible up to opset 9; the ONNX Model Opset Version Converter can assist in resolving incompatibilities. The GitHub version may support later opsets than the version shipped with TensorRT. Refer to the ONNX-TensorRT operator support matrix for the latest information on the supported opset and operators. For TensorRT deployment, we recommend exporting to the latest available ONNX opset.
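For reference, a minimal sketch of using that ONNX version converter (assuming the file is named model.onnx and the target is opset 17):

```python
import onnx
from onnx import version_converter

model = onnx.load("model.onnx")
# Convert the default-domain ops to the target opset where possible.
converted = version_converter.convert_version(model, 17)
onnx.save(converted, "model_opset17.onnx")
```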
Why does polygraphy convert removed.onnx -o model_poly.plan or trtexec --onnx=removed.onnx --saveEngine=model2.plan use int64 and not already convert it to e.g. int32, when triton inference server cannot run it?
You need to know one thing: all three tools/processes need libnvinfer.so or libnvinfer.a. If you make sure they use the same version of the library, it will pass.
Your CUDA runtime/driver environment does not match the v10.0.1 requirements, see https://github.com/NVIDIA/TensorRT?tab=readme-ov-file#prerequisites.
OK, that worked (I used the docker image nvcr.io/nvidia/pytorch:24.05-py3).
TensorRT’s primary means of importing a trained model from a framework is through the ONNX interchange format. TensorRT ships with an ONNX parser library to assist in importing models. Where possible, the parser is backward compatible up to opset 9; the ONNX Model Opset Version Converter can assist in resolving incompatibilities.
Supported ONNX opset for TensorRT 8.6.3 (https://github.com/onnx/onnx-tensorrt/blob/6872a9473391a73b96741711d52b98c2c3e25146/docs/operators.md):
TensorRT 8.6 supports operators up to Opset 17. Latest information of ONNX operators can be found [here](https://github.com/onnx/onnx/blob/master/docs/Operators.md)
TensorRT supports the following ONNX data types: DOUBLE, FLOAT32, FLOAT16, INT8, and BOOL
polygraphy:
polygraphy run model.plan
?
cc @lix19937 @brb-nv
TensorRT 8.6 supports operators up to Opset 17.
This generally means all opset versions [0, 17] of a certain op are supported. For example, if you look at operators.md, BatchNorm has been updated in opsets 15, 14, 9, 7, 6, 1. Given the statement above, ideally, BatchNorm of all those opsets should be supported by TRT.
Does that mean opset 0 to opset 17? Is it better for tensorrt to export the onnx model with opset 0 or opset 17?
I'd try to export with opset 17 and keep in mind any gaps in TRT support by looking at the onnx2trt support matrix.
Also, how can I display some more information like latency, used memory, input shape, output shape for polygraphy run model.plan?
I think trtexec (with --verbose option) shows everything you're looking for. Polygraphy is more suitable for accuracy debugging and not as much for measuring performance.
Also, how can I check that tensorrt uses the GPU to run vs the CPU? Can I check that with polygraphy run too?
I can see this with trtexec with the --verbose option.
I have a model.plan - I don't know which settings (input shape, output shape, batch_size) I defined for trtexec - how can I figure it out? Can I load the model.plan with polygraphy or another tool to get this info?
Also, is it possible to write the tokenizer in C++ (for the huggingface sentence transformer model) for triton inference server? Current code:
from transformers import AutoTokenizer, TensorType
tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')
# Example input sentences
sentences = ["This is an example sentence"]
# Tokenize sentences
tokens = tokenizer(sentences, return_token_type_ids=True, return_tensors=TensorType.NUMPY, max_length=128, truncation=True)
cc @lix19937 @brb-nv
Use trtexec --loadEngine=model.plan --verbose, it can show some of this info.
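If you'd rather inspect the plan from Python, here is a minimal sketch (assuming the TensorRT 8.x Python API and a serialized engine named model.plan) that prints each binding's name, direction, shape and dtype:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)

# Deserialize the engine built for this same TensorRT version.
with open("model.plan", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

# List every binding (input/output) with its shape and dtype.
for i in range(engine.num_bindings):
    kind = "input" if engine.binding_is_input(i) else "output"
    print(engine.get_binding_name(i), kind,
          engine.get_binding_shape(i), engine.get_binding_dtype(i))
```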
@geraldstanje ref https://github.com/lix19937/trt-samples-for-hackathon-cn/blob/master/cookbook/01-SimpleDemo/TensorRT8.5/main.cpp - it uses C++ to load the plan and then run inference.
@lix19937 what's the difference between the main.cpp and trtexec --loadEngine=model.plan --verbose?
The effect is basically the same. trtexec is a command line wrapper tool to quickly utilize TensorRT without having to develop your own application. The trtexec tool has two main purposes: it's useful for benchmarking networks on random or user-provided input data, and it's useful for generating serialized engines from models. main.cpp is a demo that shows how to use C++ to run inference and integrate it into your own project.
@lix19937 what is the purpose of workspace? Can workspace still be used?
I compared the tensorrt python lib with trtexec:
-rw-r--r-- 1 root root 91807988 Jun 21 03:26 model.plan <--- generated with tensorrt python code
-rw-r--r-- 1 root root 91815788 Jun 21 03:24 model2.plan <--- generated with trtexec
python version:
import tensorrt as trt

def convert_onnx_to_trt(onnx_model_path, trt_model_path, workspace=140000):
    # Create a TensorRT logger
    TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)

    # Create a builder and a network
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )

    # Create a parser to read the onnx file
    parser = trt.OnnxParser(network, TRT_LOGGER)

    # Parse the ONNX model
    with open(onnx_model_path, 'rb') as model_file:
        if not parser.parse(model_file.read()):
            for error in range(parser.num_errors):
                print(parser.get_error(error))
            return None

    # Set the builder configuration
    config = builder.create_builder_config()
    config.max_workspace_size = workspace * (1024 * 1024)  # Convert to bytes

    # Set optimization profiles
    profile = builder.create_optimization_profile()
    profile.set_shape("input_ids", (1, 1), (1, 128), (1, 512))
    profile.set_shape("attention_mask", (1, 1), (1, 128), (1, 512))
    profile.set_shape("token_type_ids", (1, 1), (1, 128), (1, 512))
    config.add_optimization_profile(profile)

    # Build the engine
    engine = builder.build_engine(network, config)
    if engine is None:
        print("Failed to build the engine!")
        return None

    # Serialize and save the engine
    with open(trt_model_path, 'wb') as engine_file:
        engine_file.write(engine.serialize())
    print(f"Successfully converted {onnx_model_path} to {trt_model_path}")

# Example usage
ONNX_MODEL_PATH = "model.onnx"
TRT_MODEL_PATH = "model.trt"
convert_onnx_to_trt(ONNX_MODEL_PATH, TRT_MODEL_PATH)
bash version:
#!/bin/bash
# readme about trtexec: https://github.com/NVIDIA/TensorRT/blob/master/samples/trtexec/README.md?plain=1
ONNX_MODEL_NAME=$1
TRT_MODEL_NAME=$2
WORKSPACE=140000
# convert onnx model to trt model
/usr/src/tensorrt/bin/trtexec \
--onnx=${ONNX_MODEL_NAME} \
--saveEngine=${TRT_MODEL_NAME} \
--minShapes=input_ids:1x1,attention_mask:1x1,token_type_ids:1x1 \
--optShapes=input_ids:1x128,attention_mask:1x128,token_type_ids:1x128 \
--maxShapes=input_ids:1x512,attention_mask:1x512,token_type_ids:1x512 \
--workspace=${WORKSPACE} \
--verbose \
| tee conversion.txt
# run generated trt model
/usr/src/tensorrt/bin/trtexec --loadEngine=${TRT_MODEL_NAME} --verbose
tensorrt lib:
pip list | grep "tensorrt"
tensorrt 8.6.1
tensorrt-bindings 8.6.1
tensorrt-libs 8.6.1
more infos:
find / -name "tensorrt.so"
/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so
root@7ae30d5c9eea:/workspace# ldd -r /usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so
linux-vdso.so.1 (0x00007ffff81fa000)
libnvinfer.so.8 => /usr/local/lib/python3.10/dist-packages/tensorrt_bindings/../tensorrt_libs/libnvinfer.so.8 (0x00007f48da351000)
libnvonnxparser.so.8 => /usr/local/lib/python3.10/dist-packages/tensorrt_bindings/../tensorrt_libs/libnvonnxparser.so.8 (0x00007f48d9e00000)
libnvparsers.so.8 => /usr/local/lib/python3.10/dist-packages/tensorrt_bindings/../tensorrt_libs/libnvparsers.so.8 (0x00007f48d9800000)
libnvinfer_plugin.so.8 => /usr/local/lib/python3.10/dist-packages/tensorrt_bindings/../tensorrt_libs/libnvinfer_plugin.so.8 (0x00007f48d7343000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f48d7117000)
libm.so.6 => /usr/lib/x86_64-linux-gnu/libm.so.6 (0x00007f48e90d1000)
libgcc_s.so.1 => /usr/lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f48e90b1000)
libc.so.6 => /usr/lib/x86_64-linux-gnu/libc.so.6 (0x00007f48d6eef000)
libpthread.so.0 => /usr/lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f48e90aa000)
libdl.so.2 => /usr/lib/x86_64-linux-gnu/libdl.so.2 (0x00007f48e90a5000)
librt.so.1 => /usr/lib/x86_64-linux-gnu/librt.so.1 (0x00007f48e90a0000)
/lib64/ld-linux-x86-64.so.2 (0x00007f48e91c8000)
libcublas.so.12 => /usr/local/lib/python3.10/dist-packages/tensorrt_bindings/../tensorrt_libs/../nvidia/cublas/lib/libcublas.so.12 (0x00007f48d0600000)
libcublasLt.so.12 => /usr/local/lib/python3.10/dist-packages/tensorrt_bindings/../tensorrt_libs/../nvidia/cublas/lib/libcublasLt.so.12 (0x00007f48ae600000)
libcudnn.so.8 => /usr/local/lib/python3.10/dist-packages/tensorrt_bindings/../tensorrt_libs/../nvidia/cudnn/lib/libcudnn.so.8 (0x00007f48ae200000)
undefined symbol: PyInstanceMethod_Type (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyExc_ValueError (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: _Py_TrueStruct (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyExc_IndexError (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyCapsule_Type (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyModule_Type (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PySlice_Type (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyMemoryView_Type (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: _Py_NoneStruct (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyExc_MemoryError (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyType_Type (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyByteArray_Type (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyCFunction_Type (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyExc_OverflowError (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyProperty_Type (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyExc_BufferError (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyExc_DeprecationWarning (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyExc_RuntimeError (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: _Py_NotImplementedStruct (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyBaseObject_Type (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyExc_StopIteration (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyExc_TypeError (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyMethod_Type (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: _Py_FalseStruct (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyDict_Type (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyFloat_Type (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyExc_SystemError (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyExc_ImportError (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyObject_GenericGetDict (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyObject_GenericSetDict (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyMemoryView_FromObject (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyTuple_SetItem (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyObject_GetBuffer (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyObject_Repr (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyLong_AsLong (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyLong_FromSsize_t (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyByteArray_Size (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyObject_Call (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyIter_Check (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyNumber_And (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyErr_NormalizeException (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyInstanceMethod_New (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyEval_AcquireThread (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyObject_Str (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyThreadState_DeleteCurrent (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyGILState_GetThisThreadState (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyObject_GetAttrString (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyMem_Free (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyErr_Restore (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyType_IsSubtype (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyModule_AddObject (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyErr_WarnEx (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyObject_CheckBuffer (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyCapsule_SetPointer (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyTuple_New (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyObject_SetAttr (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyObject_IsInstance (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyEval_RestoreThread (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyException_SetTraceback (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyNumber_Float (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyUnicode_FromFormat (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyList_Append (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PySlice_AdjustIndices (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyThreadState_GetFrame (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyDict_Contains (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyDict_Next (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyList_Size (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyTuple_Size (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyMemoryView_FromBuffer (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyNumber_Long (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyBuffer_Release (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyObject_GetIter (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyErr_Format (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyObject_CallObject (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyFloat_FromDouble (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyFloat_AsDouble (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyUnicode_DecodeUTF8 (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: _Py_Dealloc (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyByteArray_AsString (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyList_New (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyImport_ImportModule (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyNumber_Check (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: _PyObject_GetDictPtr (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyUnicode_FromString (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyIndex_Check (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: Py_GetVersion (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyCapsule_SetContext (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyFrame_GetLineNumber (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyThread_tss_get (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyBytes_Size (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PySequence_Check (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyList_GetItem (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyException_SetContext (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyErr_Clear (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyObject_HasAttrString (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyWeakref_NewRef (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyDict_New (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyErr_SetString (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyCapsule_GetContext (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyThreadState_Get (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyObject_SetItem (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PySlice_Unpack (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyCapsule_New (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyMem_Calloc (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyObject_SetAttrString (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyGILState_Release (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyCapsule_GetPointer (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyNumber_Xor (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyThread_tss_alloc (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyEval_GetLocals (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyBytes_AsString (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyObject_LengthHint (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyDict_GetItemWithError (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyThread_tss_set (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyObject_GetItem (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyType_Ready (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyEval_SaveThread (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PySequence_GetItem (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyNumber_Invert (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyObject_ClearWeakRefs (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PySequence_Size (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyLong_FromLong (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyEval_GetBuiltins (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyErr_WriteUnraisable (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyObject_RichCompareBool (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyNumber_Or (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyModule_Create2 (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyThread_tss_create (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyLong_AsUnsignedLong (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyLong_FromSize_t (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyFrame_GetBack (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyCapsule_SetName (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyUnicode_AsEncodedString (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyErr_Occurred (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyDict_Copy (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyErr_Fetch (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyThreadState_New (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: _PyThreadState_UncheckedGet (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: _PyType_Lookup (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyObject_CallFunctionObjArgs (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyDict_Size (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyIter_Next (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyCallable_Check (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PySequence_Tuple (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyThreadState_Clear (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyDict_DelItemString (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyUnicode_AsUTF8AndSize (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyGILState_Ensure (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyObject_Malloc (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyCMethod_New (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyCapsule_GetName (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyTuple_GetItem (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyFrame_GetCode (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyException_SetCause (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyUnicode_AsUTF8String (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
undefined symbol: PyBytes_AsStringAndSize (/usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so)
root@7ae30d5c9eea:/workspace# find / -name "trtexec"
/usr/src/tensorrt/bin/trtexec
root@7ae30d5c9eea:/workspace# ldd -r /usr/src/tensorrt/bin/trtexec
linux-vdso.so.1 (0x00007ffd1e97d000)
libpthread.so.0 => /usr/lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f1953bb4000)
libdl.so.2 => /usr/lib/x86_64-linux-gnu/libdl.so.2 (0x00007f1953baf000)
librt.so.1 => /usr/lib/x86_64-linux-gnu/librt.so.1 (0x00007f1953baa000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f195397e000)
libm.so.6 => /usr/lib/x86_64-linux-gnu/libm.so.6 (0x00007f1953895000)
libgcc_s.so.1 => /usr/lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f1953875000)
libc.so.6 => /usr/lib/x86_64-linux-gnu/libc.so.6 (0x00007f195364d000)
/lib64/ld-linux-x86-64.so.2 (0x00007f1953bc7000)
More info:
dpkg -l | grep TensorRT
ii libnvinfer-bin 8.6.1.6-1+cuda12.0 amd64 TensorRT binaries
ii libnvinfer-dev 8.6.1.6-1+cuda12.0 amd64 TensorRT development libraries
ii libnvinfer-dispatch-dev 8.6.1.6-1+cuda12.0 amd64 TensorRT development dispatch runtime libraries
ii libnvinfer-dispatch8 8.6.1.6-1+cuda12.0 amd64 TensorRT dispatch runtime library
ii libnvinfer-headers-dev 8.6.1.6-1+cuda12.0 amd64 TensorRT development headers
ii libnvinfer-headers-plugin-dev 8.6.1.6-1+cuda12.0 amd64 TensorRT plugin headers
ii libnvinfer-lean-dev 8.6.1.6-1+cuda12.0 amd64 TensorRT lean runtime libraries
ii libnvinfer-lean8 8.6.1.6-1+cuda12.0 amd64 TensorRT lean runtime library
ii libnvinfer-plugin-dev 8.6.1.6-1+cuda12.0 amd64 TensorRT plugin libraries
ii libnvinfer-plugin8 8.6.1.6-1+cuda12.0 amd64 TensorRT plugin libraries
ii libnvinfer-vc-plugin-dev 8.6.1.6-1+cuda12.0 amd64 TensorRT vc-plugin library
ii libnvinfer-vc-plugin8 8.6.1.6-1+cuda12.0 amd64 TensorRT vc-plugin library
ii libnvinfer8 8.6.1.6-1+cuda12.0 amd64 TensorRT runtime libraries
ii libnvonnxparsers-dev 8.6.1.6-1+cuda12.0 amd64 TensorRT ONNX libraries
ii libnvonnxparsers8 8.6.1.6-1+cuda12.0 amd64 TensorRT ONNX libraries
ii libnvparsers-dev 8.6.1.6-1+cuda12.0 amd64 TensorRT parsers libraries
ii libnvparsers8 8.6.1.6-1+cuda12.0 amd64 TensorRT parsers libraries
ii tensorrt-dev 8.6.1.6-1+cuda12.0 amd64 Meta package for TensorRT development libraries
/usr/src/tensorrt/bin/trtexec --version
[06/21/2024-03:45:05] [E] Model missing or format not recognized
=== Model Options ===
--uff=<file> UFF model
--onnx=<file> ONNX model
--model=<file> Caffe model (default = no model, random weights used)
--deploy=<file> Caffe prototxt file
--output=<name>[,<name>]* Output names (it can be specified multiple times); at least one output is required for UFF and Caffe
--uffInput=<name>,X,Y,Z Input blob name and its dimensions (X,Y,Z=C,H,W), it can be specified multiple times; at least one is required for UFF models
--uffNHWC Set if inputs are in the NHWC layout instead of NCHW (use X,Y,Z=H,W,C order in --uffInput)
=== Build Options ===
--maxBatch Set max batch size and build an implicit batch engine (default = same size as --batch)
This option should not be used when the input model is ONNX or when dynamic shapes are provided.
--minShapes=spec Build with dynamic shapes using a profile with the min shapes provided
--optShapes=spec Build with dynamic shapes using a profile with the opt shapes provided
--maxShapes=spec Build with dynamic shapes using a profile with the max shapes provided
--minShapesCalib=spec Calibrate with dynamic shapes using a profile with the min shapes provided
--optShapesCalib=spec Calibrate with dynamic shapes using a profile with the opt shapes provided
--maxShapesCalib=spec Calibrate with dynamic shapes using a profile with the max shapes provided
Note: All three of min, opt and max shapes must be supplied.
However, if only opt shapes is supplied then it will be expanded so
that min shapes and max shapes are set to the same values as opt shapes.
Input names can be wrapped with escaped single quotes (ex: 'Input:0').
Example input shapes spec: input0:1x3x256x256,input1:1x3x128x128
Each input shape is supplied as a key-value pair where key is the input name and
value is the dimensions (including the batch dimension) to be used for that input.
Each key-value pair has the key and value separated using a colon (:).
Multiple input shapes can be provided via comma-separated key-value pairs.
--inputIOFormats=spec Type and format of each of the input tensors (default = all inputs in fp32:chw)
See --outputIOFormats help for the grammar of type and format list.
Note: If this option is specified, please set comma-separated types and formats for all
inputs following the same order as network inputs ID (even if only one input
needs specifying IO format) or set the type and format once for broadcasting.
--outputIOFormats=spec Type and format of each of the output tensors (default = all outputs in fp32:chw)
Note: If this option is specified, please set comma-separated types and formats for all
outputs following the same order as network outputs ID (even if only one output
needs specifying IO format) or set the type and format once for broadcasting.
IO Formats: spec ::= IOfmt[","spec]
IOfmt ::= type:fmt
type ::= "fp32"|"fp16"|"int32"|"int8"
fmt ::= ("chw"|"chw2"|"chw4"|"hwc8"|"chw16"|"chw32"|"dhwc8"|
"cdhw32"|"hwc"|"dla_linear"|"dla_hwc4")["+"fmt]
--workspace=N Set workspace size in MiB.
--memPoolSize=poolspec Specify the size constraints of the designated memory pool(s) in MiB.
Note: Also accepts decimal sizes, e.g. 0.25MiB. Will be rounded down to the nearest integer bytes.
In particular, for dlaSRAM the bytes will be rounded down to the nearest power of 2.
Pool constraint: poolspec ::= poolfmt[","poolspec]
poolfmt ::= pool:sizeInMiB
pool ::= "workspace"|"dlaSRAM"|"dlaLocalDRAM"|"dlaGlobalDRAM"
--profilingVerbosity=mode Specify profiling verbosity. mode ::= layer_names_only|detailed|none (default = layer_names_only)
--minTiming=M Set the minimum number of iterations used in kernel selection (default = 1)
--avgTiming=M Set the number of times averaged in each iteration for kernel selection (default = 8)
--refit Mark the engine as refittable. This will allow the inspection of refittable layers
and weights within the engine.
--versionCompatible, --vc Mark the engine as version compatible. This allows the engine to be used with newer versions
of TensorRT on the same host OS, as well as TensorRT's dispatch and lean runtimes.
Only supported with explicit batch.
--useRuntime=runtime TensorRT runtime to execute engine. "lean" and "dispatch" require loading VC engine and do
not support building an engine.
runtime::= "full"|"lean"|"dispatch"
--leanDLLPath=<file> External lean runtime DLL to use in version compatiable mode.
--excludeLeanRuntime When --versionCompatible is enabled, this flag indicates that the generated engine should
not include an embedded lean runtime. If this is set, the user must explicitly specify a
valid lean runtime to use when loading the engine. Only supported with explicit batch
and weights within the engine.
--sparsity=spec Control sparsity (default = disabled).
Sparsity: spec ::= "disable", "enable", "force"
Note: Description about each of these options is as below
disable = do not enable sparse tactics in the builder (this is the default)
enable = enable sparse tactics in the builder (but these tactics will only be
considered if the weights have the right sparsity pattern)
force = enable sparse tactics in the builder and force-overwrite the weights to have
a sparsity pattern (even if you loaded a model yourself)
--noTF32 Disable tf32 precision (default is to enable tf32, in addition to fp32)
--fp16 Enable fp16 precision, in addition to fp32 (default = disabled)
--int8 Enable int8 precision, in addition to fp32 (default = disabled)
--fp8 Enable fp8 precision, in addition to fp32 (default = disabled)
--best Enable all precisions to achieve the best performance (default = disabled)
--directIO Avoid reformatting at network boundaries. (default = disabled)
--precisionConstraints=spec Control precision constraint setting. (default = none)
Precision Constraints: spec ::= "none" | "obey" | "prefer"
none = no constraints
prefer = meet precision constraints set by --layerPrecisions/--layerOutputTypes if possible
obey = meet precision constraints set by --layerPrecisions/--layerOutputTypes or fail
otherwise
--layerPrecisions=spec Control per-layer precision constraints. Effective only when precisionConstraints is set to
"obey" or "prefer". (default = none)
The specs are read left-to-right, and later ones override earlier ones. "*" can be used as a
layerName to specify the default precision for all the unspecified layers.
Per-layer precision spec ::= layerPrecision[","spec]
layerPrecision ::= layerName":"precision
precision ::= "fp32"|"fp16"|"int32"|"int8"
--layerOutputTypes=spec Control per-layer output type constraints. Effective only when precisionConstraints is set to
"obey" or "prefer". (default = none
The specs are read left-to-right, and later ones override earlier ones. "*" can be used as a
layerName to specify the default precision for all the unspecified layers. If a layer has more than
one output, then multiple types separated by "+" can be provided for this layer.
Per-layer output type spec ::= layerOutputTypes[","spec]
layerOutputTypes ::= layerName":"type
type ::= "fp32"|"fp16"|"int32"|"int8"["+"type]
--layerDeviceTypes=spec Specify layer-specific device type.
The specs are read left-to-right, and later ones override earlier ones. If a layer does not have
a device type specified, the layer will opt for the default device type.
Per-layer device type spec ::= layerDeviceTypePair[","spec]
layerDeviceTypePair ::= layerName":"deviceType
deviceType ::= "GPU"|"DLA"
--calib=<file> Read INT8 calibration cache file
--safe Enable build safety certified engine, if DLA is enable, --buildDLAStandalone will be specified
automatically (default = disabled)
--buildDLAStandalone Enable build DLA standalone loadable which can be loaded by cuDLA, when this option is enabled,
--allowGPUFallback is disallowed and --skipInference is enabled by default. Additionally,
specifying --inputIOFormats and --outputIOFormats restricts I/O data type and memory layout
(default = disabled)
--allowGPUFallback When DLA is enabled, allow GPU fallback for unsupported layers (default = disabled)
--consistency Perform consistency checking on safety certified engine
--restricted Enable safety scope checking with kSAFETY_SCOPE build flag
--saveEngine=<file> Save the serialized engine
--loadEngine=<file> Load a serialized engine
--tacticSources=tactics Specify the tactics to be used by adding (+) or removing (-) tactics from the default
tactic sources (default = all available tactics).
Note: Currently only cuDNN, cuBLAS, cuBLAS-LT, and edge mask convolutions are listed as optional
tactics.
Tactic Sources: tactics ::= [","tactic]
tactic ::= (+|-)lib
lib ::= "CUBLAS"|"CUBLAS_LT"|"CUDNN"|"EDGE_MASK_CONVOLUTIONS"
|"JIT_CONVOLUTIONS"
For example, to disable cudnn and enable cublas: --tacticSources=-CUDNN,+CUBLAS
--noBuilderCache Disable timing cache in builder (default is to enable timing cache)
--heuristic Enable tactic selection heuristic in builder (default is to disable the heuristic)
--timingCacheFile=<file> Save/load the serialized global timing cache
--preview=features Specify preview feature to be used by adding (+) or removing (-) preview features from the default
Preview Features: features ::= [","feature]
feature ::= (+|-)flag
flag ::= "fasterDynamicShapes0805"
|"disableExternalTacticSourcesForCore0805"
|"profileSharing0806"
--builderOptimizationLevel Set the builder optimization level. (default is 3)
Higher level allows TensorRT to spend more building time for more optimization options.
Valid values include integers from 0 to the maximum optimization level, which is currently 5.
--hardwareCompatibilityLevel=mode Make the engine file compatible with other GPU architectures. (default = none)
Hardware Compatibility Level: mode ::= "none" | "ampere+"
none = no compatibility
ampere+ = compatible with Ampere and newer GPUs
--tempdir=<dir> Overrides the default temporary directory TensorRT will use when creating temporary files.
See IRuntime::setTemporaryDirectory API documentation for more information.
--tempfileControls=controls Controls what TensorRT is allowed to use when creating temporary executable files.
Should be a comma-separated list with entries in the format (in_memory|temporary):(allow|deny).
in_memory: Controls whether TensorRT is allowed to create temporary in-memory executable files.
temporary: Controls whether TensorRT is allowed to create temporary executable files in the
filesystem (in the directory given by --tempdir).
For example, to allow in-memory files and disallow temporary files:
--tempfileControls=in_memory:allow,temporary:deny
If a flag is unspecified, the default behavior is "allow".
--maxAuxStreams=N Set maximum number of auxiliary streams per inference stream that TRT is allowed to use to run
kernels in parallel if the network contains ops that can run in parallel, with the cost of more
memory usage. Set this to 0 for optimal memory usage. (default = using heuristics)
=== Inference Options ===
--batch=N Set batch size for implicit batch engines (default = 1)
This option should not be used when the engine is built from an ONNX model or when dynamic
shapes are provided when the engine is built.
--shapes=spec Set input shapes for dynamic shapes inference inputs.
Note: Input names can be wrapped with escaped single quotes (ex: 'Input:0').
Example input shapes spec: input0:1x3x256x256, input1:1x3x128x128
Each input shape is supplied as a key-value pair where key is the input name and
value is the dimensions (including the batch dimension) to be used for that input.
Each key-value pair has the key and value separated using a colon (:).
Multiple input shapes can be provided via comma-separated key-value pairs.
--loadInputs=spec Load input values from files (default = generate random inputs). Input names can be wrapped with single quotes (ex: 'Input:0')
Input values spec ::= Ival[","spec]
Ival ::= name":"file
--iterations=N Run at least N inference iterations (default = 10)
--warmUp=N Run for N milliseconds to warmup before measuring performance (default = 200)
--duration=N Run performance measurements for at least N seconds wallclock time (default = 3)
If -1 is specified, inference will keep running unless stopped manually
--sleepTime=N Delay inference start with a gap of N milliseconds between launch and compute (default = 0)
--idleTime=N Sleep N milliseconds between two continuous iterations(default = 0)
--infStreams=N Instantiate N engines to run inference concurrently (default = 1)
--exposeDMA Serialize DMA transfers to and from device (default = disabled).
--noDataTransfers Disable DMA transfers to and from device (default = enabled).
--useManagedMemory Use managed memory instead of separate host and device allocations (default = disabled).
--useSpinWait Actively synchronize on GPU events. This option may decrease synchronization time but increase CPU usage and power (default = disabled)
--threads Enable multithreading to drive engines with independent threads or speed up refitting (default = disabled)
--useCudaGraph Use CUDA graph to capture engine execution and then launch inference (default = disabled).
This flag may be ignored if the graph capture fails.
--timeDeserialize Time the amount of time it takes to deserialize the network and exit.
--timeRefit Time the amount of time it takes to refit the engine before inference.
--separateProfileRun Do not attach the profiler in the benchmark run; if profiling is enabled, a second profile run will be executed (default = disabled)
--skipInference Exit after the engine has been built and skip inference perf measurement (default = disabled)
--persistentCacheRatio Set the persistentCacheLimit in ratio, 0.5 represent half of max persistent L2 size (default = 0)
=== Build and Inference Batch Options ===
When using implicit batch, the max batch size of the engine, if not given,
is set to the inference batch size;
when using explicit batch, if shapes are specified only for inference, they
will be used also as min/opt/max in the build profile; if shapes are
specified only for the build, the opt shapes will be used also for inference;
if both are specified, they must be compatible; and if explicit batch is
enabled but neither is specified, the model must provide complete static
dimensions, including batch size, for all inputs
Using ONNX models automatically forces explicit batch.
=== Reporting Options ===
--verbose Use verbose logging (default = false)
--avgRuns=N Report performance measurements averaged over N consecutive iterations (default = 10)
--percentile=P1,P2,P3,... Report performance for the P1,P2,P3,... percentages (0<=P_i<=100, 0 representing max perf, and 100 representing min perf; (default = 90,95,99%)
--dumpRefit Print the refittable layers and weights from a refittable engine
--dumpOutput Print the output tensor(s) of the last inference iteration (default = disabled)
--dumpRawBindingsToFile Print the input/output tensor(s) of the last inference iteration to file(default = disabled)
--dumpProfile Print profile information per layer (default = disabled)
--dumpLayerInfo Print layer information of the engine to console (default = disabled)
--exportTimes=<file> Write the timing results in a json file (default = disabled)
--exportOutput=<file> Write the output tensors to a json file (default = disabled)
--exportProfile=<file> Write the profile information per layer in a json file (default = disabled)
--exportLayerInfo=<file> Write the layer information of the engine in a json file (default = disabled)
=== System Options ===
--device=N Select cuda device N (default = 0)
--useDLACore=N Select DLA core N for layers that support DLA (default = none)
--staticPlugins Plugin library (.so) to load statically (can be specified multiple times)
--dynamicPlugins Plugin library (.so) to load dynamically and may be serialized with the engine if they are included in --setPluginsToSerialize (can be specified multiple times)
--setPluginsToSerialize Plugin library (.so) to be serialized with the engine (can be specified multiple times)
--ignoreParsedPluginLibs By default, when building a version-compatible engine, plugin libraries specified by the ONNX parser
are implicitly serialized with the engine (unless --excludeLeanRuntime is specified) and loaded dynamically.
Enable this flag to ignore these plugin libraries instead.
=== Help ===
--help, -h Print this message
&&&& FAILED TensorRT.trtexec [TensorRT v8601] # /usr/src/tensorrt/bin/trtexec --version
Let's focus on the actual issue: what exactly fails when you convert the ONNX model with trtexec? If you use trtexec, add the
--verbose
flag, redirect the output to a text file on disk (e.g. trtexec --onnx=model.onnx --verbose 2>&1 | tee trtexec_verbose.log), and upload that file here.
@yjiangling Is there any code that will generate the calibration.cache file?
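For reference, a calibration cache is normally produced as a side effect of one INT8 calibration pass over real data. Below is a minimal sketch (not code from this thread; the single input, the float32 dtype, and the file names are all hypothetical) of a TensorRT Python calibrator whose write_calibration_cache writes calibration.cache, which can later be reused with trtexec:

import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates the CUDA context used by the copies below
import pycuda.driver as cuda
import tensorrt as trt

class CacheWritingCalibrator(trt.IInt8EntropyCalibrator2):
    # Feeds a few real batches to TensorRT once and writes calibration.cache.
    # Assumes a single network input; multi-input models need one pointer per name.
    def __init__(self, batches, cache_file="calibration.cache"):
        super().__init__()
        self.batches = iter(batches)   # e.g. a list of numpy arrays, one per batch
        self.cache_file = cache_file
        self.device_mem = None

    def get_batch_size(self):
        return 1

    def get_batch(self, names):
        try:
            # dtype assumed float32 here purely for illustration
            batch = np.ascontiguousarray(next(self.batches), dtype=np.float32)
        except StopIteration:
            return None                # no more data -> calibration finishes
        if self.device_mem is None:
            self.device_mem = cuda.mem_alloc(batch.nbytes)
        cuda.memcpy_htod(self.device_mem, batch)
        return [int(self.device_mem)]

    def read_calibration_cache(self):
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)             # this file is what trtexec --calib= expects

During one engine build, attach it with config.set_flag(trt.BuilderFlag.INT8) and config.int8_calibrator = CacheWritingCalibrator(batches); once that build finishes, calibration.cache is on disk and can be reused with trtexec --int8 --calib=calibration.cache.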
Description
Hi, I have an ONNX model that I want to convert using trtexec:
It seems the ONNX model is currently not supported because of those datatypes. How can I convert the model (which tool and which settings) so that TensorRT can use it?
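One possible route, since TensorRT has no string tensor type, is to strip the string-typed pieces from the graph before conversion and keep only the numeric output. A minimal sketch with onnx-graphsurgeon, assuming (as discussed earlier in the thread) that the string output is named label and that the unsupported classes initializer only feeds it; the names may differ in your export:

import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("model.onnx"))

# Drop the string-typed graph output (hypothetically named "label") so that the
# nodes and initializers (e.g. "classes") that only feed it become dead code.
graph.outputs = [out for out in graph.outputs if out.name != "label"]

# cleanup() prunes nodes and initializers that no longer contribute to any output.
graph.cleanup().toposort()

onnx.save(gs.export_onnx(graph), "model_no_strings.onnx")

The resulting model_no_strings.onnx should only expose the numeric output(s), which the ONNX parser can handle; mapping the numeric prediction back to a string label would then have to happen outside TensorRT.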
Environment
TensorRT Version: 8.6.1
NVIDIA GPU: Nvidia T4
NVIDIA Driver Version:
CUDA Version: 12.x
CUDNN Version:
Operating System: Ubuntu 20.04
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Relevant Files
Model link:
Steps To Reproduce
pip list:
Commands or scripts: trtexec output:
trtexec output with verbose:
Have you tried the latest release?:
Why does polygraphy want to load a CUDA 11.x library? I run this inside the nvcr.io/nvidia/pytorch:24.03-py3 Docker image.
Can this model run on other frameworks? For example, run the ONNX model with ONNXRuntime (
polygraphy run <model.onnx> --onnxrt
).
cc @pranavm-nvidia @sachanub
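On the libcublasLt.so.11 error: that message typically means the onnxruntime-gpu build inside the container was linked against CUDA 11.x, not that polygraphy itself requires CUDA 11. A quick sanity check (a generic sketch, nothing project-specific) of what the installed onnxruntime can actually load:

import onnxruntime as ort

print(ort.__version__)
# If CUDAExecutionProvider is missing from this list, the installed wheel cannot
# find the CUDA libraries it was built against (e.g. a CUDA 11.x wheel on a CUDA 12 image).
print(ort.get_available_providers())

If the CUDA provider is unavailable, installing an onnxruntime-gpu build that matches the container's CUDA version (or switching to a container that ships a matching onnxruntime) is the usual fix.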