YumainOB opened 1 year ago
I also ran into the same problem:
nvidia@ubuntu:~/Desktop/HXB/11-4/YOLOv8-TensorRT-CPP/build$ ./detect_object_image --model /home/nvidia/Desktop/HXB/11-4/yolov8n-seg_sim.onnx --input ./bus2.jpg
Searching for engine file with name: yolov8n-seg_sim.engine.NVIDIATegraX2.fp16.1.1
Engine not found, generating. This could take a while...
onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
Model only supports fixed batch size of 1
10: [optimizer.cpp::computeCosts::2011] Error Code 10: Internal Error (Could not find any implementation for node ConvTranspose_177.)
2: [builder.cpp::buildSerializedNetwork::609] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )
terminate called after throwing an instance of 'std::runtime_error'
what(): Error: Unable to build the TensorRT engine. Try increasing TensorRT log severity to kVERBOSE (in /libs/tensorrt-cpp-api/engine.cpp).
Aborted (core dumped)
Did you solve it? @YumainOB @cyrusbehr
Sorry, I still have no clue about this issue.
@cyrusbehr do you have any ideas?
@cyrusbehr do you have any ideas? I think ConvTranspose is supported by TensorRT 8.4, but the current JetPack ships TensorRT 8.2. How can I upgrade TensorRT to 8.4 without upgrading JetPack? @YumainOB
As far as I know, it is not possible to update TensorRT without upgrading JetPack.
On the other hand, the Ultralytics repo (specifically "yolo export ...") works on this JetPack/TensorRT without any issue, so I doubt that updating them is the only way to solve the problem.
Best regards
I found a way to get the engine fully generated, thanks to this post: https://forums.developer.nvidia.com/t/convtranspose-onnx-to-tensorrt-conversion-fail/181720/2. To apply this idea I added the following line in engine.cpp, right after the IBuilderConfig creation and check: config->setMaxWorkspaceSize(30);
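For context, here is a minimal sketch of where that line would sit, based on the usual TensorRT 8.x builder setup (the function and variable names are assumptions, not the actual engine.cpp code). Note that setMaxWorkspaceSize takes a byte count, so 30 literally means 30 bytes; that starves the optimizer of workspace and steers it toward low-memory tactics, which is presumably why the failing ConvTranspose tactic is skipped. A value like 1U << 30 (1 GiB) is the more typical setting.

```cpp
// Sketch only: standard TensorRT 8.x builder-config setup (names assumed).
#include <NvInfer.h>
#include <memory>
#include <stdexcept>

void configureBuilder(nvinfer1::IBuilder* builder) {
    auto config = std::unique_ptr<nvinfer1::IBuilderConfig>(
        builder->createBuilderConfig());
    if (!config) {
        throw std::runtime_error("Unable to create IBuilderConfig");
    }
    // The workaround: cap the tactic workspace. The argument is in BYTES,
    // so 30 means 30 bytes (extremely small). On TensorRT >= 8.4 this call
    // is deprecated in favour of:
    //   config->setMemoryPoolLimit(nvinfer1::MemoryPoolType::kWORKSPACE, size);
    config->setMaxWorkspaceSize(30);
}
```
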
That's a nice point
But I'm facing another issue later with a runtime failure... Here are the logs:

CUDA_LAUNCH_BLOCKING=1 ./detect_object_image --model yolov8n_seg.onnx --input image.jpg
Searching for engine file with name: yolov8n_seg.engine.NVIDIATegraX2.fp16.1.1
Engine found, not regenerating...
[MemUsageChange] Init CUDA: CPU +266, GPU +0, now: CPU 301, GPU 7174 (MiB)
Loaded engine size: 13 MiB
Using cublas as a tactic source
[MemUsageChange] Init cuBLAS/cuBLASLt: CPU +167, GPU +169, now: CPU 475, GPU 7350 (MiB)
Using cuDNN as a tactic source
[MemUsageChange] Init cuDNN: CPU +250, GPU +252, now: CPU 725, GPU 7602 (MiB)
Deserialization required 2294905 microseconds.
[MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +12, now: CPU 0, GPU 12 (MiB)
Using cublas as a tactic source
[MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 725, GPU 7602 (MiB)
Using cuDNN as a tactic source
[MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 725, GPU 7602 (MiB)
Total per-runner device persistent memory is 12509184
Total per-runner host persistent memory is 137424
Allocated activation device memory of size 14695424
[MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +26, now: CPU 0, GPU 38 (MiB)
1: [reformat.cu::NCHHW2ToNCHW::1049] Error Code 1: Cuda Runtime (unspecified launch failure)
terminate called after throwing an instance of 'std::runtime_error'
what(): Error: Unable to run inference.
Aborted (core dumped)
A similar message appears with precision set to FP32:

CUDA_LAUNCH_BLOCKING=1 ./detect_object_image --model ~/workspace/ppanto_yolo/yolov8n_seg.onnx --input image.jpg --precision FP32
Searching for engine file with name: yolov8n_seg.engine.NVIDIATegraX2.fp32.1.1
Engine found, not regenerating...
[MemUsageChange] Init CUDA: CPU +266, GPU +0, now: CPU 315, GPU 6966 (MiB)
Loaded engine size: 27 MiB
Using cublas as a tactic source
[MemUsageChange] Init cuBLAS/cuBLASLt: CPU +167, GPU +170, now: CPU 489, GPU 7143 (MiB)
Using cuDNN as a tactic source
[MemUsageChange] Init cuDNN: CPU +250, GPU +251, now: CPU 739, GPU 7394 (MiB)
Deserialization required 2305892 microseconds.
[MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +26, now: CPU 0, GPU 26 (MiB)
Using cublas as a tactic source
[MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 739, GPU 7394 (MiB)
Using cuDNN as a tactic source
[MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 739, GPU 7394 (MiB)
Total per-runner device persistent memory is 27359232
Total per-runner host persistent memory is 129312
Allocated activation device memory of size 22171136
[MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +47, now: CPU 0, GPU 73 (MiB)
1: [pointWiseV2Helpers.h::launchPwgenKernel::546] Error Code 1: Cuda Driver (unspecified launch failure)
terminate called after throwing an instance of 'std::runtime_error'
what(): Error: Unable to run inference.
Aborted (core dumped)
@cyrusbehr Do you have any clue?
Hello, and thank you for your great work on bringing YOLOv8 to the TensorRT C++ side.
I would like to help if possible, but for now I'm facing an issue with engine creation for a segmentation model. It seems that something is missing for "ConvTranspose_178 (CaskDeconvolution)", if I'm not misreading the logs.
I run the code on a TX2 board (with branch feat/jetson-tx2, obviously). Here is the Jetson environment:

$ jetson_release
Software part of jetson-stats 4.2.3 - (c) 2023, Raffaello Bonghi
Model: quill - Jetpack 4.6.4 [L4T 32.7.4]
NV Power Mode[0]: MAXN
Serial Number: [XXX Show with: jetson_release -s XXX]
Hardware:
Here is the command I use:
./benchmark --model yolov8n_seg.onnx --input ~/workspace/ppanto_yolo/test_ressources --precision FP16 --class-names class1 class2
Here is the relevant part of the logs.
Do you have an idea of what I can do to get the model working correctly? What I don't understand is that I can export to an engine using Ultralytics export and trtexec. Do you have a clue?
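Since `yolo export` and `trtexec` reportedly succeed on the same board, one way to narrow the problem down is to build the engine outside this repo and compare. A sketch of both paths, assuming file names similar to those in this thread (the exact paths and model names are assumptions):

```shell
# Path 1: Ultralytics CLI export, directly on the Jetson
# (produces a .engine next to the .pt; half=True requests FP16).
yolo export model=yolov8n-seg.pt format=engine device=0 half=True

# Path 2: trtexec from an already-exported ONNX file.
# On TensorRT 8.2 the --workspace flag is in MiB.
/usr/src/tensorrt/bin/trtexec --onnx=yolov8n_seg.onnx \
    --saveEngine=yolov8n_seg.engine --fp16 --workspace=2048
```

If both of these succeed where the repo's builder fails, the difference is likely in the builder configuration (workspace size, tactic sources) rather than in the model itself.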
Best regards