dusty-nv / ros_deep_learning

Deep learning inference nodes for ROS / ROS2 with support for NVIDIA Jetson and TensorRT

Using a custom-trained model with ROS Deep Learning #130

Open jrvis1726 opened 8 months ago

jrvis1726 commented 8 months ago

Hi @dusty-nv,

Thank you for this amazing project. I am trying to use a re-trained YOLOv8 model (trained with the Ultralytics library on the VisDrone dataset), exported to ONNX, with your ROS Deep Learning framework. I have placed the model at "../jetson-inference/python/training/detection/ssd/models/YOLOV8/best.onnx" and am trying to run inference with the following command:

```
roslaunch ros_deep_learning detectnet.ros1.launch model_path:="../jetson-inference/python/training/detection/ssd/models/YOLOV8/best.onnx" input:=file://home/jrvis/Downloads/IMG_9316.mp4 output:=file://home/jrvis/Downloads/output1.mp4
```

However, I am getting this error:

```
[TRT] loading network plan from engine cache... ../jetson-inference/python/training/detection/ssd/models/YOLOV8/best.onnx.1.1.8201.GPU.FP16.engine
[TRT] device GPU, loaded ../jetson-inference/python/training/detection/ssd/models/YOLOV8/best.onnx
[TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 274, GPU 3452 (MiB)
[TRT] Loaded engine size: 23 MiB
[TRT] Using cublas as a tactic source
[TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +158, GPU +89, now: CPU 438, GPU 3448 (MiB)
[TRT] Using cuDNN as a tactic source
[TRT] [MemUsageChange] Init cuDNN: CPU +240, GPU -5, now: CPU 678, GPU 3443 (MiB)
[TRT] Deserialization required 5478886 microseconds.
[TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +22, now: CPU 0, GPU 22 (MiB)
[TRT] Using cublas as a tactic source
[TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 678, GPU 3447 (MiB)
[TRT] Using cuDNN as a tactic source
[TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 678, GPU 3447 (MiB)
[TRT] Total per-runner device persistent memory is 22895104
[TRT] Total per-runner host persistent memory is 117824
[TRT] Allocated activation device memory of size 52026880
[TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +71, now: CPU 0, GPU 93 (MiB)
[TRT]
[TRT] CUDA engine context initialized on device GPU:
[TRT]    -- layers       184
[TRT]    -- maxBatchSize 1
[TRT]    -- deviceMemory 52026880
[TRT]    -- bindings     2
[TRT] binding 0
            -- index   0
            -- name    'images'
            -- type    FP32
            -- in/out  INPUT
            -- # dims  4
            -- dim #0  1
            -- dim #1  3
            -- dim #2  1504
            -- dim #3  1504
[TRT] binding 1
            -- index   1
            -- name    'output0'
            -- type    FP32
            -- in/out  OUTPUT
            -- # dims  3
            -- dim #0  1
            -- dim #1  14
            -- dim #2  46389
[TRT]
[TRT] 3: Cannot find binding of given name:
[TRT] failed to find requested input layer in network
[TRT] device GPU, failed to create resources for CUDA engine
[TRT] failed to create TensorRT engine for ../jetson-inference/python/training/detection/ssd/models/YOLOV8/best.onnx, device GPU
[TRT] detectNet -- failed to initialize.
[ERROR] [1699466176.794739503]: failed to load detectNet model
```
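From the log above, the engine deserializes fine, but detectNet then cannot find the input binding it asks for; the exported model only exposes the tensors 'images' and 'output0'. One quick way to confirm which tensor names an ONNX export actually exposes is to read them out of the graph (a rough sketch, assuming the `onnx` Python package is installed):

```python
# Rough sketch: list the input/output tensor names of the exported model,
# i.e. the binding names TensorRT reports in the log above.
import onnx

model = onnx.load("../jetson-inference/python/training/detection/ssd/models/YOLOV8/best.onnx")

print("inputs: ", [t.name for t in model.graph.input])   # shows ['images'] for this export
print("outputs:", [t.name for t in model.graph.output])  # shows ['output0'] for this export
```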

Is what I am trying to achieve possible in this project? If yes, what am I doing wrong?
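For comparison, my understanding from the jetson-inference custom-detection (SSD-Mobilenet) workflow is that a custom ONNX model is normally launched with the label file and blob names spelled out explicitly, roughly like the sketch below (the paths and blob names here are placeholders taken from that workflow, not from my YOLOv8 export; please correct me if detectnet.ros1.launch takes different parameters):

```
roslaunch ros_deep_learning detectnet.ros1.launch \
    model_path:=/path/to/ssd-mobilenet.onnx \
    class_labels_path:=/path/to/labels.txt \
    input_blob:=input_0 \
    output_cvg:=scores \
    output_bbox:=boxes \
    input:=file:///home/jrvis/Downloads/IMG_9316.mp4 \
    output:=file:///home/jrvis/Downloads/output1.mp4
```

Even with the blob names set, I am not sure whether the single 'output0' tensor from the Ultralytics export maps onto the separate scores/boxes outputs that detectNet parses, which is really the heart of my question.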