dusty-nv / jetson-inference

Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.
https://developer.nvidia.com/embedded/twodaystoademo
MIT License

Using custom trained model with ROS Deep Learning #1757

Open jrvis1726 opened 10 months ago

jrvis1726 commented 10 months ago

Hi @dusty-nv ,

Thank you for this amazing project. I am trying to use a re-trained YOLOv8 model (re-trained with the Ultralytics library on the VisDrone dataset) in ONNX format with your ROS Deep Learning framework. I have put my model at ../jetson-inference/python/training/detection/ssd/models/YOLOV8/best.onnx and am trying to run inference using the following command:

roslaunch ros_deep_learning detectnet.ros1.launch model_path:="../jetson-inference/python/training/detection/ssd/models/YOLOV8/best.onnx" input:=file://home/jrvis/Downloads/IMG_9316.mp4 output:=file://home/jrvis/Downloads/output1.mp4
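
For reference, an ONNX file like this one typically comes from the Ultralytics export API; a minimal sketch, assuming the re-trained checkpoint is named best.pt and the 1504x1504 input size that shows up in the log below:

# export a re-trained YOLOv8 checkpoint to ONNX (sketch; checkpoint name assumed)
from ultralytics import YOLO   # pip install ultralytics

model = YOLO("best.pt")
model.export(format="onnx", imgsz=1504)   # writes best.onnx next to the checkpoint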

However, I am getting this error:

[TRT] loading network plan from engine cache... ../jetson-inference/python/training/detection/ssd/models/YOLOV8/best.onnx.1.1.8201.GPU.FP16.engine
[TRT] device GPU, loaded ../jetson-inference/python/training/detection/ssd/models/YOLOV8/best.onnx
[TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 274, GPU 3452 (MiB)
[TRT] Loaded engine size: 23 MiB
[TRT] Using cublas as a tactic source
[TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +158, GPU +89, now: CPU 438, GPU 3448 (MiB)
[TRT] Using cuDNN as a tactic source
[TRT] [MemUsageChange] Init cuDNN: CPU +240, GPU -5, now: CPU 678, GPU 3443 (MiB)
[TRT] Deserialization required 5478886 microseconds.
[TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +22, now: CPU 0, GPU 22 (MiB)
[TRT] Using cublas as a tactic source
[TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 678, GPU 3447 (MiB)
[TRT] Using cuDNN as a tactic source
[TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 678, GPU 3447 (MiB)
[TRT] Total per-runner device persistent memory is 22895104
[TRT] Total per-runner host persistent memory is 117824
[TRT] Allocated activation device memory of size 52026880
[TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +71, now: CPU 0, GPU 93 (MiB)
[TRT] CUDA engine context initialized on device GPU:
[TRT]    -- layers       184
[TRT]    -- maxBatchSize 1
[TRT]    -- deviceMemory 52026880
[TRT]    -- bindings     2
[TRT] binding 0 -- index 0 -- name 'images' -- type FP32 -- in/out INPUT -- # dims 4 -- dim #0 1 -- dim #1 3 -- dim #2 1504 -- dim #3 1504
[TRT] binding 1 -- index 1 -- name 'output0' -- type FP32 -- in/out OUTPUT -- # dims 3 -- dim #0 1 -- dim #1 14 -- dim #2 46389
[TRT] 3: Cannot find binding of given name:
[TRT] failed to find requested input layer in network
[TRT] device GPU, failed to create resources for CUDA engine
[TRT] failed to create TensorRT engine for ../jetson-inference/python/training/detection/ssd/models/YOLOV8/best.onnx, device GPU
[TRT] detectNet -- failed to initialize.
[ERROR] [1699466176.794739503]: failed to load detectNet model

Is what I am trying to achieve possible with this project? If so, what am I doing wrong?

tiwaojo commented 9 months ago

Hello @dusty-nv,

I appreciate the hard work you are doing on this project. I am experiencing a similar issue to @jrvis1726 above. I am using the dustynv/ros:humble-pytorch-l4t-r35.3.1 image on a Jetson Xavier NX with a model exported from Ultralytics, and I see the following error message in my stdout:

[detectnet-2] [TRT]    Could not register plugin creator -  ::FlattenConcat_TRT version 1

dusty-nv commented 9 months ago

[detectnet-2] [TRT]    Could not register plugin creator -  ::FlattenConcat_TRT version 1

Hi @tiwaojo, this is just a warning you can ignore, but to run YOLO in jetson-inference you would need to adapt the pre/post-processing in jetson-inference/c/detectNet.cpp to match what the particular version of YOLO you are using expects. Or there are other projects that accelerate YOLO with TensorRT that you could use.
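
As a rough illustration of what that adaptation involves (a sketch only, not code from detectNet.cpp): a YOLOv8 detection head emits a single tensor of shape (1, 4 + num_classes, num_anchors), e.g. the (1, 14, 46389) 'output0' in the log above for a 10-class model, holding center-format boxes plus per-class scores, with no objectness channel. In numpy terms, the decode step that detectNet's SSD-style post-processing does not perform looks roughly like this:

import numpy as np

def decode_yolov8(output0, conf_thresh=0.25):
    # output0: raw network output, shape (1, 4 + num_classes, num_anchors)
    preds = output0[0].T               # -> (num_anchors, 4 + num_classes)
    boxes = preds[:, :4]               # cx, cy, w, h in network-input pixels
    scores = preds[:, 4:]              # per-class confidences (no objectness in v8)
    class_ids = scores.argmax(axis=1)
    conf = scores.max(axis=1)
    keep = conf >= conf_thresh
    cxcywh = boxes[keep]
    xyxy = np.empty_like(cxcywh)       # convert center format to corner format
    xyxy[:, 0] = cxcywh[:, 0] - cxcywh[:, 2] / 2
    xyxy[:, 1] = cxcywh[:, 1] - cxcywh[:, 3] / 2
    xyxy[:, 2] = cxcywh[:, 0] + cxcywh[:, 2] / 2
    xyxy[:, 3] = cxcywh[:, 1] + cxcywh[:, 3] / 2
    return xyxy, conf[keep], class_ids[keep]   # NMS still needed on the result

Porting that logic into detectNet's C++ post-processing (and matching YOLO's 0-1-normalized, letterboxed pre-processing) is the adaptation being described.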

tiwaojo commented 9 months ago

Hey @dusty-nv, thanks for your reply. Perhaps I didn't provide enough context for my issue. I am using YOLOv8 and was able to successfully launch the detectnet executable with my YOLOv8 model. However, upon launching the detectnet ROS2 node with the following:

ros2 launch ros_deep_learning detectnet.ros2.launch input:=csi://0 output:=display://0 model_path:=/jetson-inference/ros/yolov8n.onnx

I get the error mentioned above. Furthermore, I browsed through the preprocess() and postprocess() routines in jetson-inference/c/detectNet.cpp in hopes of applying your recommendation, but was unable to find adequate resources to assist me.

Or there are other projects that accelerate YOLO with TensorRT that you could use.

I had tried using Isaac ROS and its isaac_ros_yolov8 package, but its performance was less than satisfactory.

dusty-nv commented 9 months ago

Perhaps I didn't provide enough context for my issue. I am using YOLOv8 and was able to successfully launch the detectnet executable with my YOLOv8 model.

Are you sure? I'm surprised that it would have worked. If it did, then you probably also need to set these params on the detectnet ROS2 node:

input_blob
output_cvg
output_bbox

https://github.com/dusty-nv/ros_deep_learning?tab=readme-ov-file#detectnet-node-1
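
For a standard YOLOv8 export, whose bindings are named 'images' and 'output0' (as in the log at the top of this thread), that would presumably look something like the following (a hedged example with assumed paths; note the pre/post-processing caveat above still applies, so correct binding names alone may not be enough):

ros2 launch ros_deep_learning detectnet.ros2.launch input:=csi://0 output:=display://0 model_path:=/jetson-inference/ros/yolov8n.onnx input_blob:=images output_cvg:=output0 output_bbox:=output0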

I had tried using Isaac ROS and its isaac_ros_yolov8 package, but its performance was less than satisfactory.

isaac_ros_yolov8 uses TensorRT, so it should already provide faster performance than you would get with just the original ultralytics repo, and probably you would not get faster from jetson-inference because TensorRT is doing the inferencing which accounts for most (if not all) of the runtime.

tiwaojo commented 9 months ago

Thanks for the suggestion. I set the params as suggested, which did not work until I loaded my model into the Netron web app; it gave me the appropriate values for my launch file:

ros2 launch ros_deep_learning detectnet.ros2.launch input:=csi://0 output:=display://0 model_path:=/jetson-inference/ros/yolov8n.onnx output_bbox:=output0 output_cvg:=output0 input_blob:=images

Although there were no bounding boxes in my display, the topic had activity.
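
For anyone without Netron handy, the same binding names can be read with the onnx Python package; a small sketch, with the model path assumed:

import onnx   # pip install onnx

model = onnx.load("/jetson-inference/ros/yolov8n.onnx")
print("inputs: ", [i.name for i in model.graph.input])    # e.g. ['images']
print("outputs:", [o.name for o in model.graph.output])   # e.g. ['output0']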

isaac_ros_yolov8 uses TensorRT, so it should already provide faster performance than you would get with just the original ultralytics repo, and probably you would not get faster from jetson-inference because TensorRT is doing the inferencing which accounts for most (if not all) of the runtime.

Perhaps my issue is a hardware bottleneck. I found myself using ~6.3 GiB of 6.7 GiB memory at ~90% CPU utilization with the isaac_ros_yolov8 package, and ~5 GiB at ~60% CPU utilization with the ros_deep_learning detectnet package, on a single CSI camera.

IamShubhamGupto commented 7 months ago

Hello!

I too am interested in using the ros_deep_learning package to run inference on a custom-trained YOLOv8 / YOLO-NAS model. Would it be possible to add official guides on how to do this?

Thank you

dusty-nv commented 7 months ago

Hi @IamShubhamGupto, jetson-inference / ros_deep_learning doesn't have built-in support for all the YOLO variants - instead please refer to the isaac_ros_yolov8 package

IamShubhamGupto commented 7 months ago

Hi @IamShubhamGupto, jetson-inference / ros_deep_learning doesn't have built-in support for all the YOLO variants - instead please refer to the isaac_ros_yolov8 package

It's amazing work, thank you.