VisualComputingInstitute / 2D_lidar_person_detection

Person detector for 2D range data. Code release for Self-Supervised Person Detection in 2D Range Data using a Calibrated Camera (https://arxiv.org/abs/2012.08890)
GNU General Public License v3.0

How to run with TensorRT? #20

Closed bigpigdog closed 3 months ago

bigpigdog commented 3 months ago

Hi,

I have a laptop with a mobile NVIDIA GeForce GTX 1050 Ti GPU, and running the detector at 20 FPS consumes most of its capacity. Can anyone give some help or suggest a good way to run it with TensorRT? I am new to TensorRT.

Thanks.

Pandoro commented 3 months ago

As far as I recall, we never looked into this. Potentially @kumuji has worked on this at some point in the past. Generally, we don't have readily available code for this; I think you'll really have to go through the PyTorch + TensorRT tutorials to make it work. I'll leave this open for a while to see if someone else picks up on it, but given that this is not a crazy active repo, I don't expect that to happen.

kumuji commented 3 months ago

@bigpigdog @Pandoro I haven't worked directly with this codebase and can only give some general pointers on how to make this code run with TensorRT optimizations. A long time ago I made a package that runs YOLO under TensorRT: https://github.com/kumuji/trt_yolo_ros. It is not recent work, but you may still find it useful. In general, you need to convert your model and code to run under TensorRT. I found this quite difficult at the time, as only a limited set of operations is available in TensorRT, and I ended up having to convert an ONNX model to TensorRT. It took a few days to figure out, for sure.

If you want to quickly improve performance on your device, I would consider updating the PyTorch version as a first step, introducing @torch.compile wrappers, and taking a close look at the ROS-wrapping code. In my project, I noticed that most of the resources were consumed not by the model but by the pre-/post-processing code. For this, I can also recommend something like the numba package, or you can go hardcore and rewrite everything in C++. Just wanted to make you aware that TensorRT is not the only option for making it run faster =)
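To illustrate the pre-processing point with a self-contained sketch (this is a generic polar-to-Cartesian conversion for a laser scan, not this repo's actual code): replacing a per-point Python loop with a vectorized NumPy expression often removes exactly the kind of CPU-side bottleneck described above, with no TensorRT involved.

```python
import numpy as np

def polar_to_xy_loop(ranges, angles):
    # Naive per-point Python loop, the kind of code that often
    # dominates pre-processing time in a ROS node.
    xs, ys = [], []
    for r, a in zip(ranges, angles):
        xs.append(r * np.cos(a))
        ys.append(r * np.sin(a))
    return np.array(xs), np.array(ys)

def polar_to_xy_vec(ranges, angles):
    # Same conversion, vectorized: one C-level pass over the arrays.
    return ranges * np.cos(angles), ranges * np.sin(angles)

# A dummy 450-point scan over a 180-degree field of view.
ranges = np.random.uniform(0.1, 25.0, size=450)
angles = np.linspace(-np.pi / 2, np.pi / 2, 450)

x1, y1 = polar_to_xy_loop(ranges, angles)
x2, y2 = polar_to_xy_vec(ranges, angles)
assert np.allclose(x1, x2) and np.allclose(y1, y2)
```

The same vectorized functions are also good candidates for numba's `@njit`, which compiles them to machine code without a C++ rewrite.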

bigpigdog commented 3 months ago

@Pandoro @kumuji Thank you for your prompt response. Yes, I've also noticed that most of the resources are consumed in the pre-/post-processing code (data exchange between CPU and GPU). It seems that rewriting the code in C++ could potentially enhance the efficiency of the pre-/post-processing stages. I'll take this into consideration and may explore this approach when possible. Your advice is greatly appreciated.